Some notes on urllib2 - Python 2.7
1. urlopen accepts a Request object and returns a response object. Calling read() on the response returns its content, so print(the_page) outputs the HTML of the page.
    import urllib2

    req = urllib2.Request('http://www.voidspace.org.uk')
    response = urllib2.urlopen(req)
    the_page = response.read()

    print(the_page)
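2. A Request object can carry data to send to the server, and it can also carry extra information (metadata) about the request, such as HTTP headers.
3. As we know, a request can send data to the server with POST. The data is encoded in a standard way before it is sent; here the urlencode function does the encoding.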
    import urllib
    import urllib2

    url = 'http://www.someserver.com/cgi-bin/register.cgi'

    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python'}

    data = urllib.urlencode(values)
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)

    the_page = response.read()
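Data can also be sent with GET, which is the method used whenever no data argument is passed. POST sends the encoded data in the body of the request, whereas GET appends the encoded data to the end of the URL.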
    import urllib
    import urllib2

    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python'}

    data = urllib.urlencode(values)
    print(data)  # encoded data

    url = 'http://www.example.com/example.cgi'
    full_url = url + '?' + data  # append the encoded data after '?'
    req = urllib2.Request(full_url)
    response = urllib2.urlopen(req)

    the_page = response.read()
    print(the_page)
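The difference between the two modes can be checked without contacting any server. The following is a minimal sketch: the fallback import for Python 3 is my own assumption (these notes target Python 2.7), and a list of pairs is used instead of a dict only to keep the parameter order predictable.

```python
try:
    # Python 2.7, as used throughout these notes
    from urllib import urlencode
    from urllib2 import Request
except ImportError:
    # Assumed fallback for Python 3, where the same names were moved
    from urllib.parse import urlencode
    from urllib.request import Request

# A list of pairs keeps the order of the parameters predictable
values = [('name', 'Michael Foord'), ('language', 'Python')]
data = urlencode(values)
print(data)  # name=Michael+Foord&language=Python

url = 'http://www.example.com/example.cgi'

# No data argument: the request will be sent with GET
get_req = Request(url + '?' + data)
# With a data argument: the request will be sent with POST
post_req = Request(url, data.encode('ascii'))

print(get_req.get_method())
print(post_req.get_method())
print(get_req.get_full_url())
```

Nothing is sent here; get_method() only reports which HTTP method urlopen would use for each Request.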
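4. Headers
Some servers only serve browsers, while the code above identifies itself as Python-urllib/2.7 by default. To get past this, the request has to "disguise" itself by sending a browser's User-Agent header.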
    import urllib
    import urllib2

    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python'}

    headers = {'User-Agent': user_agent}
    data = urllib.urlencode(values)

    req = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(req)
    the_page = response.read()
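To see which headers a Request will actually send, the object can be inspected before it is opened. This is a small sketch under the same assumptions as the listing above; the Accept header and the Python 3 fallback import are additions of mine for illustration.

```python
try:
    from urllib2 import Request          # Python 2.7
except ImportError:
    from urllib.request import Request   # assumed fallback for Python 3

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

req = Request(url, headers={'User-Agent': user_agent})
# Headers can also be added after the Request has been created
req.add_header('Accept', 'text/html')

# Request stores header names capitalized, so look up 'User-agent'
print(req.has_header('User-agent'))
print(req.get_header('User-agent'))
print(sorted(req.header_items()))
```

Note the spelling 'User-agent' in the lookups: Request normalizes header names with capitalize(), so a lookup with 'User-Agent' would not find the header.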
5. URLError with a "reason" attribute
    import urllib2
    from urllib2 import URLError

    req = urllib2.Request('http://www.pretend_server.org')

    try:
        urllib2.urlopen(req)
    except URLError as e:
        print e.reason
6. HTTPError with a "code" attribute. Because the default handlers follow redirects (codes in the 300 range) and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.
    import urllib2

    req = urllib2.Request('http://www.python.org/fish.html')

    try:
        urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        print e.code
        print e.read()
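The attributes of an HTTPError can be explored without a network connection by building the exception by hand. The sketch below does that; the response body is made up for illustration, and the Python 3 fallback imports are an assumption of mine.

```python
import io

try:
    # Python 2.7
    from urllib2 import HTTPError, URLError
    from BaseHTTPServer import BaseHTTPRequestHandler
except ImportError:
    # Assumed fallback for Python 3
    from urllib.error import HTTPError, URLError
    from http.server import BaseHTTPRequestHandler

# Build the error by hand instead of contacting a server.
# The body below is invented for this example.
err = HTTPError('http://www.python.org/fish.html', 404, 'Not Found',
                {}, io.BytesIO(b'<html>no fish here</html>'))

print(isinstance(err, URLError))  # HTTPError is a subclass of URLError
print(err.code)
print(err.read())                 # the error page can be read like a response

# The standard library ships a table of (short, long) messages per code
short_msg, long_msg = BaseHTTPRequestHandler.responses[err.code]
print(short_msg)
```

The responses table is handy for turning a bare numeric code into a readable message when reporting errors.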
7. Two basic approaches to handling both HTTPError and URLError
    # Approach 1
    from urllib2 import Request, urlopen, URLError, HTTPError

    someurl = 'http://www.example.com/'  # placeholder URL

    req = Request(someurl)

    try:
        response = urlopen(req)
    # HTTPError must come first because it is a subclass of URLError
    except HTTPError as e:
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    except URLError as e:
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    else:
        print('everything is fine')

    # Approach 2
    from urllib2 import Request, urlopen, URLError

    req = Request(someurl)
    try:
        response = urlopen(req)
    except URLError as e:
        # Check 'code' first: HTTPError in Python 2.7 also has a 'reason'
        if hasattr(e, 'code'):
            print 'The server couldn\'t fulfill the request.'
            print 'Error code: ', e.code
        elif hasattr(e, 'reason'):
            print 'We failed to reach a server.'
            print 'Reason: ', e.reason
    else:
        print('everything is fine')
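The ordering rule behind both approaches (test for HTTPError before URLError) can be packed into one function and tried on hand-built exceptions. This is only a sketch: describe_error is a hypothetical helper of mine, not part of urllib2, and the Python 3 fallback import is an assumption.

```python
import io

try:
    from urllib2 import HTTPError, URLError       # Python 2.7
except ImportError:
    from urllib.error import HTTPError, URLError  # assumed fallback for Python 3


def describe_error(exc):
    """Hypothetical helper: turn a urllib2 exception into a short message."""
    # HTTPError has to be tested first because it is a subclass of URLError
    if isinstance(exc, HTTPError):
        return "The server couldn't fulfill the request. Error code: %s" % exc.code
    if isinstance(exc, URLError):
        return "We failed to reach a server. Reason: %s" % exc.reason
    raise TypeError('not a urllib2 error: %r' % (exc,))


# Hand-built exceptions, so that no network access is needed
unreachable = URLError('connection refused')
not_found = HTTPError('http://www.example.com/missing', 404, 'Not Found',
                      {}, io.BytesIO())

print(describe_error(unreachable))
print(describe_error(not_found))
```

If the two isinstance tests were swapped, every HTTPError would be reported as a connection failure, which is the same pitfall as putting except URLError before except HTTPError.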
8. Basic Authentication
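When authentication is required, the server sends a header requesting it, for example WWW-Authenticate: Basic realm="cPanel Users". The client can then repeat the request with the username and password included as a header.
If you do not need to care about the realm, you can use HTTPPasswordMgrWithDefaultRealm to set the username and password for a given URL.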
    import urllib2

    # Create a password manager
    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

    username = 'Prime'
    password = 'Bee'

    # Passing None as the realm makes these credentials the default for any realm
    top_level_url = "http://example.com/foo/"
    password_mgr.add_password(None, top_level_url, username, password)

    handler = urllib2.HTTPBasicAuthHandler(password_mgr)

    # Create an opener that uses the handler
    opener = urllib2.build_opener(handler)

    # Fetch a URL with the opener (placeholder: any URL at or below top_level_url)
    someurl = top_level_url
    opener.open(someurl)

    # Optional: install the opener so that urllib2.urlopen uses it as well
    urllib2.install_opener(opener)
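Which URLs the stored credentials apply to can be checked directly on the password manager, without sending a request. A minimal sketch follows; the two test URLs are made up for illustration, and the Python 3 fallback import is an assumption of mine.

```python
try:
    import urllib2                       # Python 2.7
except ImportError:
    import urllib.request as urllib2     # assumed fallback for Python 3

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://example.com/foo/', 'Prime', 'Bee')

# A URL below the registered one matches, whatever realm the server names
inside = password_mgr.find_user_password('cPanel Users',
                                         'http://example.com/foo/bar.html')
# A URL outside the registered path does not match
outside = password_mgr.find_user_password('cPanel Users',
                                          'http://example.com/other/')

print(inside)
print(outside)
```

This is the lookup that HTTPBasicAuthHandler performs when a server answers with a WWW-Authenticate header, so it is a quick way to confirm that top_level_url covers the pages you intend to fetch.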
9. Setting the default socket timeout
    import socket

    timeout = 10  # seconds
    socket.setdefaulttimeout(timeout)
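The effect of the default timeout can be observed on a freshly created socket. The sketch below only uses the socket module and restores the previous setting at the end; the variable names are mine.

```python
import socket

previous = socket.getdefaulttimeout()  # None unless something already set it

socket.setdefaulttimeout(10)           # seconds; applies to sockets created afterwards
new_sock = socket.socket()
sock_timeout = new_sock.gettimeout()
print(sock_timeout)
new_sock.close()

# Restore the earlier setting so other code is not affected
socket.setdefaulttimeout(previous)
```

Since the default is global to the process, it affects every socket created later, including those opened by urllib2. For a single request, urlopen also accepts a timeout argument, e.g. urllib2.urlopen(req, timeout=10).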