
Learning Python: Web Crawlers

Reposted from 静觅's blog

The most basic way to download a web page

import urllib2

# Open the URL and print the raw body of the response.
response = urllib2.urlopen("http://www.baidu.com")
print response.read()

POST requests

import urllib
import urllib2

values = {"username": "*****", "password": "*****"}
# Form-encode the fields; passing data to Request makes it a POST.
data = urllib.urlencode(values)
url = "   "  # target URL, left blank in the source
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
print response.read()
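For readers on Python 3, where urllib2 was merged into urllib.request, the same POST pattern might look like the sketch below. It only builds the request and does not send it. The helper name build_post_request, the example.com URL, and the credentials are all hypothetical and do not come from the original post.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(url, fields):
    """Return a Request that will be sent as a form-encoded POST."""
    # In Python 3 the request body must be bytes, not str.
    body = urlencode(fields).encode("utf-8")
    return Request(url, data=body)


# Hypothetical URL and credentials, for illustration only.
request = build_post_request(
    "http://example.com/login",
    {"username": "alice", "password": "secret"},
)
print(request.get_method())  # POST, because a body is attached
print(request.data)
```

Passing the finished request to urllib.request.urlopen would then send it, just as urllib2.urlopen(request) does in the code above.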

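GET requests

import urllib
import urllib2

values = {}
values["username"] = "*****"  # placeholder; the value is omitted in the source
values["password"] = "*****"  # placeholder; the value is omitted in the source
data = urllib.urlencode(values)
url = "   "  # target URL, omitted in the source
# For a GET, the encoded fields go into the query string rather than the body.
geturl = url + "?" + data
request = urllib2.Request(geturl)
response = urllib2.urlopen(request)
print response.read()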
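To make the query-string step concrete, here is a small Python 3 sketch of the same idea. The helper name build_get_url, the example.com URL, and the parameter values are hypothetical. It shows that urlencode escapes characters such as spaces and ampersands, so the pieces can be joined with "?" safely.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get_url(base_url, params):
    """Append params to base_url as an encoded query string."""
    return base_url + "?" + urlencode(params)


# Hypothetical URL and parameters, for illustration only.
geturl = build_get_url(
    "http://example.com/search",
    {"username": "alice", "q": "a b&c"},
)
print(geturl)  # http://example.com/search?username=alice&q=a+b%26c

request = Request(geturl)
print(request.get_method())  # GET, because no body is attached
```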

Setting a proxy

import urllib2

enable_proxy = True
proxy_handler = urllib2.ProxyHandler({"http": "http://some-proxy.com:8080"})
null_proxy_handler = urllib2.ProxyHandler({})  # empty mapping: no proxy
if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)
# Make this opener the default used by urllib2.urlopen().
urllib2.install_opener(opener)
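The proxy switch can also be wrapped in a small function. The following is a Python 3 sketch under that assumption: make_opener is a hypothetical helper name, and the proxy address is the one from the code above. Nothing is sent over the network here; the opener is only constructed.

```python
from urllib.request import ProxyHandler, build_opener


def make_opener(proxy_url=None):
    """Build an opener that routes HTTP through proxy_url, or uses no proxy when it is None."""
    # An empty mapping tells ProxyHandler not to use any proxy,
    # including ones configured in the environment.
    proxies = {"http": proxy_url} if proxy_url else {}
    return build_opener(ProxyHandler(proxies))


proxied = make_opener("http://some-proxy.com:8080")
direct = make_opener()

# Show which proxy mapping the proxied opener ended up with.
print([h.proxies for h in proxied.handlers if isinstance(h, ProxyHandler)])
```

Returning the opener instead of calling install_opener keeps the proxy setting local to the code that asks for it, which may be preferable when only some requests should go through the proxy.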

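Setting a timeout

import urllib2

data = None  # no request body; pass urlencoded form data here to send a POST
# The third positional argument is the timeout in seconds.
response = urllib2.urlopen("http://www.baidu.com", data, 10)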

Exception handling

import urllib2

req = urllib2.Request("http://blog.csdn.net/cqcre")
try:
    urllib2.urlopen(req)
except urllib2.URLError as e:
    # HTTPError (a subclass of URLError) carries a status code;
    # a plain URLError carries only a reason.
    if hasattr(e, "code"):
        print e.code
    if hasattr(e, "reason"):
        print e.reason
else:
    print "OK"
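In Python 3 the same exceptions live in urllib.error, and the two cases can be caught separately instead of probing with hasattr. The sketch below assumes a hypothetical helper named describe_fetch. It is exercised with a file URL that should not exist, so that it runs without network access.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def describe_fetch(url, timeout=10):
    """Try to open url and return a short description of the outcome."""
    try:
        with urlopen(url, timeout=timeout) as response:
            response.read()
    except HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError must be caught first because it subclasses URLError.
        return "HTTP error %d" % e.code
    except URLError as e:
        # No usable response at all: DNS failure, refused connection, missing file.
        return "failed: %s" % e.reason
    else:
        return "OK"


# A file URL that is assumed not to exist, used only to trigger URLError offline.
print(describe_fetch("file:///no/such/dir/missing.txt"))
```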

Working with cookies

import urllib
import urllib2
import cookielib

filename = "cookie.txt"
# Create a MozillaCookieJar instance to hold the cookies; it is written to the file later
cookie = cookielib.MozillaCookieJar(filename)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
postdata = urllib.urlencode({
    "stuid": "201200131012",
    "pwd": "23342321"
})
# Login URL of the academic affairs system
loginUrl = "http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bks_login2.login"
# Simulate the login; the cookies are stored in the cookie variable
result = opener.open(loginUrl, postdata)
# Save the cookies to cookie.txt
cookie.save(ignore_discard=True, ignore_expires=True)
# Use the cookies to request another URL, here the grade lookup page
gradeUrl = "http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bkscjcx.curscopre"
# Request the grade lookup page
result = opener.open(gradeUrl)
print result.read()
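The save half of this flow has a counterpart: loading the file back in a later run. Below is a Python 3 sketch of the round trip using http.cookiejar, the successor of cookielib. Since no real login is performed, the cookie is built by hand. The helper make_session_cookie, the cookie name and value, and the example.com domain are hypothetical.

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_session_cookie(name, value, domain):
    """Build a session cookie by hand (normally the server sets these)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Write to a temporary directory so the sketch leaves the working directory alone.
cookie_file = os.path.join(tempfile.mkdtemp(), "cookie.txt")

jar = MozillaCookieJar(cookie_file)
jar.set_cookie(make_session_cookie("sessionid", "abc123", "example.com"))
# ignore_discard=True keeps session cookies, which would otherwise be skipped.
jar.save(ignore_discard=True, ignore_expires=True)

# Later, possibly in another process: load the file into a fresh jar.
restored = MozillaCookieJar()
restored.load(cookie_file, ignore_discard=True, ignore_expires=True)
opener = build_opener(HTTPCookieProcessor(restored))  # requests through it send the cookies

print([(c.name, c.value, c.domain) for c in restored])
```

The same ignore_discard and ignore_expires flags are passed to load as to save; without them, session cookies like this one would be dropped when the file is read back.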

 
