Learning Python: Web Crawlers

2024-08-13 14:46:09, 227 views

Reposted from the 静觅 blog. The listings below target Python 2 (urllib2).

The most basic way to download a web page

    import urllib2

    response = urllib2.urlopen("http://www.baidu.com")
    print response.read()

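POST requests

    import urllib
    import urllib2

    values = {"username": "*****", "password": "*****"}
    # urlencode turns the dict into a form-encoded request body
    data = urllib.urlencode(values)
    url = "   "  # URL left blank in the source
    # Passing data as the second argument makes this a POST request
    request = urllib2.Request(url, data)
    response = urllib2.urlopen(request)
    print response.read()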
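In Python 3 these calls live in urllib.request and urllib.parse, and the POST body must be bytes rather than str. The following is a minimal sketch under that assumption, not the original author's code: the helper name build_post_request and the example.com URL are hypothetical, and the request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(url, fields):
    """Build (but do not send) a form-encoded POST request."""
    # urlencode returns str; Request needs the body as bytes in Python 3.
    body = urlencode(fields).encode("ascii")
    # Supplying data makes urllib choose the POST method.
    return Request(url, data=body)


# Hypothetical URL and masked credentials, for illustration only.
request = build_post_request(
    "http://example.com/login",
    {"username": "*****", "password": "*****"},
)
print(request.get_method())  # POST
print(request.data)
```

Passing the returned object to urllib.request.urlopen would perform the actual request.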

GET requests

    import urllib
    import urllib2

    values = {}
    values["username"] = ""  # value omitted in the source
    values["password"] = ""  # value omitted in the source
    data = urllib.urlencode(values)
    url = ""  # URL omitted in the source
    # For GET, the encoded parameters go into the URL after "?"
    geturl = url + "?" + data
    request = urllib2.Request(geturl)
    response = urllib2.urlopen(request)
    print response.read()
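The URL-building step can be tried on its own, without any network access. This is a small Python 3 sketch; the helper build_get_url, the example.com URL, and the parameter names are all hypothetical. It also handles a base URL that already has a query string, which plain concatenation with "?" does not.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_get_url(base_url, params):
    """Append URL-encoded query parameters to base_url."""
    query = urlencode(params)
    # Use "&" if the base URL already carries a query string.
    separator = "&" if urlsplit(base_url).query else "?"
    return base_url + separator + query


# Hypothetical URL and parameters, for illustration only.
geturl = build_get_url("http://example.com/search", {"q": "python 爬虫", "page": 1})
print(geturl)
# Decoding the query again shows that the encoding round-trips.
print(parse_qs(urlsplit(geturl).query))
```

urlencode percent-encodes non-ASCII text as UTF-8 and turns spaces into "+", so the resulting URL is safe to pass to urlopen.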

Setting a proxy

    import urllib2

    enable_proxy = True
    proxy_handler = urllib2.ProxyHandler({"http": "http://some-proxy.com:8080"})
    # An empty dict means "use no proxy at all"
    null_proxy_handler = urllib2.ProxyHandler({})
    if enable_proxy:
        opener = urllib2.build_opener(proxy_handler)
    else:
        opener = urllib2.build_opener(null_proxy_handler)
    # Make this opener the default for later urllib2.urlopen calls
    urllib2.install_opener(opener)
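A Python 3 version of the same switch might look like the sketch below. The helper name make_opener is hypothetical; ProxyHandler and build_opener are the urllib.request counterparts of the urllib2 calls. The opener is returned instead of being installed globally, so only code that uses it explicitly goes through the proxy.

```python
from urllib.request import ProxyHandler, build_opener


def make_opener(proxy_url=None):
    """Return an opener that uses proxy_url for HTTP, or no proxy at all."""
    # An empty dict disables proxies, including any from environment variables.
    proxies = {"http": proxy_url} if proxy_url else {}
    return build_opener(ProxyHandler(proxies))


opener = make_opener("http://some-proxy.com:8080")
# Inspect the opener to confirm which proxy mapping it carries.
handlers = [h for h in opener.handlers if isinstance(h, ProxyHandler)]
print(handlers[0].proxies)
```

Calling opener.open(url) then sends the request through the configured proxy; urllib.request.install_opener(opener) would make it the default for urlopen, as in the listing above.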

Setting a timeout

    import urllib2

    data = None  # no request body; the third argument is the timeout in seconds
    response = urllib2.urlopen("http://www.baidu.com", data, 10)
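In Python 3 the timeout is usually passed by keyword, which avoids the placeholder data argument. The helper below is a hedged sketch, not part of the original post: the name fetch and its choice to return None on failure are assumptions made for illustration.

```python
import socket
from urllib.error import URLError
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails or times out."""
    try:
        # timeout is in seconds and applies to blocking socket operations.
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except (URLError, socket.timeout) as exc:
        print("request failed:", exc)
        return None
```

For example, fetch("http://www.baidu.com", timeout=10) gives up if the server does not respond within ten seconds instead of blocking indefinitely.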

Exception handling

    import urllib2

    req = urllib2.Request("http://blog.csdn.net/cqcre")
    try:
        urllib2.urlopen(req)
    except urllib2.URLError as e:
        # HTTPError (a subclass of URLError) carries an HTTP status code
        if hasattr(e, "code"):
            print e.code
        # reason describes why the request failed
        if hasattr(e, "reason"):
            print e.reason
    else:
        print "OK"
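The hasattr checks work because HTTPError is a subclass of URLError that adds a code attribute. The sketch below makes that relationship explicit in Python 3, where both classes live in urllib.error. The function name describe_error is hypothetical, and the two errors are constructed by hand so the example runs without a network.

```python
from urllib.error import HTTPError, URLError


def describe_error(exc):
    """Summarize a urllib error, distinguishing HTTP errors from other failures."""
    # HTTPError is a subclass of URLError, so test for it first.
    if isinstance(exc, HTTPError):
        return "HTTP error %d: %s" % (exc.code, exc.reason)
    if isinstance(exc, URLError):
        return "URL error: %s" % (exc.reason,)
    return "unexpected error: %r" % (exc,)


# Errors constructed by hand for illustration; normally urlopen raises them.
print(describe_error(HTTPError("http://example.com/missing", 404, "Not Found", None, None)))
print(describe_error(URLError("connection refused")))
```

In a real crawler the same function would be called from an except URLError as e: block around urlopen.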

Using cookies

    import urllib
    import urllib2
    import cookielib

    filename = "cookie.txt"
    # Create a MozillaCookieJar instance to hold the cookies; it is written to a file later
    cookie = cookielib.MozillaCookieJar(filename)
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
    postdata = urllib.urlencode({
        "stuid": "201200131012",
        "pwd": "23342321"
    })
    # Login URL of the academic affairs system
    loginUrl = "http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bks_login2.login"
    # Simulate the login; the cookies end up in the cookie jar
    result = opener.open(loginUrl, postdata)
    # Save the cookies to cookie.txt
    cookie.save(ignore_discard=True, ignore_expires=True)
    # Use the cookies to request another URL, here the grade query page
    gradeUrl = "http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bkscjcx.curscopre"
    # Request the grade query page
    result = opener.open(gradeUrl)
    print result.read()
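The save step is what lets a later run reuse the login session. The sketch below shows that save and reload round trip in Python 3, where cookielib is named http.cookiejar. It runs offline, so the cookie is created by hand: the helper make_session_cookie, the cookie name and value, the example.com domain, and the temporary file location are all hypothetical.

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar


def make_session_cookie(name, value, domain):
    """Create a session cookie by hand (normally the server sets it)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


path = os.path.join(tempfile.mkdtemp(), "cookie.txt")

# Save: ignore_discard=True keeps session cookies, which are otherwise skipped.
jar = MozillaCookieJar(path)
jar.set_cookie(make_session_cookie("session", "abc123", "example.com"))
jar.save(ignore_discard=True, ignore_expires=True)

# Load the file into a fresh jar, as a later run of the crawler would.
restored = MozillaCookieJar(path)
restored.load(ignore_discard=True, ignore_expires=True)
print([(c.name, c.value) for c in restored])
```

Wrapping the restored jar in urllib.request.HTTPCookieProcessor and passing it to build_opener gives an opener that sends the saved cookies with each request.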
