Simulating a Douban login with a crawler
The script below (Python 2) walks through the whole flow: it POSTs the login form once so that Douban answers with a captcha page, scrapes the captcha image URL and the hidden captcha-id from that response, asks the user to type the captcha in, POSTs the form a second time, and finally prints the titles of the movies marked as watched.
# -*- encoding:utf-8 -*-
import requests
from bs4 import BeautifulSoup
import urllib
import re

loginUrl = 'http://accounts.douban.com/login'
formData = {
    "redir": "http://movie.douban.com/mine?status=collect",
    "form_email": "ewew2150@126.com",
    "form_password": "2150306",
    "login": u'登录'
}
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36'}

# First POST: Douban answers with a page that contains the captcha.
r = requests.post(loginUrl, data=formData, headers=headers)
page = r.text
#print r.url

# Use BeautifulSoup to get the captcha image URL.
soup = BeautifulSoup(page, "html.parser")
captchaAddr = soup.find('img', id='captcha_image')['src']
print captchaAddr

# Use a regular expression to get the captcha ID.
reCaptchaID = r'<input type="hidden" name="captcha-id" value="(.*?)"/'
captchaID = re.findall(reCaptchaID, page)
print captchaID

# Save the captcha image locally, then ask the user to type it in.
urllib.urlretrieve(captchaAddr, r"D:\captcha.jpg")
captcha = raw_input('please input the captcha:')
formData['captcha-solution'] = captcha
formData['captcha-id'] = captchaID[0]   # re.findall returns a list; take the first match

# Second POST, this time with the captcha filled in.
r = requests.post(loginUrl, data=formData, headers=headers)
page = r.text
if r.url == 'http://movie.douban.com/mine?status=collect':
    print 'Login successfully!!!'
    print '我看过的电影', '-'*60
    # Scrape the movies marked as watched.
    soup = BeautifulSoup(page, "html.parser")
    result = soup.findAll('li', attrs={"class": "title"})
    #print result
    for item in result:
        print item.find('a').get_text()
else:
    print "failed!"
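One caveat: the script assumes the first POST always comes back with a captcha. If the captcha element happens to be missing from the response, soup.find('img', id='captcha_image') returns None and the ['src'] lookup raises a TypeError. A minimal guard, sketched against the same variable names as above:

# Sketch: only handle the captcha when Douban actually shows one.
captchaImg = soup.find('img', id='captcha_image')
if captchaImg is not None:
    # Captcha present: download it and ask the user to type it in.
    urllib.urlretrieve(captchaImg['src'], r"D:\captcha.jpg")
    formData['captcha-solution'] = raw_input('please input the captcha:')
    formData['captcha-id'] = re.findall(reCaptchaID, page)[0]
# With no captcha on the page, the second POST can be sent with the form data unchanged.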
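A tidier arrangement of the same flow is sketched below: it keeps a single requests.Session, so any cookies Douban sets on the first response are sent back automatically with the second POST, and it reads the captcha-id with BeautifulSoup instead of a regular expression. This is only a sketch of one possible cleanup, not a drop-in replacement tested against the current Douban site; the email and password are placeholders.

# -*- coding: utf-8 -*-
# Sketch: same login flow, but with a shared requests.Session so that
# cookies returned by Douban are resent automatically on later requests.
# The email/password below are placeholders, not working credentials.
import urllib
import requests
from bs4 import BeautifulSoup

loginUrl = 'http://accounts.douban.com/login'
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
})

formData = {
    "redir": "http://movie.douban.com/mine?status=collect",
    "form_email": "your_email@example.com",   # placeholder
    "form_password": "your_password",         # placeholder
    "login": u"登录",
}

# First POST: trigger the captcha page.
page = session.post(loginUrl, data=formData).text
soup = BeautifulSoup(page, "html.parser")

captchaImg = soup.find('img', id='captcha_image')
if captchaImg is not None:
    # Captcha present: save the image and ask the user to type it in.
    urllib.urlretrieve(captchaImg['src'], r"D:\captcha.jpg")
    captchaTag = soup.find('input', attrs={"name": "captcha-id"})
    formData['captcha-solution'] = raw_input('please input the captcha:')
    formData['captcha-id'] = captchaTag['value']

# Second POST: the session carries over any cookies from the first response.
r = session.post(loginUrl, data=formData)
if r.url == 'http://movie.douban.com/mine?status=collect':
    print 'Login successfully!!!'
    soup = BeautifulSoup(r.text, "html.parser")
    for item in soup.findAll('li', attrs={"class": "title"}):
        print item.find('a').get_text()
else:
    print "failed!"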