首页 > 代码库 > python 爬图
python 爬图
利用bs库进行爬取,在下载html时,使用代理user_agent来下载,并且下载次数是2次,当第一次下载失败后,并且http状态码是500-600之间,然后会重新下载一次
soup = BeautifulSoup(html, "html.parser")
当前页面时html的
当当前页面时html5时
soup = BeautifulSoup(html, "html5lib")
#-*- coding:utf-8 -*- import re import urllib import urllib2 import lxml.html import itertools import os from bs4 import BeautifulSoup def download(url,user_agent=‘wswp‘,num_try = 2): print ‘Downloading:‘,url headers = {‘User_agent‘:user_agent} request = urllib2.Request(url,headers=headers) try: html = urllib2.urlopen(request).read() except urllib2.URLError as e: print ‘Download error‘,e.reason html = None if num_try > 0: if hasattr(e,‘code‘) and 500 <= e.code <600: return download(url,user_agent,num_try-1) return html def download_picture(url,path,name): if not os.path.isdir(path): os.mkdir(path) f = open(path+‘/‘ + name + ‘.jpg‘, ‘wb‘) f.write(download(url)) f.close() def bs_scraper(html): soup = BeautifulSoup(html, "html.parser") results = soup.find_all(name=‘img‘,attrs={‘class‘:‘BDE_Image‘}) tt = 0 for each in results: src = each.get(‘src‘) print src download_picture(src,‘/picture‘,str(tt)) tt = tt + 1 url = ‘https://tieba.baidu.com/p/4693368072‘ html = download(url) bs_scraper(html)
python 爬图
声明:以上内容来自用户投稿及互联网公开渠道收集整理发布,本网站不拥有所有权,未作人工编辑处理,也不承担相关法律责任,若内容有误或涉及侵权可进行投诉: 投诉/举报 工作人员会在5个工作日内联系你,一经查实,本站将立刻删除涉嫌侵权内容。