Scraping Baidu's asynchronous waterfall-flow animated images with Python 3 (Part 2): GET, JSON, and download code explained

2024-08-11 03:11:26 · 219 views

Building the GET function that fetches the URL

    import requests


    def gethtml(url, postdata):
        # Browser-like request headers
        header = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0',
            'Referer': 'http://image.baidu.com',
            'Host': 'image.baidu.com',
            'Accept': 'text/plain, */*; q=0.01',
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
            'Connection': 'keep-alive',
        }

        # Fetch the page; postdata is sent as the query string
        html_bytes = requests.get(url, headers=header, params=postdata)

        # Note: despite its name, this is a requests.Response object
        return html_bytes

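For the format of the headers, see the previous post:

Scraping Baidu's asynchronous waterfall-flow animated images with Python 3 (Part 1): finding the POST request and spoofing the headers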

Analyzing the URL:

http://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&is=&fp=result&queryWord=gif&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&word=gif&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&pn=30&rn=30&gsm=1e&1472364207674=

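It breaks down into the following, where postdata stands for the encoded query parameters:

    url = 'http://image.baidu.com/search/acjson?' + postdata + lasturl

lasturl is a timestamp with millisecond precision (seconds to three decimal places). To build it, I take the current time in whole seconds and simply append a random three-digit number for the last three digits: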

    import time
    import random

    # Three random digits stand in for the millisecond part
    timerandom = random.randint(100, 999)
    nowtime = int(time.time())
    lasturl = str(nowtime) + str(timerandom) + '='
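
The random digits above only imitate milliseconds. As a minimal sketch, assuming the server only expects a 13-digit, millisecond-style value (the post does not say whether the value is checked), the same string can be derived from the real clock. make_lasturl is a hypothetical helper name, not part of the original code.

```python
import time


def make_lasturl(now=None):
    """Return a millisecond timestamp followed by '=' (hypothetical helper)."""
    if now is None:
        now = time.time()
    # time.time() returns seconds as a float; scale it to whole milliseconds
    return str(int(now * 1000)) + '='


lasturl = make_lasturl()
```

The optional now argument exists only so the function can be called with a fixed time when checking it by hand.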

Finally, build postdata:

    # Build the query parameters
    postdata = {
        'tn': 'resultjson_com',
        'ipn': 'rj',
        'ct': 201326592,
        'is': '',
        'fp': 'result',
        'queryWord': keyword,
        'cl': 2,
        'lm': -1,
        'ie': 'utf-8',
        'oe': 'utf-8',
        'adpicid': '',
        'st': -1,
        'z': '',
        'ic': 0,
        'word': keyword,
        's': '',
        'se': '',
        'tab': '',
        'width': '',
        'height': '',
        'face': 0,
        'istype': 2,
        'qc': '',
        'nc': 1,
        'fr': '',
        'pn': pn,
        'rn': 30,
        'gsm': '1e'
    }

The page parameter pn and the search keyword keyword are set as follows:

    # Search keyword
    # keyword = input('Enter the keyword to search for: ')
    keyword = 'gif'

    # Page number
    # pn = int(input('How many pages to scrape: '))
    pn = 30
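
To show how the base URL, the parameters, and the timestamp fit together, here is a minimal sketch that assembles the full URL with the standard library's urllib.parse.urlencode. build_url and page_offsets are hypothetical helpers, and only a few of the parameters are included for brevity. Treating pn as a result offset that starts at 30 and advances in steps of rn is an assumption suggested by the sample URL (pn=30, rn=30); the post does not confirm it.

```python
from urllib.parse import urlencode

BASE_URL = 'http://image.baidu.com/search/acjson?'


def build_url(keyword, pn, lasturl, rn=30):
    """Join base URL, encoded parameters, and timestamp (hypothetical helper)."""
    # Shortened parameter set for illustration; the post's postdata has more keys
    params = {
        'tn': 'resultjson_com',
        'ipn': 'rj',
        'queryWord': keyword,
        'word': keyword,
        'pn': pn,
        'rn': rn,
    }
    return BASE_URL + urlencode(params) + '&' + lasturl


def page_offsets(pages, start=30, rn=30):
    """Yield pn values for `pages` pages, assuming pn is an offset (unconfirmed)."""
    for i in range(pages):
        yield start + i * rn


urls = [build_url('gif', pn, '1472364207674=') for pn in page_offsets(3)]
```

When the parameters are passed through params= as gethtml does, requests performs this encoding itself, so only one of the two approaches is needed.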

Save the response locally first; the images are downloaded only after everything has been saved:

    # Fetch the URL
    contents = gethtml(url, postdata)

    # Save the response as a .json file in the json folder
    with open('../json/' + str(pn) + '.json', 'wb') as file:
        file.write(contents.content)
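
The save step fails if the json folder does not exist yet. A minimal sketch of a more defensive version, assuming the same <pn>.json naming; save_json is a hypothetical helper and its folder parameter is introduced here only for illustration.

```python
import os


def save_json(content, pn, folder='../json'):
    """Write raw response bytes to <folder>/<pn>.json and return the path."""
    # Create the target folder if it is missing; no error if it already exists
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, str(pn) + '.json')
    # The with-statement closes the file even if write() raises
    with open(path, 'wb') as f:
        f.write(content)
    return path
```

With the names from the post, it would be called as save_json(contents.content, pn).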

List all the files in a folder:

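    import os


    # Find all files in a folder with a given suffix ('.xml' by default)
    def listfiles(rootdir, prefix='.xml'):
        file = []
        for parent, dirnames, filenames in os.walk(rootdir):
            # Only look at the top-level folder, not its subfolders
            if parent == rootdir:
                for filename in filenames:
                    if filename.endswith(prefix):
                        file.append(rootdir + '/' + filename)
                return file
        # Nothing was walked (for example, rootdir does not exist)
        return file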
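
Since only the top-level folder is inspected, the same listing can be sketched with the standard glob module instead of os.walk. This is an alternative, not the author's method; list_by_suffix is a hypothetical name. One difference to be aware of: the * pattern does not match file names that start with a dot.

```python
import glob
import os


def list_by_suffix(rootdir, suffix='.json'):
    """Return sorted paths of files directly inside rootdir that end with suffix."""
    pattern = os.path.join(rootdir, '*' + suffix)
    # glob with a single '*' does not descend into subfolders
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```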

Walk through the json folder and read the contents of each file:

    import json

    # Find the names of all files in the json folder
    files = listfiles('../json/', '.json')
    for filename in files:
        print(filename)
        # Read the JSON to get the image URLs
        with open(filename, 'rb') as doc:
            # Other decodings tried: ('UTF-8'), ('unicode_escape'), ('gbk', 'ignore')
            doccontent = doc.read().decode('utf-8', 'ignore')
        # Strip spaces and newlines before parsing
        product = doccontent.replace(' ', '').replace('\n', '')
        product = json.loads(product)
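
The replace calls above remove every space and newline, which also removes spaces inside titles. If the stripping is only there to get past raw control characters inside string values (an assumption; the post does not say why it is needed), a minimal sketch using the strict=False option of json.loads keeps the text intact. load_json_lenient and the sample bytes are made up for illustration.

```python
import json


def load_json_lenient(raw_bytes):
    """Decode bytes as UTF-8 and parse JSON, allowing control characters in strings."""
    text = raw_bytes.decode('utf-8', 'ignore')
    # strict=False lets the parser accept literal newlines or tabs inside string values
    return json.loads(text, strict=False)


# Made-up sample: the title contains a literal newline byte
sample = b'{"data": [{"fromPageTitleEnc": "funny\ncat gif"}]}'
product = load_json_lenient(sample)
```

This does not help if the response is malformed in other ways, such as invalid escape sequences.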

Look up data in the parsed dictionary:

    # Get the entries stored under 'data'
    onefile = product['data']

Put the image URLs and image names from the dictionary into arrays:

[Image from the original post]
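
The code for this step does not appear in the text. A minimal sketch of what it could look like, using the thumbURL and fromPageTitleEnc keys that the download loop reads; collect_images and the sample entries are hypothetical, and skipping entries that lack either key is an assumption about the data.

```python
def collect_images(onefile):
    """Return parallel lists of image URLs and titles from the 'data' entries."""
    urls = []
    names = []
    for item in onefile:
        # Skip entries lacking either key (assumed possible, e.g. an empty dict)
        if 'thumbURL' not in item or 'fromPageTitleEnc' not in item:
            continue
        urls.append(item['thumbURL'])
        names.append(item['fromPageTitleEnc'])
    return urls, names


# Made-up entries for illustration only
sample_data = [
    {'thumbURL': 'http://example.com/a.gif', 'fromPageTitleEnc': 'first'},
    {},
]
urls, names = collect_images(sample_data)
```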

Build an opener with request headers for fetching the images:

    import urllib.request


    def getimg(url):

        # Create an opener
        opener = urllib.request.build_opener()

        # Set the opener's headers
        opener.addheaders = [('User-Agent',
                              'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0'),
                             ('Referer',
                              'http://image.baidu.com'),
                             ('Host', 'image.baidu.com')]
        # Install the opener globally
        urllib.request.install_opener(opener)

        # Fetch the image
        html_img = urllib.request.urlopen(url)

        return html_img
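
install_opener changes the opener for every later urlopen call in the process. As an alternative sketch, not the author's approach, the headers can be attached to a single urllib.request.Request instead. build_img_request, getimg_local, and the timeout value are hypothetical, and the Host header is left out here so that urllib derives it from each image URL.

```python
import urllib.request

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0',
    'Referer': 'http://image.baidu.com',
}


def build_img_request(url):
    """Return a Request carrying the browser-like headers (hypothetical helper)."""
    return urllib.request.Request(url, headers=HEADERS)


def getimg_local(url, timeout=10):
    """Open the image URL without changing the global opener."""
    return urllib.request.urlopen(build_img_request(url), timeout=timeout)


# Building the request does not touch the network
req = build_img_request('http://example.com/a.gif')
```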

Finally, download the images into the local gif folder:

    for item in onefile:
        # Pick the pause length first so it is defined even if the download fails
        loadimg = random.randint(1, 3)
        try:
            pic = getimg(item['thumbURL'])
            # Target path and file name (validateTitle is a helper not shown in this post)
            filenamep = '../gif/' + validateTitle(item['fromPageTitleEnc'] + '.gif')
            # Save as gif
            with open(filenamep, 'wb') as filess:
                filess.write(pic.read())

            # Pause 1 to 3 seconds after every download
            print('Image ' + filenamep + ' downloaded')
            print('Pausing ' + str(loadimg) + ' seconds')
            time.sleep(loadimg)

        except Exception as err:
            print(err)
            print('Pausing ' + str(loadimg) + ' seconds')
            time.sleep(loadimg)
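
The loop calls validateTitle, which is not defined anywhere in this post. Judging by its name and use, it presumably turns a page title into a safe file name; that is an assumption. A minimal sketch of such a helper, under a different name to make clear it is not the author's function:

```python
import re


def validate_title_sketch(title):
    """Replace characters that Windows forbids in file names with underscores."""
    # Covers / \ : * ? " < > | ; other platforms allow some of these
    return re.sub(r'[/\\:*?"<>|]', '_', title)


safe_name = validate_title_sketch('a/b:c') + '.gif'
```

Applying the cleanup to the title before appending '.gif' keeps the extension out of the substitution.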

The result looks like this:

[Image from the original post: screenshot of the result]
