Python 3 web scraper: handling invalid links when downloading images with try/except

2024-08-20 13:20:10, 217 reads

The code is rough. This is mainly a memo of the spots where it is easy to make mistakes, kept for my own future reference.

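# Image download
import re
import urllib.request  # in Python 3 the module name differs from 2.x (urllib)

site = 'https://world.taobao.com/item/530762904536.htm?spm=a21bp.7806943.topsale_XX.4.jcjxZC'
page = urllib.request.urlopen(site)
html = page.read()
html = html.decode('utf-8')  # page.read() returns bytes; decode them as UTF-8 to get a str

reg = r'src="http://(gd.*?jpg)'
imgre = re.compile(reg)
imglist = re.findall(imgre, html)

trueurls = []
for i in imglist:
    # The capture group drops the scheme, so put it back in front.
    # (i.replace('gd', 'http://gd') would also rewrite any later 'gd' in the link.)
    trueurls.append('http://' + i)

# Overwrite one entry with an invalid link to exercise the except branch below
# (this needs at least three matches on the page).
trueurls[2] = 'http://wlgsad.com.jpg'
print(trueurls)

x = 200
for j in trueurls:
    try:
        urllib.request.urlretrieve(j, '%s.jpg' % x)
    except Exception:  # except Exception as e:
        pass  # print(e)
        # print('invalid link found')
    x = x + 1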

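The except clause is also a good place to print a short message, for example the exception itself or a note that an invalid link was found.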
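As a rough sketch of what such a message could look like, the snippet below catches the more specific exceptions from urllib.error instead of a bare Exception and turns each into a readable line. The function name download_image and the message wording are my own for illustration, not part of the script above.

```python
import urllib.error
import urllib.request


def download_image(url, filename):
    """Try to save url as filename; return None on success or an error message."""
    try:
        urllib.request.urlretrieve(url, filename)
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404.
        # HTTPError is a subclass of URLError, so it has to be caught first.
        return 'HTTP error %s for %s' % (e.code, url)
    except urllib.error.URLError as e:
        # The resource could not be reached (unknown host, refused connection, ...).
        return 'could not reach %s: %s' % (url, e.reason)
    except ValueError as e:
        # The string is not a URL that urllib understands.
        return 'malformed link %r: %s' % (url, e)
    return None


# Hypothetical usage: this string has no scheme, so urllib rejects it.
message = download_image('not-a-link', '200.jpg')
if message is not None:
    print(message)
```

Returning the message instead of printing it inside the function leaves it to the caller to decide whether to print, log, or ignore it.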

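When downloading images, an invalid link would otherwise stop the script with an unhandled exception. Wrapping each download in try/except skips the invalid link and carries on with the next image.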
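One possible way to package this skip-and-continue pattern is a small helper that also remembers which links failed, so they can be reviewed after the run. This is only a sketch: the names download_all, folder and start are hypothetical, and it assumes the failed links should still use up a file number.

```python
import os
import urllib.request


def download_all(urls, folder='.', start=200):
    """Download each link in turn; skip the ones that fail and collect them."""
    saved, failed = [], []
    for number, url in enumerate(urls, start):
        target = os.path.join(folder, '%s.jpg' % number)
        try:
            urllib.request.urlretrieve(url, target)
        except (OSError, ValueError) as e:
            # URLError and HTTPError are subclasses of OSError;
            # ValueError is raised for strings urllib cannot parse as a URL.
            failed.append((url, str(e)))
            continue  # skip this link and go on to the next image
        saved.append(target)
    return saved, failed
```

Calling `saved, failed = download_all(trueurls)` would then give one list of files written and one list of (link, reason) pairs for the links that were skipped.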
