Saving Images from Web Pages with a Python Crawler


2024-08-02 12:01:09, 223 views

The target is the car-themed section of a desktop wallpaper site:

Uncomment the two print statements below when debugging:

#print tag
#print attrs
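
To get a feel for what those two prints would show, here is a minimal sketch. It assumes Python 3's html.parser (the script itself targets Python 2) and a made-up HTML fragment; the class name TagLogger and the sample markup are illustrative and not taken from the site.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Hypothetical helper: records every start tag and its attributes."""
    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # tag is the lowercased tag name; attrs is a list of (name, value) pairs
        self.seen.append((tag, attrs))

# Made-up fragment, only for illustration
sample = ('<a class="pic" href="/bizhi/1.html">'
          '<img id="bigImg" src="http://example.com/123.jpg"></a>')
logger = TagLogger()
logger.feed(sample)
for tag, attrs in logger.seen:
    print(tag, attrs)
```

Each start tag arrives with its attributes in source order, which is why the script can compare attrs[0] or attrs[1] against a specific (name, value) tuple.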

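#!/usr/bin/env python
# Python 2 script: relies on urllib2 and the HTMLParser module.
import re
import urllib2
import HTMLParser
# Site root, prepended to the relative links found in the pages.
base = "http://desk.zol.com.cn"
# Directory where the images are saved; it must already exist.
path = '/home/mk/cars/'
# First page of the album being crawled; used to stop when "next" loops back to it.
star = ''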
def get_url(url):
	# Fetch a picture page and feed it to a picture-page parser (Index=False).
	parser = parse(False)
	request = urllib2.Request(url)
	response = urllib2.urlopen(request)
	resp = response.read()
	parser.feed(resp)
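def download(url):
	# Use the numeric file name in the URL (e.g. 12345.jpg) as the local name.
	pattern = r'[0-9]+\.jpg'
	res = re.search(pattern, url)
	if res is None:
		print 'skipping, no numeric .jpg name in:', url
		return
	print 'downloading:', res.group()
	content = urllib2.urlopen(url).read()
	filename = path + res.group()
	# Open in binary mode: image bytes must be written unchanged.
	f = open(filename, 'wb')
	f.write(content)
	f.close()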
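class parse(HTMLParser.HTMLParser):
	# Index=True parses the listing page; Index=False parses a picture page.
	def __init__(self, Index):
		self.Index = Index
		HTMLParser.HTMLParser.__init__(self)
	def handle_starttag(self, tag, attrs):
		# attrs is a list of (name, value) tuples in source order, so the
		# positional checks below depend on the site's attribute order.
		global star
		#print tag
		#print attrs
		if self.Index:
			# Listing page: an <a> whose first attribute is class="pic" links to
			# an album; its second attribute is taken to be the relative link.
			if tag == 'a' and len(attrs) == 4 and attrs[0] == ('class', 'pic'):
				new = base + attrs[1][1]
				print 'found a link:', new
				star = new
				get_url(new)
		else:
			# Picture page: the <img> with id="bigImg" is the full-size image.
			# The length check avoids an IndexError on tags without attributes.
			if tag == 'img' and len(attrs) >= 2 and attrs[0] == ('id', 'bigImg'):
				image_url = attrs[1][1]
				print 'found a picture:', image_url
				download(image_url)
			# The <a> whose second attribute is class="next" leads to the next picture.
			if tag == 'a' and len(attrs) == 4 and attrs[1] == ('class', 'next'):
				next_url = base + attrs[2][1]
				print 'found a link:', next_url
				# Stop once the next link wraps around to the album's first page.
				if star != next_url:
					get_url(next_url)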
# Entry point: fetch the car-themed listing page and parse it as an index.
Index_url = 'http://desk.zol.com.cn/qiche/'
con = urllib2.urlopen(Index_url).read()
Parser_index = parse(True)
Parser_index.feed(con)
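
The parser above matches attributes by position (attrs[0], attrs[1]), so it stops working if the site reorders them. Below is a rough sketch of the same two-stage idea with lookup by attribute name. It assumes Python 3 and made-up HTML fragments, and it performs no network access; the names PageParser and file_name_from_url and the sample markup are hypothetical, while the class and id values are the ones the script checks for.

```python
import re
from html.parser import HTMLParser

BASE = "http://desk.zol.com.cn"  # site root, as in the script above

class PageParser(HTMLParser):
    """Hypothetical parser: collects links and image URLs by attribute name."""
    def __init__(self):
        super().__init__()
        self.album_links = []   # absolute URLs from <a class="pic">
        self.image_url = None   # src of <img id="bigImg">
        self.next_link = None   # absolute URL from <a class="next">

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # lookup by name, independent of attribute order
        if tag == 'a' and a.get('class') == 'pic' and a.get('href'):
            self.album_links.append(BASE + a['href'])
        elif tag == 'a' and a.get('class') == 'next' and a.get('href'):
            self.next_link = BASE + a['href']
        elif tag == 'img' and a.get('id') == 'bigImg':
            self.image_url = a.get('src')

def file_name_from_url(url):
    """Return the first numeric .jpg name found in url, or None."""
    m = re.search(r'[0-9]+\.jpg', url)
    return m.group() if m else None

# Made-up fragments standing in for a listing page and a picture page
listing = '<a href="/bizhi/1_2_3.html" class="pic" target="_blank">car</a>'
picture = ('<img src="http://example.com/img/4567.jpg" id="bigImg">'
           '<a class="next" href="/bizhi/1_2_4.html">next</a>')

index_parser = PageParser()
index_parser.feed(listing)
page_parser = PageParser()
page_parser.feed(picture)

print(index_parser.album_links)
print(page_parser.image_url, page_parser.next_link)
print(file_name_from_url(page_parser.image_url))
```

Going through dict(attrs) costs a little per tag but removes the dependence on attribute order and count. Whether the live pages still carry these exact class and id values is taken from the script, not verified here.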

All this does is grab the good-looking wallpapers from the desktop wallpaper site.
