Multithreaded Web Page Crawling in Python

2024-07-10 02:37:27

Crawling Web Pages with Python

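The Python crawler below is fairly basic. It only downloads the pages that the links on the first (seed) page point to, so it goes one level deep. If you supply enough seed URLs, you can crawl as many pages as you like, but the code itself does not follow links recursively. Here is the code: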

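# coding: utf-8
'''
    Unlimited web page crawler
    @author wangbingyu
    @date 2014-06-26
'''
import os
import re
import threading
import time
import urllib.request
from urllib.parse import urljoin

# Folder for saved pages (the Windows path from the original; adjust as needed).
SAVE_DIR = r'E:\upload\download'


# Download thread class
class download(threading.Thread):
    def __init__(self, url, threadName):
        threading.Thread.__init__(self, name=threadName)
        self.thread_stop = False
        self.url = url

    def run(self):
        # Repeat the crawl of the seed page until stop() is called.
        while not self.thread_stop:
            self.list = self.getUrl(self.url)
            self.downloading(self.list)

    def stop(self):
        self.thread_stop = True

    def downloading(self, links):
        # Visit every link (the original range(len(list) - 1) skipped the last one).
        for link in links:
            if self.thread_stop:
                break
            # Prefix the thread name so threads saving at the same moment
            # do not collide on the time.time()-based file name.
            path = os.path.join(SAVE_DIR, '%s_%s.html' % (self.name, time.time()))
            try:
                urllib.request.urlretrieve(link, path)
            except Exception as ex:
                # Report the failure and continue with the next link.
                print('download failed:', link, ex)

    def getUrl(self, url):
        result = []
        s = urllib.request.urlopen(url).read().decode('utf-8', 'ignore')
        ss = s.replace(' ', '')
        # Assumed pattern: the original regex is garbled; this captures the
        # double-quoted href value of each <a> tag.
        urls = re.findall(r'<a.*?href="(.*?)"', ss, re.I)
        # Assumed completion (the original listing is truncated here):
        # resolve relative links and keep only http(s) URLs.
        for u in urls:
            absolute = urljoin(url, u)
            if absolute.startswith(('http://', 'https://')):
                result.append(absolute)
        return result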
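
`getUrl` extracts links with a regular expression after stripping spaces. That approach misses single-quoted or unquoted `href` attributes. One alternative, which is not the author's method, is the standard library's `html.parser`. It can collect every `<a href>` and resolve relative links. The following is a minimal sketch; `LinkParser` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return unique absolute http(s) links in html, resolved against base_url."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        if absolute.startswith(("http://", "https://")) and absolute not in links:
            links.append(absolute)
    return links
```

For example, `extract_links("http://example.com/index.html", '<a href="/a.html">a</a>')` returns `['http://example.com/a.html']`.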
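
The listing above stops before the code that starts the threads. It also stops each thread through a plain boolean flag. The sketch below is a hedged illustration, not the author's missing code. It runs one thread per seed URL, which is how "more seed URLs" turns into more pages. It stops all threads through a shared `threading.Event`. `CrawlWorker`, `crawl`, and the injected `fetch`, `extract`, and `save` hooks are hypothetical names. They let the threading logic run without network access.

```python
import threading
import time


class CrawlWorker(threading.Thread):
    """One thread per seed URL, repeating a one-level crawl until stopped.

    fetch(url) -> html, extract(base_url, html) -> [urls] and
    save(url, html) are hypothetical injected hooks.
    """

    def __init__(self, seed, fetch, extract, save, stop_event, interval=1.0):
        super().__init__(name="crawler-" + seed, daemon=True)
        self.seed = seed
        self.fetch = fetch
        self.extract = extract
        self.save = save
        self.stop_event = stop_event
        self.interval = interval

    def run(self):
        while not self.stop_event.is_set():
            try:
                for link in self.extract(self.seed, self.fetch(self.seed)):
                    if self.stop_event.is_set():
                        return
                    self.save(link, self.fetch(link))
            except Exception as ex:
                # One bad page should not kill the worker thread.
                print(self.name, "error:", ex)
            # Pause between rounds; wakes immediately once stop is requested.
            self.stop_event.wait(self.interval)


def crawl(seeds, fetch, extract, save, seconds):
    """Run one worker per seed for `seconds`, then stop and join them all."""
    stop = threading.Event()
    workers = [CrawlWorker(s, fetch, extract, save, stop) for s in seeds]
    for w in workers:
        w.start()
    time.sleep(seconds)
    stop.set()
    for w in workers:
        w.join()
```

With real network access, `fetch` could be `lambda u: urllib.request.urlopen(u, timeout=10).read().decode('utf-8', 'ignore')`, and `save` could write each page to a file. A single `Event` lets one call stop every worker. The wait between rounds also keeps each thread from re-requesting the seed page in a tight loop.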
