Web Scraping

2024-07-15 15:09:43, 221 reads

### -*- coding: cp936 -*-
###<a href="http://home.51cto.com" target="_blank">家园</a>
##import urllib.request
##str0 = '<a href="http://home.51cto.com" target="_blank">家园</a>'
##href = str0.find('<a href')
##print(href)
##com = str0.find('.com"')
##print(com)
##url = str0[href+9:com+4]
##print(url)
##content = urllib.request.urlopen(url).read()
###print(content)
##filename = url[-9:]
##print(filename)
##open(filename, 'wb').write(content)
####_________________________________
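import os
import urllib.request

# Collect up to 50 article URLs from the blog's article-list page.
url = [''] * 50
con = urllib.request.urlopen('http://blog.sina.com.cn/s/articlelist_1191258123_0_1.html').read()
con = con.decode('utf-8', errors='ignore')  # decode so str.find works on text; undecodable bytes are dropped
i = 0
title = con.find('<a title=')
href = con.find('href=', title)
html = con.find('.html', href)

while title != -1 and href != -1 and html != -1 and i < 50:
    # Skip the 6 characters of 'href="' and keep the 5 characters of '.html'.
    url[i] = con[href + 6:html + 5]
    print(url[i])
    title = con.find('<a title=', html)
    href = con.find('href=', title)
    html = con.find('.html', href)
    i = i + 1
else:
    print('find end!')

# Download only the i URLs that were actually found, saving each into hanhan/.
os.makedirs('hanhan', exist_ok=True)
j = 0
while j < i:
    content = urllib.request.urlopen(url[j]).read()
    # The last 26 characters of the URL serve as the file name.
    with open('hanhan/' + url[j][-26:], 'wb') as f:
        f.write(content)
    j = j + 1
else:
    print("over")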

##
##--------------------------------------------
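
The listing above finds links by scanning the page text with `find` and slicing at fixed offsets, which breaks as soon as the attribute order or quoting in the markup changes. As an alternative sketch (not the author's method), the standard library's `html.parser.HTMLParser` can collect the same kind of links. The class name `ArticleLinkParser`, the helper `extract_article_links`, and the sample markup are all hypothetical, made up for illustration. The filter is also an assumption: it keeps `<a>` tags that have a `title` attribute and an `href` ending in `.html`, which is slightly looser than the original scan.

```python
from html.parser import HTMLParser


class ArticleLinkParser(HTMLParser):
    """Collect href values of <a> tags that carry a title attribute."""

    def __init__(self, limit=50):
        super().__init__()
        self.limit = limit   # stop collecting after this many links
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a" or len(self.links) >= self.limit:
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        # Assumed filter: anchor has a title attribute and links to a .html page.
        if "title" in attr_map and href and href.endswith(".html"):
            self.links.append(href)


def extract_article_links(page_text, limit=50):
    """Return up to `limit` article links found in the given HTML text."""
    parser = ArticleLinkParser(limit)
    parser.feed(page_text)
    return parser.links


# Hypothetical sample markup, not taken from the real page.
sample = (
    '<a title="" target="_blank" '
    'href="http://blog.sina.com.cn/s/blog_example01.html">Post</a>'
    '<a href="http://example.com/about.html">no title</a>'
    '<a title="x" href="http://example.com/page.php">not html</a>'
)
print(extract_article_links(sample))
```

Because the function takes text and does no network access, it can be tried on a saved copy of the page before any downloading is wired up.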

This article comes from the "sai" blog. Please keep this source link when reposting: http://qingsto.blog.51cto.com/3570923/1535126
