Scraping Hot Jokes and Hot Image Links from Qiushibaike (糗事百科)

2024-08-24 08:34:46, 218 views

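# -*- coding:utf-8 -*-
# Python 2 script: requires urllib2 (standard library) and BeautifulSoup 4.
import urllib2
from bs4 import BeautifulSoup


# Walk pages 1 through 9 of the "hot" listing.
page = 1
while page < 10:

    url = 'http://www.qiushibaike.com/hot/page/' + str(page)
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    try:
        request = urllib2.Request(url, headers=headers)
        response = urllib2.urlopen(request)

        qiubai_html = response.read()
        #print qiubai_html
        soup = BeautifulSoup(qiubai_html, "html.parser")
        #print soup.find("a",class_="contentHerf")
        #print soup.find("a",class_="contentHerf").span.text

        # Append this page's jokes and image links to imgsrc.txt.
        with open('imgsrc.txt', 'a') as out:
            # Each joke is the first <span> inside an <a class="contentHerf">.
            qiubailist = soup.find_all("a", class_="contentHerf")
            print 'this is page ', page
            for x in qiubailist:
                if x.span is None:
                    continue
                print x.span.text
                out.write(x.span.text.encode('utf-8') + '\r\n')
                print '\n'

            # Each hot image is the <img> inside a <div class="thumb">.
            imgSrclist = soup.find_all("div", class_="thumb")
            for x in imgSrclist:
                if x.img is not None and x.img.get('src'):
                    out.write(x.img['src'].encode('utf-8') + '\r\n')

        # Print the first image link on the page, if the page has one.
        first_thumb = soup.find("div", class_="thumb")
        if first_thumb is not None and first_thumb.img is not None:
            print first_thumb.img.get('src')
    except urllib2.URLError, e:
        # HTTPError carries a status code; a plain URLError only has a reason.
        if hasattr(e, "code"):
            print e.code
        if hasattr(e, "reason"):
            print e.reason
    # Advance even after a failed request so one bad page cannot loop forever.
    page = page + 1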
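
The script above targets Python 2, where urllib2 is available. For readers on Python 3, the following is a minimal sketch of the same two steps (building the request with a User-Agent header, then pulling out the joke text and image links), not a definitive port. It makes several assumptions: it swaps BeautifulSoup for the standard library's html.parser so that it runs without third-party packages; the names build_request, HotPageParser, parse_hot_page and SAMPLE_HTML are hypothetical; and the sample markup with its example.com URL is invented to mirror what the "contentHerf" and "thumb" selectors imply, since the real page layout may differ. It sends no network traffic.

```python
from html.parser import HTMLParser
from urllib.request import Request

BASE_URL = 'http://www.qiushibaike.com/hot/page/'
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def build_request(page):
    """Build (but do not send) the request for one listing page."""
    return Request(BASE_URL + str(page), headers={'User-Agent': USER_AGENT})


class HotPageParser(HTMLParser):
    """Collect the first <span> text in each <a class="contentHerf"> and
    the first <img src> in each <div class="thumb">."""

    def __init__(self):
        super().__init__()
        self.jokes = []
        self.images = []
        self._in_link = False   # inside a joke link, span not yet read
        self._in_thumb = False  # inside a thumb div, img not yet read
        self._text = None       # text pieces while inside the joke <span>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and 'contentHerf' in classes:
            self._in_link = True
        elif tag == 'div' and 'thumb' in classes:
            self._in_thumb = True
        elif tag == 'span' and self._in_link and self._text is None:
            self._text = []
        elif tag == 'img' and self._in_thumb and attrs.get('src'):
            self.images.append(attrs['src'])
            self._in_thumb = False  # keep only the first image per thumb

    def handle_endtag(self, tag):
        if tag == 'span' and self._text is not None:
            # Surrounding whitespace is stripped here; the original keeps it.
            self.jokes.append(''.join(self._text).strip())
            self._text = None
            self._in_link = False   # ignore later spans in the same link
        elif tag == 'a':
            self._in_link = False
        elif tag == 'div':
            self._in_thumb = False

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)


def parse_hot_page(html):
    """Return (jokes, image_links) extracted from one page of HTML."""
    parser = HotPageParser()
    parser.feed(html)
    parser.close()
    return parser.jokes, parser.images


# Invented sample markup (an assumption, not captured from the site).
SAMPLE_HTML = '''
<div class="article">
  <a href="/article/1" class="contentHerf"><div class="content"><span>First joke</span></div></a>
  <div class="thumb"><a href="/article/1"><img src="http://example.com/1.jpg" alt="x"></a></div>
</div>
<div class="article">
  <a href="/article/2" class="contentHerf"><div class="content"><span>Second joke</span></div></a>
</div>
'''

if __name__ == '__main__':
    jokes, images = parse_hot_page(SAMPLE_HTML)
    print(build_request(1).full_url)
    print(jokes)
    print(images)
```

To fetch a real page, the request from build_request could be passed to urllib.request.urlopen and the decoded response body handed to parse_hot_page, with urllib.error.URLError handled as in the script above.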
