
Python BeautifulSoup (bs4) crawler: scraping Qiushibaike (糗事百科)

Disclaimer: for learning the syntax only; do not use it for illegal purposes.

    # -*- coding:utf-8 -*-
    import urllib.request

    from bs4 import BeautifulSoup

    url = 'http://www.qiushibaike.com/hot/'
    # Send a browser-like User-Agent with the request
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    request = urllib.request.Request(url=url, headers=headers)
    response = urllib.request.urlopen(request)
    # The "html5lib" parser requires the html5lib package to be installed
    bsobj = BeautifulSoup(response.read(), "html5lib")
    #content = response.read().decode('utf-8')
    #print(bsobj)
    # Collect every <div class="content"> element
    nameList = bsobj.find_all("div", {"class": "content"})
    for name in nameList:
        print(name.get_text())
        # Wait for Enter before printing the next entry
        input_enter = str(input())
        if input_enter == '':
            continue
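The listing above fetches and parses in one go, so it can only be tried against the live site. Below is a minimal sketch of the same steps split into small functions, so the parsing part can be checked on a hand-written HTML string. The function names (`build_request`, `extract_contents`, `fetch_contents`), the `timeout` value, and the switch to the built-in `html.parser` are assumptions made for illustration, not part of the original script. The sketch also does not check whether the live page still uses `<div class="content">`.

```python
import urllib.request
from urllib.error import URLError

from bs4 import BeautifulSoup

# URL and User-Agent string are copied from the listing above.
URL = 'http://www.qiushibaike.com/hot/'
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def build_request(url=URL, user_agent=USER_AGENT):
    """Build a Request carrying a browser-like User-Agent header."""
    return urllib.request.Request(url=url, headers={'User-Agent': user_agent})


def extract_contents(html):
    """Return the stripped text of every <div class="content"> in html."""
    # Assumption: html.parser (bundled with Python) is used here instead of
    # html5lib, so the sketch has no dependency beyond bs4.
    soup = BeautifulSoup(html, 'html.parser')
    return [div.get_text(strip=True)
            for div in soup.find_all('div', class_='content')]


def fetch_contents(timeout=10):
    """Download the page and extract its entries; return [] on URLError."""
    try:
        with urllib.request.urlopen(build_request(), timeout=timeout) as response:
            return extract_contents(response.read())
    except URLError as exc:
        # HTTPError is a subclass of URLError, so HTTP error codes land here too.
        print('request failed:', exc)
        return []


if __name__ == '__main__':
    # Offline check on a hand-written snippet; no network access needed.
    sample = '<div class="content"><span> first </span></div><div class="other">x</div>'
    print(extract_contents(sample))
```

Keeping `extract_contents` separate from the download means the selector can be adjusted and re-tested on saved HTML without requesting the page again.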


This article comes from the "净空蓝星" blog. Please do not repost!
