Scraping the latest movies from a movie website
# -*- coding: utf-8 -*-
"""
Created on Wed Oct 12 16:48:33 2016

@author: fuzzier
"""

import codecs
import os
import re

import requests
from bs4 import BeautifulSoup

URL = 'http://www.xxxxx.net'

# Detail-page links look like /<word>/<word>/<word>/<digits>/<digits>.html
FILM_LINK = re.compile(r'/\w+?/\w+?/\w+?/\d+?/\d+?\.html')


def download_page(url):
    """Fetch the page and return its raw bytes."""
    # The header name is "User-Agent" with a hyphen, not "User_Agent".
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1581.2 Safari/537.36'}
    html = requests.get(url, headers=headers).content
    return html


def parser_html(data):
    """Return the link text of each table row, or 'None' where it is missing."""
    soup = BeautifulSoup(data, 'html.parser')
    films = []
    trs = soup.find('div', class_='bd3rl').find('div', class_='co_content8').find_all('tr')
    for i in trs:
        link = i.find('a', href=FILM_LINK)
        # find() returns None for rows without a matching link,
        # so check before reading .string.
        title = link.string if link else None
        if title:
            films.append(title)
        else:
            films.append('None')
    return films


if __name__ == '__main__':
    html = download_page(URL)
    film_list = parser_html(html)
    # os.path.join builds the output path without hard-coding a separator.
    with codecs.open(os.path.join(os.getcwd(), 'dytt8_hot.txt'), 'w', encoding='utf8') as f:
        for i in film_list:
            f.write(i + '\r\n')
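The row filter in the script depends entirely on the href regular expression, so it may help to see which paths it accepts. The sketch below is an illustration only: the sample paths are hypothetical and were not taken from any real site, and the names `LOOSE`, `STRICT`, and `samples` are introduced here. It compares the pattern with an unescaped dot before `html` against a variant that escapes it. BeautifulSoup filters attributes against a compiled pattern using `search()`, which is why `search()` is used here as well.

```python
import re

# Pattern with an unescaped dot: "." matches any character before "html".
LOOSE = re.compile(r'/\w+?/\w+?/\w+?/\d+?/\d+?.html')
# Variant with the dot escaped, so only a literal ".html" suffix matches.
STRICT = re.compile(r'/\w+?/\w+?/\w+?/\d+?/\d+?\.html')

# Hypothetical hrefs, made up for illustration.
samples = [
    '/html/gndy/dyzz/20161012/52345.html',  # three word segments, two numeric ones
    '/html/gndy/index.html',                # too few segments
    '/html/gndy/dyzz/20161012/52345xhtml',  # no literal dot before "html"
]

loose_hits = [bool(LOOSE.search(s)) for s in samples]
strict_hits = [bool(STRICT.search(s)) for s in samples]

for path, loose, strict in zip(samples, loose_hits, strict_hits):
    print(path, 'loose:', loose, 'strict:', strict)
```

Only the third sample separates the two patterns: the unescaped form accepts it, the escaped form rejects it. On a real listing page the difference is unlikely to matter much, but the escaped form states the intent more precisely.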