2017.08.04 Python网络爬虫之Scrapy爬虫实战二天气预报

首页 > 代码库 > 2017.08.04 Python网络爬虫之Scrapy爬虫实战二天气预报

2017.08.04 Python网络爬虫之Scrapy爬虫实战二天气预报

2024-11-17 11:19:02 203人阅读

1.项目准备：网站地址：http://quanzhou.tianqi.com/

技术分享

2.创建编辑Scrapy爬虫：

scrapy startproject weather

scrapy genspider HQUSpider quanzhou.tianqi.com

技术分享

项目文件结构如图：

技术分享

3.修改Items.py：

技术分享

4.修改Spider文件HQUSpider.py：

（1）先使用命令：scrapy shell http://quanzhou.tianqi.com/ 测试和获取选择器：

技术分享

（2）试验选择器：打开chrome浏览器，查看网页源代码：

技术分享

（3）执行命令查看response结果：

技术分享

（4）编写HQUSpider.py文件：

# -*- coding: utf-8 -*-
import scrapy
from weather.items import WeatherItem

class HquspiderSpider(scrapy.Spider):
    name = ‘HQUSpider‘
    allowed_domains = [‘tianqi.com‘]
    citys=[‘quanzhou‘,‘datong‘]
    start_urls = []
    for city in citys:
        start_urls.append(‘http://‘+city+‘.tianqi.com/‘)
    def parse(self, response):
        subSelector=response.xpath(‘//div[@class="tqshow1"]‘)
        items=[]
        for sub in subSelector:
            item=WeatherItem()
            cityDates=‘‘
            for cityDate in sub.xpath(‘./h3//text()‘).extract():
                cityDates+=cityDate
            item[‘cityDate‘]=cityDates
            item[‘week‘]=sub.xpath(‘./p//text()‘).extract()[0]
            item[‘img‘]=sub.xpath(‘./ul/li[1]/img/@src‘).extract()[0]
            temps=‘‘
            for temp in sub.xpath(‘./ul/li[2]//text()‘).extract():
                temps+=temp
            item[‘temperature‘]=temps
            item[‘weather‘]=sub.xpath(‘./ul/li[3]//text()‘).extract()[0]
            item[‘wind‘]=sub.xpath(‘./ul/li[4]//text()‘).extract()[0]
            items.append(item)
        return items

(5)修改pipelines.py我，处理Spider的结果：

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don‘t forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import time
import os.path
import urllib2
import sys
reload(sys)
sys.setdefaultencoding(‘utf8‘)

class WeatherPipeline(object):
    def process_item(self, item, spider):
        today=time.strftime(‘%Y%m%d‘,time.localtime())
        fileName=today+‘.txt‘
        with open(fileName,‘a‘) as fp:
            fp.write(item[‘cityDate‘].encode(‘utf-8‘)+‘\t‘)
            fp.write(item[‘week‘].encode(‘utf-8‘)+‘\t‘)
            imgName=os.path.basename(item[‘img‘])
            fp.write(imgName+‘\t‘)
            if os.path.exists(imgName):
                pass
            else:
                with open(imgName,‘wb‘) as fp:
                    response=urllib2.urlopen(item[‘img‘])
                    fp.write(response.read())
            fp.write(item[‘temperature‘].encode(‘utf-8‘)+‘\t‘)
            fp.write(item[‘weather‘].encode(‘utf-8‘)+‘\t‘)
            fp.write(item[‘wind‘].encode(‘utf-8‘)+‘\n\n‘)
            time.sleep(1)
        return item

（6）修改settings.py文件，决定由哪个文件来处理获取的数据：

技术分享

（7）执行命令：scrapy crawl HQUSpider

到此为止，一个完整的Scrapy爬虫就完成了。

2017.08.04 Python网络爬虫之Scrapy爬虫实战二天气预报

声明：以上内容来自用户投稿及互联网公开渠道收集整理发布，本网站不拥有所有权，未作人工编辑处理，也不承担相关法律责任，若内容有误或涉及侵权可进行投诉：投诉/举报工作人员会在5个工作日内联系你，一经查实，本站将立刻删除涉嫌侵权内容。

联系
我们

首页 > 代码库 > 2017.08.04 Python网络爬虫之Scrapy爬虫实战二 天气预报

2017.08.04 Python网络爬虫之Scrapy爬虫实战二 天气预报

看完仍有疑问？有类似问题直接问程序猿

首页 > 代码库 > 2017.08.04 Python网络爬虫之Scrapy爬虫实战二天气预报

2017.08.04 Python网络爬虫之Scrapy爬虫实战二天气预报