Python Week 2 (Day 10): My Python Growth Notes. Python Data Mining in One Month! (18) - mongodb
1. First, import the Selector class
from scrapy.selector import Selector
2. Using selectors
Example: response.selector.xpath('//span/text()').extract()
(1) Select the text content of the title tag
response.selector.xpath('//title/text()')
Two simpler shortcuts are available:
response.xpath('//title/text()')
response.css('title::text')
Examples:
response.css('img').xpath('@src').extract()
response.xpath('//div[@id="images"]/a/text()').extract_first()
response.xpath('//div[@id="not-exists"]/text()').extract_first(default='not-found')
(2) Matching with regular expressions
response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
response.xpath('//a[contains(@href, "image")]/text()').re_first(r'Name:\s*(.*)')
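To see what the pattern itself captures, here is a rough sketch that uses only the standard re module on plain strings. It is a stand-in and not Scrapy's implementation: the helper names re_all and re_first_match and the sample strings are hypothetical, and it only covers the single-group case used above.

```python
import re

# Strings as they might come back from .extract(); made up for this sketch.
texts = ['Name: My image 1', 'Name: My image 2', 'no name here']

pattern = re.compile(r'Name:\s*(.*)')


def re_all(strings, regex):
    """Collect the captured group from every string, similar in spirit to .re()."""
    results = []
    for s in strings:
        # With one capture group, findall() returns a list of the captured strings.
        results.extend(regex.findall(s))
    return results


def re_first_match(strings, regex, default=None):
    """Return only the first captured match, similar in spirit to .re_first()."""
    matches = re_all(strings, regex)
    return matches[0] if matches else default


all_names = re_all(texts, pattern)
first_name = re_first_match(texts, pattern)
no_match = re_first_match(['nothing'], pattern)
```

Strings that do not match the pattern simply contribute nothing to the result list.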
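(3) Working with relative XPaths
divs = response.xpath('//div')
# './/p' selects every <p> element inside the <div> elements, at any depth
for p in divs.xpath('.//p'):
    print(p.extract())
# 'p' selects only the <p> elements that are direct children of the <div> elements
for p in divs.xpath('p'):
    print(p.extract())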
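The difference between the two relative paths can be checked on a small document. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for a Scrapy selector, because it accepts the same two path expressions; the markup and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed markup, made up for this sketch.
doc = ET.fromstring(
    '<body>'
    '<div><p>direct child</p><section><p>nested deeper</p></section></div>'
    '</body>'
)

div = doc.find('div')

# './/p' matches every <p> below the <div>, at any depth.
descendants = [p.text for p in div.findall('.//p')]

# 'p' matches only <p> elements that are direct children of the <div>.
children = [p.text for p in div.findall('p')]
```

Here descendants holds both paragraphs while children holds only the first, which mirrors the two loops above.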
Official example:
>>> links = response.xpath('//a[contains(@href, "image")]')
>>> links.extract()
['<a href="image1.html">Name: My image 1 <br><img src="image1_thumb.jpg"></a>',
 '<a href="image2.html">Name: My image 2 <br><img src="image2_thumb.jpg"></a>',
 '<a href="image3.html">Name: My image 3 <br><img src="image3_thumb.jpg"></a>',
 '<a href="image4.html">Name: My image 4 <br><img src="image4_thumb.jpg"></a>',
 '<a href="image5.html">Name: My image 5 <br><img src="image5_thumb.jpg"></a>']
>>> for index, link in enumerate(links):
...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
...     print('Link number %d points to url %s and image %s' % args)
Link number 0 points to url ['image1.html'] and image ['image1_thumb.jpg']
Link number 1 points to url ['image2.html'] and image ['image2_thumb.jpg']
Link number 2 points to url ['image3.html'] and image ['image3_thumb.jpg']
Link number 3 points to url ['image4.html'] and image ['image4_thumb.jpg']
Link number 4 points to url ['image5.html'] and image ['image5_thumb.jpg']