Commonly used Request parameters in Scrapy


2024-09-20 05:17:48, 214 views

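url: the URL to request and then process in the next step.
callback: the function that handles the Response returned for this request.
method: usually does not need to be specified; the default, GET, is fine.
headers: the HTTP headers sent with the request. Usually not needed. Anyone who has written a crawler by hand with urllib2 will recognize the typical contents: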
        Host: media.readthedocs.org
        User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
        Accept: text/css,*/*;q=0.1
        Accept-Language: zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3
        Accept-Encoding: gzip, deflate
        Referer: http://scrapy-chs.readthedocs.org/zh_CN/0.24/
        Cookie: _ga=GA1.2.1612165614.1415584110;
        Connection: keep-alive
        If-Modified-Since: Mon, 25 Aug 2014 21:59:35 GMT
        Cache-Control: max-age=0
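meta: commonly used; a dict for passing data between requests (the callback can read it as response.meta). Scrapy also reads certain special keys from it, such as dont_merge_cookies:
        from scrapy import Request

        request_with_cookies = Request(url="http://www.example.com",
                                       cookies={'currency': 'USD', 'country': 'UY'},
                                       meta={'dont_merge_cookies': True})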
encoding: the default, 'utf-8', is fine.
dont_filter: indicates that this request should not be filtered by the scheduler.
             This is used when you want to perform an identical request multiple times,
             to ignore the duplicates filter. Use it with care, or you will get into crawling loops.
             Defaults to False.
errback: the function called if an error occurs while processing the request.
