
Scrapy: The Scrapy Shell

Scrapy Shell

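The Scrapy shell is an interactive console in which we can try out and debug scraping code without starting a spider. It is also useful for testing XPath or CSS expressions: we can see how they behave and check exactly what data they extract from the pages we crawl.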

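If IPython is installed, the Scrapy shell uses it in place of the standard Python console. IPython is more capable than the alternatives, offering smart auto-completion, colorized output, and other features. (Installing IPython is recommended.)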

Starting the Scrapy Shell

From the project's root directory, run the following command to start the shell:

scrapy shell "http://www.itcast.cn/channel/teacher.shtml"


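Based on the downloaded page, the Scrapy shell automatically creates several convenient objects, such as a Response object and a Selector object (for HTML and XML content).

  • Once the shell has loaded, a local variable named response holds the response data. Entering response.body prints the response body, and entering response.headers shows the response headers.

  • Entering response.selector returns a Selector object initialized from the response. We can then query the response with response.selector.xpath() or response.selector.css().

  • Scrapy also provides shortcuts: response.xpath() and response.css() work the same way (as in the earlier examples).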

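Selectors

Scrapy Selectors have built-in support for XPath and CSS selector expressions.

A Selector has four basic methods, of which xpath() is the most commonly used:

  • xpath(): takes an XPath expression and returns a selector list of all nodes matching that expression
  • extract(): serializes the matched nodes to Unicode strings and returns them as a list
  • css(): takes a CSS expression and returns a selector list of all nodes matching that expression; the syntax is the same as CSS selection in BeautifulSoup4
  • re(): extracts data using the given regular expression and returns a list of Unicode strings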

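Examples of XPath expressions and their meanings:

/html/head/title: selects the <title> element inside the <head> of the HTML document
/html/head/title/text(): selects the text of the <title> element mentioned above
//td: selects all <td> elements
//div[@class="mine"]: selects all div elements that have the attribute class="mine"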

Trying Out Selectors

We use Tencent's recruitment site, http://hr.tencent.com/position.php?&start=0#a, as an example:

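# Start the shell
scrapy shell "http://hr.tencent.com/position.php?&start=0#a"

# Returns a list of XPath selector objects
response.xpath('//title')
[<Selector xpath='//title' data=u'<title>\u804c\u4f4d\u641c\u7d22 | \u793e\u4f1a\u62db\u8058 | Tencent \u817e\u8baf\u62db\u8058</title'>]

# The source text is truncated at this point. The surviving fragment, ..."even"]'), together
# with the td lookups below, suggests that site was assigned from a query for table rows,
# probably something like: site = response.xpath('//tr[@class="even"]')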
Job title:

print(site[0].xpath('./td[1]/a/text()').extract()[0])
TEG15-运营开发工程师(深圳)

Link to the job detail page:

print(site[0].xpath('./td[1]/a/@href').extract()[0])
position_detail.php?id=20744&keywords=&tid=0&lid=0

Job category:

print(site[0].xpath('./td[2]/text()').extract()[0])
技术类
When extracting data in the future, test the expressions in the Scrapy shell first, and move them into your code once they work.
The Scrapy shell can of course do more than this, but that is outside the focus of this course and is not covered in detail here.
Official documentation: http://scrapy-chs.readthedocs.io/zh_CN/latest/topics/shell.html