首页 > 代码库 > FIRST SCRAPY PRJ
FIRST SCRAPY PRJ
zpc@Lenovo-PC:/prj/pyscrapy/a$ scrapy startproject helloword
New Scrapy project ‘helloword‘ created in:
/cygdrive/e/01.prj/pyscrapy/a/helloword
You can start your first spider with:
cd helloword
scrapy genspider example example.com
zpc@Lenovo-PC:/prj/pyscrapy/a/helloword$ scrapy genspider baidu www.baidu.com
Created spider ‘baidu‘ using template ‘basic‘ in module:
helloword.spiders.baidu
问题:
zpc@Lenovo-PC:/prj/pyscrapy/a/tutorial$ scrapy crawl dmoz
/cygdrive/e/01.prj/pyscrapy/a/tutorial/tutorial/spiders/dmoz_spider.py:3: ScrapyDeprecationWarning: tutorial.spiders.dmoz_spider.DmozSpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)
class DmozSpider(BaseSpider):
2014-12-17 11:32:38+0000 [scrapy] INFO: Scrapy 0.24.4 started (bot: tutorial)
2014-12-17 11:32:38+0000 [scrapy] INFO: Optional features available: ssl, http11
2014-12-17 11:32:38+0000 [scrapy] INFO: Overridden settings: {‘NEWSPIDER_MODULE‘: ‘tutorial.spiders‘, ‘SPIDER_MODULES‘: [‘tutorial.spiders‘], ‘BOT_NAME‘: ‘tutorial‘}
2014-12-17 11:32:40+0000 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-12-17 11:32:41+0000 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-12-17 11:32:41+0000 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-12-17 11:32:41+0000 [scrapy] INFO: Enabled item pipelines:
2014-12-17 11:32:41+0000 [dmoz] INFO: Spider opened
2014-12-17 11:32:41+0000 [dmoz] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-12-17 11:32:41+0000 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2014-12-17 11:32:41+0000 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
iconv [选项...] [文件...]
有如下选项可用:
输入/输出格式规范:
-f, --from-code=名称 原始文本编码
-t, --to-code=名称 输出编码
信息:
-l, --list 列举所有已知的字符集
输出控制:
-c 从输出中忽略无效的字符
-o, --output=FILE 输出文件
-s, --silent 关闭警告
--verbose 打印进度信息
iconv -f utf-8 -t gb2312 /server_test/reports/software_.txt > /server_test/reports/software_asserts.txt
FIRST SCRAPY PRJ