
Nutch crawl efficiency: setting the crawl interval

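fetcher.max.crawl.delay defaults to 30 seconds; here it is lowered to 5 seconds. With this value, any site whose robots.txt Crawl-Delay is greater than 5 seconds is skipped instead of being waited on.
Edit nutch-default.xml (site-specific overrides are conventionally placed in conf/nutch-site.xml, which takes precedence over nutch-default.xml):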
<property>
  <name>fetcher.max.crawl.delay</name>
  <value>5</value>
  <description>
    If the Crawl-Delay in robots.txt is set to greater than this value (in
    seconds) then the fetcher will skip this page, generating an error report.
    If set to -1 the fetcher will never skip such pages and will wait the
    amount of time retrieved from robots.txt Crawl-Delay, however long that
    might be.
  </description>
</property>
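
The rule in the property description can be sketched as a small decision function. This is only an illustration of the documented behaviour, not Nutch's actual code: the function name `decide` and its return shape are hypothetical, and the boundary case (a Crawl-Delay exactly equal to the limit) is assumed to be allowed because the description says "greater than".

```python
from typing import Optional, Tuple


def decide(robots_delay: float, max_crawl_delay: float = 30) -> Tuple[str, Optional[float]]:
    """Hypothetical helper mirroring the fetcher.max.crawl.delay description.

    robots_delay    -- Crawl-Delay from robots.txt, in seconds
    max_crawl_delay -- value of fetcher.max.crawl.delay, in seconds (-1 = no limit)

    Returns ("skip", None) when the page would be skipped, otherwise
    ("wait", seconds) with the delay the fetcher would honour.
    """
    # A negative limit (-1) disables skipping: always wait as long as robots.txt asks.
    if max_crawl_delay >= 0 and robots_delay > max_crawl_delay:
        return ("skip", None)
    return ("wait", robots_delay)


if __name__ == "__main__":
    print(decide(10, 5))    # Crawl-Delay above the 5-second limit
    print(decide(10, 30))   # same site under the 30-second default
    print(decide(600, -1))  # limit disabled
```

Lowering the limit to 5 therefore trades coverage for speed: the fetcher no longer sits idle on slow hosts, but pages on hosts that request a longer delay are not fetched at all.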

 
