Storm [Practice Series: How to Write a Crawler - Wrapping the Protocol]
This chapter covers the wrapper for Protocol:
package com.digitalpebble.storm.crawler.fetcher;

import com.digitalpebble.storm.crawler.util.Configuration;

public interface Protocol {

    public ProtocolResponse getProtocolOutput(String url) throws Exception;

    public void configure(Configuration conf);
}
The wrapper for ProtocolFactory:
package com.digitalpebble.storm.crawler.fetcher;

import java.net.URL;
import java.util.WeakHashMap;

import com.digitalpebble.storm.crawler.fetcher.asynchttpclient.AHProtocol;
import com.digitalpebble.storm.crawler.util.Configuration;

/**
 * @author Yin Shuai
 */
public class ProtocolFactory {

    private final Configuration config;

    private final WeakHashMap<String, Protocol> cache = new WeakHashMap<String, Protocol>();

    public ProtocolFactory(Configuration conf) {
        config = conf;
    }

    /** Returns an instance of the protocol to use for a given URL **/
    public synchronized Protocol getProtocol(URL url) {
        // get the protocol
        String protocol = url.getProtocol();
        Protocol pp = cache.get(protocol);
        if (pp != null)
            return pp;

        // yuk! hardcoded for now
        pp = new AHProtocol();
        pp.configure(config);
        cache.put(protocol, pp);
        return pp;
    }
}
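The factory above hands out one Protocol instance per URL scheme, so every http URL shares the same object. The following is a minimal, self-contained sketch of that caching behaviour, not the project's code: `SchemeCacheSketch`, `SimpleProtocol`, `forUrl`, and the `created` counter are hypothetical stand-ins, and a plain `HashMap` is used in place of `WeakHashMap` so the result is deterministic.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

class SchemeCacheSketch {

    /** Hypothetical, trimmed-down stand-in for the article's Protocol interface. */
    interface SimpleProtocol {
        String name();
    }

    // Assumption: a strong-reference map, unlike the WeakHashMap in the article.
    private final Map<String, SimpleProtocol> cache = new HashMap<>();

    /** Counts how many handler instances were actually constructed. */
    int created = 0;

    /** Same lookup-then-create flow as ProtocolFactory.getProtocol(URL). */
    synchronized SimpleProtocol getProtocol(URL url) {
        String scheme = url.getProtocol(); // e.g. "http" or "https"
        SimpleProtocol p = cache.get(scheme);
        if (p != null) {
            return p;
        }
        created++;
        final String label = "handler-for-" + scheme;
        p = () -> label;
        cache.put(scheme, p);
        return p;
    }

    /** Convenience wrapper so callers can pass a plain string. */
    SimpleProtocol forUrl(String spec) {
        try {
            return getProtocol(URI.create(spec).toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        SchemeCacheSketch factory = new SchemeCacheSketch();
        SimpleProtocol a = factory.forUrl("http://example.com/a");
        SimpleProtocol b = factory.forUrl("http://example.org/b");
        SimpleProtocol c = factory.forUrl("https://example.com/a");
        System.out.println("http URLs share an instance: " + (a == b));
        System.out.println("https uses the same instance as http: " + (a == c));
        System.out.println("instances created: " + factory.created);
    }
}
```

One design point worth noting: a `WeakHashMap` only keeps an entry while its key is strongly referenced somewhere else, so with `String` keys taken from `url.getProtocol()` the cached instance may be dropped and rebuilt after garbage collection. Whether that matters depends on how costly `configure` is.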
The wrapper for ProtocolResponse:
package com.digitalpebble.storm.crawler.fetcher;

import java.util.HashMap;

public class ProtocolResponse {

    final byte[] content;
    final int statusCode;
    final HashMap<String, String[]> metadata;

    public ProtocolResponse(byte[] c, int s, HashMap<String, String[]> md) {
        content = c;
        statusCode = s;
        metadata = md;
    }

    public byte[] getContent() {
        return content;
    }

    public int getStatusCode() {
        return statusCode;
    }

    public HashMap<String, String[]> getMetadata() {
        return metadata;
    }
}
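As a rough illustration of how a caller might consume such a response, the sketch below checks the status code and decodes the body using a charset taken from the metadata. It rests on assumptions the article does not state: that the metadata map holds HTTP headers under a `Content-Type` key, and that UTF-8 is an acceptable fallback. `ResponseSketch`, `Response`, `charsetOf`, and `bodyAsText` are hypothetical names; `Response` only mirrors the three fields of ProtocolResponse.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Optional;

class ResponseSketch {

    /** Hypothetical stand-in mirroring the fields of the article's ProtocolResponse. */
    static final class Response {
        final byte[] content;
        final int statusCode;
        final HashMap<String, String[]> metadata;

        Response(byte[] c, int s, HashMap<String, String[]> md) {
            content = c;
            statusCode = s;
            metadata = md;
        }
    }

    /** Picks the charset from a "Content-Type" entry, falling back to UTF-8. */
    static Charset charsetOf(Response r) {
        String[] values = r.metadata.get("Content-Type"); // assumed key name
        if (values != null) {
            for (String v : values) {
                int i = v.toLowerCase(Locale.ROOT).indexOf("charset=");
                if (i < 0) {
                    continue;
                }
                String rest = v.substring(i + "charset=".length());
                int semi = rest.indexOf(';');
                String name = (semi >= 0 ? rest.substring(0, semi) : rest).trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name: try the next value.
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Returns the decoded body for 2xx responses, or empty otherwise. */
    static Optional<String> bodyAsText(Response r) {
        if (r.statusCode < 200 || r.statusCode >= 300) {
            return Optional.empty();
        }
        return Optional.of(new String(r.content, charsetOf(r)));
    }

    public static void main(String[] args) {
        HashMap<String, String[]> md = new HashMap<>();
        md.put("Content-Type", new String[] {"text/html; charset=ISO-8859-1"});
        byte[] body = {'c', 'a', 'f', (byte) 0xE9}; // "café" encoded as ISO-8859-1

        System.out.println(bodyAsText(new Response(body, 200, md)).orElse("<no body>"));
        System.out.println(bodyAsText(new Response(body, 404, md)).orElse("<no body>"));
    }
}
```

Returning an `Optional` keeps the status check in one place, so callers do not have to repeat it before touching the content.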