System Design 3: Web Crawlers and Short Links
Supplementary material:
Web-related:
https://www.zhihu.com/question/22689579
Crawlers:
https://www.zhihu.com/question/20899988
http://www-rohan.sdsu.edu/~gawron/python_for_ss/course_core/book_draft/web/web_intro.html
https://www.zhihu.com/question/27621722
http://blog.csdn.net/yiliumu/article/details/21335245
https://scrapy.org/
Socket:
http://www.cnblogs.com/thinksasa/archive/2013/02/26/2934206.html
http://siddontang.com/2012/09/02/step-by-step-network/
http://blog.csdn.net/rock_ray/article/details/22046449
http://coolshell.cn/articles/11564.html
Regular expressions:
https://regex101.com/
https://docs.python.org/2/howto/regex.html
https://docs.python.org/2/library/re.html
Condition variables:
http://www.wuzesheng.com/?p=1668
http://blog.csdn.net/jnu_simba/article/details/9129939
http://stackoverflow.com/questions/11000725/implementation-of-condition-variables
https://en.wikipedia.org/wiki/Monitor_(synchronization)
http://blog.csdn.net/anonymalias/article/details/9174481
Semaphores:
http://c.biancheng.net/cpp/html/2598.html
http://www.cnblogs.com/lcw/p/3236602.html
http://www.blogjava.net/fhtdy2004/archive/2009/07/05/285519.html
http://blog.csdn.net/nhn_devlab/article/details/6117239
Lock-free queues:
https://zh.wikipedia.org/wiki/%E7%94%9F%E4%BA%A7%E8%80%85%E6%B6%88%E8%B4%B9%E8%80%85%E9%97%AE%E9%A2%98
http://www.cnblogs.com/clover-toeic/p/4029269.html
http://ifeve.com/locks-are-bad/
http://coolshell.cn/articles/8239.html
http://coolshell.cn/articles/9169.html
https://www.infoq.com/articles/High-Performance-Java-Inter-Thread-Communications
http://blog.csdn.net/ns_code/article/details/17487337
TinyURL:
https://goo.gl
https://www.zhihu.com/topic/19564386/hot
https://www.hiredintech.com/system-design/the-system-design-process/
https://developers.google.com/url-shortener/
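Multithreading exists to improve performance, yet the best-performing designs are often single-threaded and lock-free.
Communication process:
Internet layering:
A classic task execution scheduler:
A high-priority task can still have its execution deferred via its Time value: it becomes eligible to run only once that time arrives.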
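The scheduler idea above can be sketched roughly as follows. This is a minimal illustration, not the scheduler from the original notes: the class name `Scheduler`, the methods `submit` and `next_task`, and the convention that a lower number means higher priority are all assumptions made for the example. Tasks wait in one heap ordered by their run time; once due, they move to a second heap ordered by priority.

```python
import heapq
import itertools


class Scheduler:
    """Hypothetical sketch: priority scheduling with an optional run-at time."""

    def __init__(self):
        self._waiting = []  # (run_at, seq, priority, name): not yet eligible
        self._ready = []    # (priority, seq, name): eligible, most urgent first
        self._seq = itertools.count()  # tie-breaker that preserves submit order

    def submit(self, name, priority, run_at=0.0):
        # Lower priority number = more urgent (an assumption of this sketch).
        heapq.heappush(self._waiting, (run_at, next(self._seq), priority, name))

    def next_task(self, now):
        # Promote every task whose run_at time has arrived.
        while self._waiting and self._waiting[0][0] <= now:
            _, seq, priority, name = heapq.heappop(self._waiting)
            heapq.heappush(self._ready, (priority, seq, name))
        if not self._ready:
            return None  # nothing is eligible yet
        return heapq.heappop(self._ready)[2]


if __name__ == "__main__":
    s = Scheduler()
    s.submit("low", priority=5)
    s.submit("high-delayed", priority=1, run_at=10.0)
    s.submit("mid", priority=3)
    print(s.next_task(now=0.0))   # the delayed task is skipped for now
    print(s.next_task(now=10.0))  # once due, the delayed task outranks the rest
```

Passing `now` explicitly keeps the sketch deterministic; a real worker loop would more likely read a clock such as `time.monotonic()` and sleep until the earliest `run_at`.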
Handling 1,000,000 requests per second:
- queue
- rate limit
- more servers + load balancing
- fully in-memory: Redis, Memcached
- asynchronous processing
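As one illustration of the "rate limit" item in the list above, here is a minimal token-bucket sketch. The notes do not say which rate-limiting algorithm is meant, so the token bucket is an assumption, as are the names `TokenBucket`, `rate`, `capacity`, and `allow`. The bucket refills at `rate` tokens per second up to `capacity`, and each request spends one token.

```python
class TokenBucket:
    """Hypothetical sketch of a token-bucket rate limiter."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.last = now           # time of the most recent refill

    def allow(self, now):
        # Refill according to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1:
            self.tokens -= 1      # spend one token for this request
            return True
        return False              # over the limit: reject or queue the request


if __name__ == "__main__":
    bucket = TokenBucket(rate=2, capacity=2)
    # Three requests at t=0: the burst of two passes, the third is refused.
    print([bucket.allow(0.0) for _ in range(3)])
```

A rejected request does not have to be dropped: combined with the "queue" item, it could be parked in a bounded queue and retried when tokens are available again.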