System Design 3: Web Crawlers and Short URLs

Supplementary materials:

Web-related:

https://www.zhihu.com/question/22689579

Crawlers:

https://www.zhihu.com/question/20899988

http://www-rohan.sdsu.edu/~gawron/python_for_ss/course_core/book_draft/web/web_intro.html

https://www.zhihu.com/question/27621722

http://blog.csdn.net/yiliumu/article/details/21335245

https://scrapy.org/

Socket:

http://www.cnblogs.com/thinksasa/archive/2013/02/26/2934206.html

http://siddontang.com/2012/09/02/step-by-step-network/

http://blog.csdn.net/rock_ray/article/details/22046449

http://coolshell.cn/articles/11564.html

Regular expressions:

https://regex101.com/

https://docs.python.org/2/howto/regex.html

https://docs.python.org/2/library/re.html

Condition variables:

http://www.wuzesheng.com/?p=1668

http://blog.csdn.net/jnu_simba/article/details/9129939

http://stackoverflow.com/questions/11000725/implementation-of-condition-variables

https://en.wikipedia.org/wiki/Monitor_(synchronization)

http://blog.csdn.net/anonymalias/article/details/9174481

Semaphores:

http://c.biancheng.net/cpp/html/2598.html

http://www.cnblogs.com/lcw/p/3236602.html

http://www.blogjava.net/fhtdy2004/archive/2009/07/05/285519.html

http://blog.csdn.net/nhn_devlab/article/details/6117239

Lock-free queues:

https://zh.wikipedia.org/wiki/%E7%94%9F%E4%BA%A7%E8%80%85%E6%B6%88%E8%B4%B9%E8%80%85%E9%97%AE%E9%A2%98

http://www.cnblogs.com/clover-toeic/p/4029269.html

http://ifeve.com/locks-are-bad/

http://coolshell.cn/articles/8239.html

http://coolshell.cn/articles/9169.html

https://www.infoq.com/articles/High-Performance-Java-Inter-Thread-Communications

http://blog.csdn.net/ns_code/article/details/17487337

TinyURL:

https://goo.gl

https://www.zhihu.com/topic/19564386/hot

https://www.hiredintech.com/system-design/the-system-design-process/

https://developers.google.com/url-shortener/

Multithreading exists to improve performance, yet the best-performing designs are often single-threaded and lock-free.

Communication process:

[Figure: communication process]

Internet layering:

[Figure: Internet layers]

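A classic task execution scheduler:

A high-priority task can have its execution deferred through Time (its scheduled run time).

[Figure: task execution scheduler]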
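The idea of a scheduler that combines priority with a deferred run time can be sketched in Python roughly as follows. This is only an illustration under assumptions: the class name `Scheduler`, the methods `submit` and `pop_next`, and the convention that a smaller priority number means higher priority are all hypothetical and do not come from the original figure.

```python
import heapq
import itertools


class Scheduler:
    """Hypothetical sketch: a task becomes runnable at run_at;
    among runnable tasks, the smallest priority number runs first."""

    def __init__(self):
        self._delayed = []  # min-heap of (run_at, seq, priority, name)
        self._ready = []    # min-heap of (priority, seq, name)
        self._seq = itertools.count()  # tie-breaker, keeps submission order

    def submit(self, name, priority, run_at=0.0):
        # Every task enters the delay heap; run_at=0.0 means "run as soon as possible".
        heapq.heappush(self._delayed, (run_at, next(self._seq), priority, name))

    def pop_next(self, now):
        # Move every task whose run time has arrived into the ready heap.
        while self._delayed and self._delayed[0][0] <= now:
            _run_at, seq, priority, name = heapq.heappop(self._delayed)
            heapq.heappush(self._ready, (priority, seq, name))
        if not self._ready:
            return None  # nothing is runnable yet
        return heapq.heappop(self._ready)[2]


if __name__ == "__main__":
    s = Scheduler()
    s.submit("low", priority=5)
    s.submit("high-delayed", priority=0, run_at=10.0)  # high priority, but deferred
    s.submit("mid", priority=3)
    print([s.pop_next(now=0.0) for _ in range(3)])
    print(s.pop_next(now=10.0))
```

In this sketch the deferred high-priority task is not visible to the priority heap until its run time arrives, so lower-priority tasks run first in the meantime.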

Handling 1,000,000 requests per second:

  • queue
  • rate limit
  • more servers + load balancing
  • fully in-memory storage: redis, memcache
  • asynchronous processing
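As one illustration of the rate-limit item, here is a minimal token-bucket sketch in Python. The notes do not say which rate-limiting algorithm is intended, so the token bucket is an assumption, and the names `TokenBucket`, `rate`, `capacity`, and `allow` are hypothetical.

```python
import time


class TokenBucket:
    """Hypothetical sketch: allow up to `rate` requests per second on average,
    with bursts of at most `capacity` requests."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum number of stored tokens
        self._clock = clock         # injectable clock, handy for testing
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        # Refill according to the time elapsed since the last call, capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True   # request admitted
        return False      # request rejected (or could be queued by the caller)


if __name__ == "__main__":
    bucket = TokenBucket(rate=2, capacity=2)
    print([bucket.allow() for _ in range(3)])
```

A rejected request need not be dropped; the caller could place it on a queue and process it asynchronously, which connects this item to the other entries in the list.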
