Scraping a Large Set of Malware Samples with a Python Crawler

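For work, I need to train a deep learning model that identifies malicious binary files, so I wrote a crawler to collect some samples.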

# -*- coding: utf-8 -*-
import requests
import re
import sys
import logging

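# Python 2 only: force the default string encoding to UTF-8.
# Python 3 has neither a reload() builtin nor sys.setdefaultencoding(); drop these two lines there.
reload(sys)
sys.setdefaultencoding('utf-8')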

# Log to a file, with timestamp, logger name, and level on every line.
logger = logging.getLogger("rrjia")
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
file_handler = logging.FileHandler("/home/rrjia/Python/test.log")
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
logger.setLevel("INFO")


if __name__ == '__main__':
    # url = 'http://malwaredb.malekal.com'
    # http://malwaredb.malekal.com/index.php?page=1
    # <td width="30px" align="center"><a href="files.php?file=25e8bf41343bda75a9170aad44094647"><img src="img/tetedemort.gif" width="26px,height=26px"></a></td>

    count = 1
    error_count = 0

    begin_url = 'http://malwaredb.malekal.com'
    begin_html = requests.get(begin_url)

    img_src = re.findall('
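The listing breaks off at the `re.findall` call, so the original pattern is not known. As a rough sketch of the link-extraction step, the code below pulls the file hashes out of an index page, assuming every sample link has the `files.php?file=<32 hex characters>` shape seen in the sample `<td>` comment. The names `FILE_LINK_RE`, `extract_file_ids`, `build_download_url`, and `build_page_url` are hypothetical helpers, not part of the original script, and the sketch targets Python 3 rather than the Python 2 used in the listing.

```python
import re
from urllib.parse import urljoin

# Site root taken from the listing's begin_url.
BASE_URL = "http://malwaredb.malekal.com/"

# Assumption: sample links look like href="files.php?file=<32 hex chars>",
# as in the sample <td> comment. The real page markup may differ.
FILE_LINK_RE = re.compile(r'href="[^"]*files\.php\?file=([0-9a-f]{32})"')


def extract_file_ids(html):
    """Return the unique file hashes linked from one index page, in page order."""
    seen = set()
    ids = []
    for file_id in FILE_LINK_RE.findall(html):
        if file_id not in seen:
            seen.add(file_id)
            ids.append(file_id)
    return ids


def build_download_url(file_id, base_url=BASE_URL):
    """Resolve the relative files.php link against the site root."""
    return urljoin(base_url, "files.php?file=" + file_id)


def build_page_url(page, base_url=BASE_URL):
    """Build the URL of one index page, following the index.php?page=N comment."""
    return urljoin(base_url, "index.php?page=%d" % page)


if __name__ == "__main__":
    sample = ('<td width="30px" align="center">'
              '<a href="files.php?file=25e8bf41343bda75a9170aad44094647">'
              '<img src="img/tetedemort.gif"></a></td>')
    for file_id in extract_file_ids(sample):
        print(build_download_url(file_id))
```

In a full crawler, the HTML passed to `extract_file_ids` would come from `requests.get(build_page_url(n)).text`, and each failed request could be logged and counted with the `logger` and `error_count` already set up in the listing.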

 
