Scraping IPs with requests + regular expressions
# Scrape IPs with requests + regular expressions.
# About findall: if the pattern contains a capturing group, only that group is
# returned; with several capturing groups, each match is returned as a tuple.
# The groups in the pattern below are non-capturing, so each whole match is returned.
import re

import requests


def get_ip(url):
    """Download one page and return every fragment matching 'IP ... port'."""
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36 LBBROWSER'}
    response = requests.get(url, headers=headers)
    pattern = re.compile(r'(?:(?:[0-1]{0,1}\d{0,1}\d|2[0-4]\d|25[0-5])\.){3}(?:[0-1]{0,1}\d{0,1}\d|2[0-4]\d|25[0-5]).*\s*.*(?:\d+)')
    result = re.findall(pattern, response.text)
    # print(result)
    return result


def make_iplist(iplist, result):
    """Turn each matched fragment into 'ip:port' and append it to iplist."""
    for ip in result:
        # Collapse the markup between the IP cell and the port cell into ':'
        ip = re.sub(r'</td>\s*.*<td>', ':', ip)
        iplist.append(ip)
    return iplist


def main(num):
    iplist = []
    # Pages are numbered from 1, so range(1, num + 1) fetches exactly num pages
    for i in range(1, num + 1):
        url = 'http://www.xicidaili.com/nt/'
        url = url + str(i)  # URL of page i
        # print(url)
        result = get_ip(url)
        iplist = make_iplist(iplist, result)

    for j in iplist:
        print(j)


if __name__ == '__main__':
    num = int(input('Enter the number of pages to scrape: '))
    main(num)
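The comment about findall and groups is easier to follow with a small offline experiment. The sketch below needs no network access. The first part shows what `re.findall` returns with zero, one, and two capturing groups. The second part applies the same pattern and the same `re.sub` cleanup to one table row. The names `TEXT`, `SAMPLE_ROW`, `OCTET`, `ROW_PATTERN`, and `to_ip_port` are made up for illustration, and the sample row is an assumption about the page markup, which may differ on the real site.

```python
import re

TEXT = "10.0.0.1 and 172.16.5.4"

# No capturing group: findall returns each whole match as a string.
whole = re.findall(r"(?:\d{1,3}\.){3}\d{1,3}", TEXT)

# One capturing group: findall returns only that group's text.
# A repeated group keeps just its last repetition, which is why the
# scraper's pattern uses (?:...) instead of (...).
last_repeat = re.findall(r"(\d{1,3}\.){3}\d{1,3}", TEXT)

# Two capturing groups: findall returns a list of tuples.
pairs = re.findall(r"(\d{1,3})\.(?:\d{1,3}\.){2}(\d{1,3})", TEXT)

# Hypothetical table row (an assumption, not copied from the real page).
SAMPLE_ROW = "<td>192.168.1.10</td>\n  <td>8080</td>"

# Same pattern as in the scraper, assembled from one octet sub-pattern.
OCTET = r"(?:[0-1]{0,1}\d{0,1}\d|2[0-4]\d|25[0-5])"
ROW_PATTERN = re.compile(r"(?:" + OCTET + r"\.){3}" + OCTET + r".*\s*.*(?:\d+)")


def to_ip_port(fragment):
    """Replace the markup between the IP cell and the port cell with ':'."""
    return re.sub(r"</td>\s*.*<td>", ":", fragment)


matches = ROW_PATTERN.findall(SAMPLE_ROW)
cleaned = [to_ip_port(m) for m in matches]

if __name__ == "__main__":
    print(whole)        # ['10.0.0.1', '172.16.5.4']
    print(last_repeat)  # ['0.', '5.']
    print(pairs)        # [('10', '1'), ('172', '4')]
    print(matches)      # ['192.168.1.10</td>\n  <td>8080']
    print(cleaned)      # ['192.168.1.10:8080']
```

Because the pattern's trailing `.*\s*.*` is greedy, each match runs from the IP to the last digit on the following line. This works only if the port cell directly follows the IP cell, as assumed in the sample row.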