Scraping All Python Job Listings from Lagou
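I've been job hunting recently, so I scraped every Python position listed on Lagou to get a sense of which direction to aim for. Lagou's data is fairly easy to scrape: the listings come back as JSON, which you can parse directly. Without further ado, here is the code: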
```python
import json
import urllib
import urllib2
from openpyxl import load_workbook

# Raw string so the backslashes in the Windows path are kept literally.
# The workbook must already exist: load_workbook does not create it.
filename = r'E:\excel\position_number_11_2.xlsx'
wb = load_workbook(filename=filename)
# Add a new sheet named "position" at the front of the workbook
sheet = wb.create_sheet(title='position', index=0)
count = 1  # next Excel row to write (openpyxl rows are 1-based)

header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0',
    'Referer': 'https://www.lagou.com/jobs/list_Python?px=default&city=%E5%85%A8%E5%9B%BD',
}
request_url = 'https://www.lagou.com/jobs/positionAjax.json?px=default&needAddtionalResult=false'

# JSON fields written to columns 1..10, in this order
fields = ['companySize', 'workYear', 'education', 'financeStage', 'city',
          'industryField', 'formatCreateTime', 'positionName',
          'companyFullName', 'salary']

for page in xrange(100):
    # Form data for the Ajax endpoint: pn = page number, kd = search keyword
    form_data = {
        'first': 'false',
        'pn': page,
        'kd': 'Python',
    }
    data = urllib.urlencode(form_data)

    # Passing data makes urllib2 send a POST request
    request = urllib2.Request(request_url, headers=header, data=data)
    try:
        html = urllib2.urlopen(request).read().decode('utf-8')
    except Exception:
        print 'No more job listings'
        break

    jsonobj = json.loads(html)
    results = jsonobj['content']['positionResult']['result']
    if not results:
        break  # an empty page means we are past the last listing
    for item in results:
        for column, field in enumerate(fields, start=1):
            sheet.cell(row=count, column=column).value = item.get(field)
        count += 1

wb.save(filename)
```
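The script above targets Python 2 (`urllib2`, the `print` statement). Below is a minimal Python 3 sketch of the same three steps: POST the form data, parse the JSON, and keep paging until a page comes back empty. It assumes the endpoint, headers and response shape (`content` → `positionResult` → `result`) that the script relies on, which Lagou may have changed since. The names `build_request`, `extract_rows` and `crawl`, and the `delay` pause between requests, are hypothetical additions of mine, not part of the original.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and headers copied from the script above (assumed still valid)
REQUEST_URL = ('https://www.lagou.com/jobs/positionAjax.json'
               '?px=default&needAddtionalResult=false')
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) '
                  'Gecko/20100101 Firefox/49.0',
    'Referer': 'https://www.lagou.com/jobs/list_Python?px=default'
               '&city=%E5%85%A8%E5%9B%BD',
}
FIELDS = ['companySize', 'workYear', 'education', 'financeStage', 'city',
          'industryField', 'formatCreateTime', 'positionName',
          'companyFullName', 'salary']


def build_request(page, keyword='Python'):
    """Build (but do not send) the POST request for one results page."""
    form = urlencode({'first': 'false', 'pn': page, 'kd': keyword})
    # Supplying bytes as data makes urllib issue a POST
    return Request(REQUEST_URL, data=form.encode('utf-8'), headers=HEADERS)


def extract_rows(body):
    """Turn one JSON response body into a list of 10-field tuples."""
    positions = json.loads(body)['content']['positionResult']['result']
    return [tuple(item.get(f) for f in FIELDS) for item in positions if item]


def crawl(fetch, max_pages=100, delay=1.0):
    """Collect rows page by page, stopping at the first empty page.

    `fetch` is any callable that takes a Request and returns the body text.
    """
    rows = []
    for page in range(max_pages):
        page_rows = extract_rows(fetch(build_request(page)))
        if not page_rows:
            break
        rows.extend(page_rows)
        time.sleep(delay)  # pause between requests (hypothetical politeness delay)
    return rows
```

For a live run you could pass `lambda r: urlopen(r).read().decode('utf-8')` as `fetch` (with `from urllib.request import urlopen`). Injecting `fetch` keeps the parsing and paging logic testable offline with canned JSON, and leaves room to add retries without touching `crawl`.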
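I wrote this code in a hurry, so it isn't very tidy. In a couple of days I'll post my scrapers for Weibo and Douban as well. I'd welcome any pointers from the experienced folks here ^_^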