
Scraping every Python job posting from Lagou

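  I've been job hunting recently, so I scraped every Python posting on Lagou (lagou.com) to get a sense of direction. Lagou's data is fairly easy to scrape: the site returns JSON, which can be parsed directly. Without further ado, here is the code: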

 

import json
import urllib
import urllib2
from openpyxl import load_workbook

# Python 2 script. The workbook must already exist; load_workbook opens it for editing.
filename = r'E:\excel\position_number_11_2.xlsx'
wb = load_workbook(filename=filename)
sheet = wb.create_sheet(title='position', index=0)
count = 1  # next spreadsheet row to write (openpyxl rows are 1-based)

for page in xrange(100):
    # Form fields sent in the POST body: page number and search keyword.
    form_data = {
        'first': 'false',
        'pn': page,
        'kd': 'Python'
    }

    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0',
        'Referer': 'https://www.lagou.com/jobs/list_Python?px=default&city=%E5%85%A8%E5%9B%BD',
    }
    request_url = 'https://www.lagou.com/jobs/positionAjax.json?px=default&needAddtionalResult=false'
    data = urllib.urlencode(form_data)

    request = urllib2.Request(request_url, headers=header, data=data)
    try:
        html = urllib2.urlopen(request).read().decode('utf-8')
    except Exception:
        print 'No job postings returned'
        break
    # print html
    jsonobj = json.loads(html)
    # print jsonobj
    results = jsonobj['content']['positionResult']['result']
    for item in results:
        if item:
            # One row per posting, one column per field.
            sheet.cell(row=count, column=1).value = item['companySize']
            sheet.cell(row=count, column=2).value = item['workYear']
            sheet.cell(row=count, column=3).value = item['education']
            sheet.cell(row=count, column=4).value = item['financeStage']
            sheet.cell(row=count, column=5).value = item['city']
            sheet.cell(row=count, column=6).value = item['industryField']
            sheet.cell(row=count, column=7).value = item['formatCreateTime']
            sheet.cell(row=count, column=8).value = item['positionName']
            sheet.cell(row=count, column=9).value = item['companyFullName']
            sheet.cell(row=count, column=10).value = item['salary']
            count += 1
    # Save after each page so rows collected so far survive a later failure.
    wb.save(filename)
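
For anyone on Python 3, here is a rough sketch of the same two steps: building the POST request and turning the JSON into rows. It uses only the standard library and never touches the network. The helper names build_request and extract_rows and the FIELDS list are my own (hypothetical), and the sample response is made up just to show the shape the script above expects. I have not checked it against the live site.

```python
import json
import urllib.parse
import urllib.request

# Field names copied from the script above; one spreadsheet column per field.
FIELDS = [
    "companySize", "workYear", "education", "financeStage", "city",
    "industryField", "formatCreateTime", "positionName",
    "companyFullName", "salary",
]

REQUEST_URL = ("https://www.lagou.com/jobs/positionAjax.json"
               "?px=default&needAddtionalResult=false")


def build_request(page, keyword="Python"):
    """Build (but do not send) the POST request for one result page."""
    form = {"first": "false", "pn": page, "kd": keyword}
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) "
                      "Gecko/20100101 Firefox/49.0",
        "Referer": "https://www.lagou.com/jobs/list_Python"
                   "?px=default&city=%E5%85%A8%E5%9B%BD",
    }
    # In Python 3 the request body must be bytes, not str.
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(REQUEST_URL, data=data, headers=headers)


def extract_rows(payload):
    """Turn one decoded JSON response into a list of rows (lists of values)."""
    results = payload["content"]["positionResult"]["result"]
    # Missing keys become None instead of raising KeyError.
    return [[item.get(name) for name in FIELDS] for item in results if item]


# Made-up sample shaped like the response the script expects (not real data).
sample = json.loads("""
{"content": {"positionResult": {"result": [
    {"city": "Beijing", "salary": "15k-25k", "positionName": "Python Developer"}
]}}}
""")
rows = extract_rows(sample)
request = build_request(1)
```

Each row from extract_rows can be written with sheet.append(row) in openpyxl instead of setting the ten cells one by one. Using item.get rather than item[...] is a deliberate choice so that a posting missing one field does not abort the whole run.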

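I wrote this in a hurry, so the code isn't very tidy. I'll post my Weibo and Douban scrapers in a couple of days; pointers from the experts here are very welcome ^_^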

 
