
Development of a Simple Tornado-Based Second-Hand Housing Statistics Project

Purpose

I've been house hunting recently and have sifted through listings from all over; the data on lianjia seems fairly reliable (at the very least the listing photos don't diverge much from reality). On a whim I decided to build a few charts to visualize the housing data, and to get some practice in along the way.

Requirements Analysis

1. Fetch the key housing data from the lianjia website, then display it in charts according to my own needs.

2. Fetch the data from lianjia once a day.

3. Focus on the Shanghai area (I live in Shanghai).

4. Charts to produce: total housing transactions, average second-hand price, listings on sale, transactions in the last 90 days, and yesterday's viewing count.

Analyzing and Fetching the Site Data

1 Data sources

The data is fetched mainly from two places:

  http://sh.lianjia.com/chengjiao/   // source of the transaction statistics

  The data on the page (the figures below are from before logging in; after logging in they seem to be a bit higher):

[screenshot: transaction totals on the chengjiao page]

   http://sh.lianjia.com/ershoufang/  // source of the second-hand listing data

  The data on that page:

[screenshot: listing figures on the ershoufang page]

2 Fetching method

For scraping, scrapy is the first thing that comes to mind, but the data here is neither large nor complex, so urllib.request is enough. Later on, to use Tornado's asynchrony, it gets replaced by httpclient.AsyncHTTPClient().fetch().

 

3 Fetching the data with urllib.request

First, crawling the page data rests on a basic function, obtain_page_data:

def obtain_page_data(target_url):
    with urllib.request.urlopen(target_url) as f:
        data = f.read().decode('utf8')
    return data
obtain_page_data() simply requests the given page and returns its contents.

With the raw page in hand, the next step is to extract the data we actually need, in two main parts:

1) Total housing transactions (http://sh.lianjia.com/chengjiao/)

Define a function get_total_dealed_house() that ultimately returns the total transaction count on the page. After calling obtain_page_data() to get the page's data, inspect where that number lives in the HTML.

[screenshot: locating the transaction count in the page HTML]

The number sits under a div, so after parsing the fetched html with BeautifulSoup, the text can be grabbed with the following line:

dealed_house = soup_obj.html.body.find('div', {'class': 'list-head'}).text

With the text in hand, a regular expression filters out the non-digit characters, which yields the number. Concretely:

def get_total_dealed_house(target_url):
    # Get the total number of housing transactions
    page_data = obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    dealed_house = soup_obj.html.body.find('div', {'class': 'list-head'}).text
    dealed_house_num = re.findall(r'\d+', dealed_house)[0]

    return int(dealed_house_num)
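As a quick illustration (the div text here is made up; the exact wording on lianjia may differ), re.findall pulls out every run of digits, and the first run is the count:

import re

# Hypothetical list-head text, standing in for the real page content
text = "成交房源共51691套"
print(re.findall(r'\d+', text))          # ['51691']
print(int(re.findall(r'\d+', text)[0]))  # 51691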

2) Other live data (http://sh.lianjia.com/ershoufang/)

Similarly, first work out where the desired data sits in the page, then fetch it and filter it. Concretely:

def get_online_data(target_url):
    # Get the city's average listing price, the number on sale,
    # the 90-day transaction count, and yesterday's viewing count
    page_data = obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    online_data_str = soup_obj.html.body.find('div', {'class': 'secondcon'}).text
    online_data = online_data_str.replace('\n', '')
    avg_price, on_sale, _, sold_in_90, yesterday_check_num = re.findall(r'\d+', online_data)

    return {'avg_price': avg_price, 'on_sale': on_sale,
            'sold_in_90': sold_in_90, 'yesterday_check_num': yesterday_check_num}
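Note that the tuple unpacking assumes the secondcon block contains exactly five numbers in a fixed order (the third is discarded). If lianjia ever changes the page layout, the ValueError raised by that line is the first thing to check.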

3) Aggregating the data / breaking it down by district

shanghai_data_process() consolidates the data gathered in 1) and 2). The lianjia pages for Shanghai can also be queried district by district, so that is handled here as well:

def shanghai_data_process():
    '''
    Collect the data for every district of Shanghai
    :return:
    '''
    chenjiao_page = "http://sh.lianjia.com/chengjiao/"
    ershoufang_page = "http://sh.lianjia.com/ershoufang/"
    sh_area_dict = {
        "all": "",
        "pudongxinqu": "pudongxinqu/",
        "minhang": "minhang/",
        "baoshan": "baoshan/",
        "xuhui": "xuhui/",
        "putuo": "putuo/",
        "yangpu": "yangpu/",
        "changning": "changning/",
        "songjiang": "songjiang/",
        "jiading": "jiading/",
        "huangpu": "huangpu/",
        "jingan": "jingan/",
        "zhabei": "zhabei/",
        "hongkou": "hongkou/",
        "qingpu": "qingpu/",
        "fengxian": "fengxian/",
        "jinshan": "jinshan/",
        "chongming": "chongming/",
        "shanghaizhoubian": "shanghaizhoubian/",
    }
    dealed_house_num = get_total_dealed_house(chenjiao_page)
    sh_online_data = {}
    for key, value in sh_area_dict.items():
        sh_online_data[key] = get_online_data(ershoufang_page + sh_area_dict[key])
    print("dealed_house_num %s" % dealed_house_num)
    for key, value in sh_online_data.items():
        print(key, value)

4) Full code and output

import urllib.request
import re
from bs4 import BeautifulSoup
import time

def obtain_page_data(target_url):
    with urllib.request.urlopen(target_url) as f:
        data = f.read().decode('utf8')
    return data

def get_total_dealed_house(target_url):
    # Get the total number of housing transactions
    page_data = obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    dealed_house = soup_obj.html.body.find('div', {'class': 'list-head'}).text
    dealed_house_num = re.findall(r'\d+', dealed_house)[0]

    return int(dealed_house_num)

def get_online_data(target_url):
    # Get the city's average listing price, the number on sale,
    # the 90-day transaction count, and yesterday's viewing count
    page_data = obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    online_data_str = soup_obj.html.body.find('div', {'class': 'secondcon'}).text
    online_data = online_data_str.replace('\n', '')
    avg_price, on_sale, _, sold_in_90, yesterday_check_num = re.findall(r'\d+', online_data)

    return {'avg_price': avg_price, 'on_sale': on_sale,
            'sold_in_90': sold_in_90, 'yesterday_check_num': yesterday_check_num}

def shanghai_data_process():
    '''
    Collect the data for every district of Shanghai
    :return:
    '''
    chenjiao_page = "http://sh.lianjia.com/chengjiao/"
    ershoufang_page = "http://sh.lianjia.com/ershoufang/"
    sh_area_dict = {
        "all": "",
        "pudongxinqu": "pudongxinqu/",
        "minhang": "minhang/",
        "baoshan": "baoshan/",
        "xuhui": "xuhui/",
        "putuo": "putuo/",
        "yangpu": "yangpu/",
        "changning": "changning/",
        "songjiang": "songjiang/",
        "jiading": "jiading/",
        "huangpu": "huangpu/",
        "jingan": "jingan/",
        "zhabei": "zhabei/",
        "hongkou": "hongkou/",
        "qingpu": "qingpu/",
        "fengxian": "fengxian/",
        "jinshan": "jinshan/",
        "chongming": "chongming/",
        "shanghaizhoubian": "shanghaizhoubian/",
    }
    dealed_house_num = get_total_dealed_house(chenjiao_page)
    sh_online_data = {}
    for key, value in sh_area_dict.items():
        sh_online_data[key] = get_online_data(ershoufang_page + sh_area_dict[key])
    print("dealed_house_num %s" % dealed_house_num)
    for key, value in sh_online_data.items():
        print(key, value)

def main():
    start_time = time.time()
    shanghai_data_process()
    print("time cost: %s" % (time.time() - start_time))


if __name__ == '__main__':
    main()
Initial source: collect_data.py

Result:

dealed_house_num 51691
zhabei {'yesterday_check_num': '1050', 'sold_in_90': '533', 'avg_price': '67179', 'on_sale': '1674'}
changning {'yesterday_check_num': '1861', 'sold_in_90': '768', 'avg_price': '77977', 'on_sale': '2473'}
baoshan {'yesterday_check_num': '2232', 'sold_in_90': '1410', 'avg_price': '48622', 'on_sale': '4655'}
putuo {'yesterday_check_num': '1695', 'sold_in_90': '910', 'avg_price': '64942', 'on_sale': '3051'}
qingpu {'yesterday_check_num': '463', 'sold_in_90': '253', 'avg_price': '40801', 'on_sale': '1382'}
jinshan {'yesterday_check_num': '0', 'sold_in_90': '8', 'avg_price': '20370', 'on_sale': '11'}
chongming {'yesterday_check_num': '0', 'sold_in_90': '3', 'avg_price': '26755', 'on_sale': '9'}
all {'yesterday_check_num': '28682', 'sold_in_90': '14550', 'avg_price': '59987', 'on_sale': '49396'}
jingan {'yesterday_check_num': '643', 'sold_in_90': '277', 'avg_price': '91689', 'on_sale': '896'}
xuhui {'yesterday_check_num': '2526', 'sold_in_90': '878', 'avg_price': '80623', 'on_sale': '3254'}
songjiang {'yesterday_check_num': '1571', 'sold_in_90': '930', 'avg_price': '44367', 'on_sale': '3294'}
yangpu {'yesterday_check_num': '2774', 'sold_in_90': '981', 'avg_price': '67976', 'on_sale': '2886'}
pudongxinqu {'yesterday_check_num': '7293', 'sold_in_90': '3417', 'avg_price': '62101', 'on_sale': '12767'}
shanghaizhoubian {'yesterday_check_num': '0', 'sold_in_90': '2', 'avg_price': '24909', 'on_sale': '15'}
minhang {'yesterday_check_num': '3271', 'sold_in_90': '1989', 'avg_price': '54968', 'on_sale': '5862'}
hongkou {'yesterday_check_num': '936', 'sold_in_90': '444', 'avg_price': '71654', 'on_sale': '1605'}
fengxian {'yesterday_check_num': '346', 'sold_in_90': '557', 'avg_price': '30423', 'on_sale': '1279'}
jiading {'yesterday_check_num': '875', 'sold_in_90': '767', 'avg_price': '41609', 'on_sale': '2846'}
huangpu {'yesterday_check_num': '1146', 'sold_in_90': '423', 'avg_price': '93880', 'on_sale': '1437'}
time cost: 12.94211196899414
Result

 

Porting to Tornado

1 Why Tornado

Tornado is a compact asynchronous Python framework. It's used here because sending the requests for page data is I/O-bound and can be done asynchronously for better efficiency, which will matter even more later if traffic grows.

2 Porting the data-collection code above to Tornado

The key points here:

1) Fetch the page data asynchronously

    Use httpclient.AsyncHTTPClient().fetch() to get the page data, combined with gen.coroutine + yield to make it asynchronous.

2) Return values with raise gen.Return(data). (Generators cannot return a value before Python 3.3, hence raise gen.Return; on Python 3.3+ a plain return data also works inside a coroutine.)

3) The version after this first round of changes, and its output:

import re
from bs4 import BeautifulSoup
import time
from tornado import httpclient, gen, ioloop

@gen.coroutine
def obtain_page_data(target_url):
    response = yield httpclient.AsyncHTTPClient().fetch(target_url)
    data = response.body.decode('utf8')
    print("start %s %s" % (target_url, time.time()))

    raise gen.Return(data)

@gen.coroutine
def get_total_dealed_house(target_url):
    # Get the total number of housing transactions
    page_data = yield obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    dealed_house = soup_obj.html.body.find('div', {'class': 'list-head'}).text
    dealed_house_num = re.findall(r'\d+', dealed_house)[0]

    raise gen.Return(int(dealed_house_num))

@gen.coroutine
def get_online_data(target_url):
    # Get the city's average listing price, the number on sale,
    # the 90-day transaction count, and yesterday's viewing count
    page_data = yield obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    online_data_str = soup_obj.html.body.find('div', {'class': 'secondcon'}).text
    online_data = online_data_str.replace('\n', '')
    avg_price, on_sale, _, sold_in_90, yesterday_check_num = re.findall(r'\d+', online_data)

    raise gen.Return({'avg_price': avg_price, 'on_sale': on_sale,
                      'sold_in_90': sold_in_90, 'yesterday_check_num': yesterday_check_num})

@gen.coroutine
def shanghai_data_process():
    '''
    Collect the data for every district of Shanghai
    :return:
    '''
    start_time = time.time()
    chenjiao_page = "http://sh.lianjia.com/chengjiao/"
    ershoufang_page = "http://sh.lianjia.com/ershoufang/"
    dealed_house_num = yield get_total_dealed_house(chenjiao_page)
    sh_area_dict = {
        "all": "",
        "pudongxinqu": "pudongxinqu/",
        "minhang": "minhang/",
        "baoshan": "baoshan/",
        "xuhui": "xuhui/",
        "putuo": "putuo/",
        "yangpu": "yangpu/",
        "changning": "changning/",
        "songjiang": "songjiang/",
        "jiading": "jiading/",
        "huangpu": "huangpu/",
        "jingan": "jingan/",
        "zhabei": "zhabei/",
        "hongkou": "hongkou/",
        "qingpu": "qingpu/",
        "fengxian": "fengxian/",
        "jinshan": "jinshan/",
        "chongming": "chongming/",
        "shanghaizhoubian": "shanghaizhoubian/",
    }
    sh_online_data = {}
    for key, value in sh_area_dict.items():
        sh_online_data[key] = yield get_online_data(ershoufang_page + sh_area_dict[key])
    print("dealed_house_num %s" % dealed_house_num)
    for key, value in sh_online_data.items():
        print(key, value)

    print("tornado time cost: %s" % (time.time() - start_time))


if __name__ == '__main__':
    io_loop = ioloop.IOLoop.current()
    io_loop.run_sync(shanghai_data_process)
Tornado, first version
start http://sh.lianjia.com/chengjiao/ 1480320585.879013
start http://sh.lianjia.com/ershoufang/jinshan/ 1480320586.575354
start http://sh.lianjia.com/ershoufang/chongming/ 1480320587.017322
start http://sh.lianjia.com/ershoufang/yangpu/ 1480320587.515317
start http://sh.lianjia.com/ershoufang/hongkou/ 1480320588.051793
start http://sh.lianjia.com/ershoufang/fengxian/ 1480320588.593865
start http://sh.lianjia.com/ershoufang/jiading/ 1480320589.134367
start http://sh.lianjia.com/ershoufang/qingpu/ 1480320589.6134
start http://sh.lianjia.com/ershoufang/pudongxinqu/ 1480320590.215136
start http://sh.lianjia.com/ershoufang/putuo/ 1480320590.696576
start http://sh.lianjia.com/ershoufang/zhabei/ 1480320591.34218
start http://sh.lianjia.com/ershoufang/changning/ 1480320591.935762
start http://sh.lianjia.com/ershoufang/xuhui/ 1480320592.5159
start http://sh.lianjia.com/ershoufang/minhang/ 1480320593.096085
start http://sh.lianjia.com/ershoufang/songjiang/ 1480320593.749226
start http://sh.lianjia.com/ershoufang/ 1480320594.306287
start http://sh.lianjia.com/ershoufang/shanghaizhoubian/ 1480320594.807418
start http://sh.lianjia.com/ershoufang/huangpu/ 1480320595.2744
start http://sh.lianjia.com/ershoufang/jingan/ 1480320595.850909
start http://sh.lianjia.com/ershoufang/baoshan/ 1480320596.368479
dealed_house_num 51691
jinshan {'yesterday_check_num': '0', 'on_sale': '11', 'avg_price': '20370', 'sold_in_90': '8'}
yangpu {'yesterday_check_num': '2774', 'on_sale': '2886', 'avg_price': '67976', 'sold_in_90': '981'}
hongkou {'yesterday_check_num': '936', 'on_sale': '1605', 'avg_price': '71654', 'sold_in_90': '444'}
fengxian {'yesterday_check_num': '346', 'on_sale': '1279', 'avg_price': '30423', 'sold_in_90': '557'}
chongming {'yesterday_check_num': '0', 'on_sale': '9', 'avg_price': '26755', 'sold_in_90': '3'}
pudongxinqu {'yesterday_check_num': '7293', 'on_sale': '12767', 'avg_price': '62101', 'sold_in_90': '3417'}
putuo {'yesterday_check_num': '1695', 'on_sale': '3051', 'avg_price': '64942', 'sold_in_90': '910'}
zhabei {'yesterday_check_num': '1050', 'on_sale': '1674', 'avg_price': '67179', 'sold_in_90': '533'}
changning {'yesterday_check_num': '1861', 'on_sale': '2473', 'avg_price': '77977', 'sold_in_90': '768'}
baoshan {'yesterday_check_num': '2232', 'on_sale': '4655', 'avg_price': '48622', 'sold_in_90': '1410'}
xuhui {'yesterday_check_num': '2526', 'on_sale': '3254', 'avg_price': '80623', 'sold_in_90': '878'}
minhang {'yesterday_check_num': '3271', 'on_sale': '5862', 'avg_price': '54968', 'sold_in_90': '1989'}
songjiang {'yesterday_check_num': '1571', 'on_sale': '3294', 'avg_price': '44367', 'sold_in_90': '930'}
all {'yesterday_check_num': '28682', 'on_sale': '49396', 'avg_price': '59987', 'sold_in_90': '14550'}
shanghaizhoubian {'yesterday_check_num': '0', 'on_sale': '15', 'avg_price': '24909', 'sold_in_90': '2'}
jingan {'yesterday_check_num': '643', 'on_sale': '896', 'avg_price': '91689', 'sold_in_90': '277'}
jiading {'yesterday_check_num': '875', 'on_sale': '2846', 'avg_price': '41609', 'sold_in_90': '767'}
qingpu {'yesterday_check_num': '463', 'on_sale': '1382', 'avg_price': '40801', 'sold_in_90': '253'}
huangpu {'yesterday_check_num': '1146', 'on_sale': '1437', 'avg_price': '93880', 'sold_in_90': '423'}
tornado time cost: 10.953541040420532
Output of the first version
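Looking at the timestamps, the fetches still start roughly half a second apart: the loop yields each get_online_data() call before issuing the next, so the requests run one after another and the total time barely improves on the urllib version. As a sketch (not part of the original code), a Tornado coroutine can also yield a dict of futures, which starts all the fetches first and then waits for them together:

@gen.coroutine
def shanghai_data_process_concurrent():
    # Kick off every district fetch before waiting on any of them; yielding a
    # dict whose values are futures resolves to a dict of their results.
    # Assumes sh_area_dict and get_online_data() from the code above.
    ershoufang_page = "http://sh.lianjia.com/ershoufang/"
    futures = {key: get_online_data(ershoufang_page + path)
               for key, path in sh_area_dict.items()}
    sh_online_data = yield futures
    raise gen.Return(sh_online_data)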

 

Storing the Data in a Database

I use MySQL here. In Tornado, pymysql can be used to connect to the database, and I use SQLAlchemy for the DML in the program.

See here for more on the SQLAlchemy side.

1) Table structure

Not many tables are needed, as follows:

sh_area   // Shanghai districts, one row per district

[screenshot: sh_area table structure]

sh_total_city_dealed  // total second-hand transactions for the whole city

[screenshot: sh_total_city_dealed table structure]

online_data  // per-district second-hand data

[screenshot: online_data table structure]

2) Initializing the tables with SQLAlchemy

settings holds the database connection configuration.

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
DB = {
    'connector': 'mysql+pymysql://root:xxxxx@127.0.0.1:3306/devdb1',
    'max_session': 5
}

engine = create_engine(DB['connector'], max_overflow=DB['max_session'], echo=False)
SessionCls = sessionmaker(bind=engine)
session = SessionCls()
settings.py
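One thing to note: database_init below (and the final server code) reads settings.sh_area_dict, so the district dict from collect_data.py is assumed to have moved into settings.py as well, something like:

# Assumed to live in settings.py alongside the DB config, since other modules
# import it as settings.sh_area_dict
sh_area_dict = {
    "all": "",
    "pudongxinqu": "pudongxinqu/",
    # ... the remaining districts, exactly as listed in collect_data.py ...
    "shanghaizhoubian": "shanghaizhoubian/",
}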

The initialization script:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, ForeignKey, DateTime

import os, sys
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(BASE_DIR)

from conf import settings

Base = declarative_base()

class SH_Area(Base):
    __tablename__ = 'sh_area'  # table name
    id = Column(Integer, primary_key=True)
    name = Column(String(64))

class Online_Data(Base):
    __tablename__ = 'online_data'  # table name
    id = Column(Integer, primary_key=True)
    sold_in_90 = Column(Integer)
    avg_price = Column(Integer)
    yesterday_check_num = Column(Integer)
    on_sale = Column(Integer)
    date = Column(DateTime)
    belong_area = Column(Integer, ForeignKey('sh_area.id'))

class SH_Total_city_dealed(Base):
    __tablename__ = 'sh_total_city_dealed'  # table name
    id = Column(Integer, primary_key=True)
    dealed_house_num = Column(Integer)
    date = Column(DateTime)
    memo = Column(String(64), nullable=True)

def db_init():
    Base.metadata.create_all(settings.engine)  # create the table structure
    for district in settings.sh_area_dict.keys():
        item_obj = SH_Area(name=district)
        settings.session.add(item_obj)
    settings.session.commit()


if __name__ == '__main__':
    db_init()
database_init

 

Drawing the Charts

1 Front-end rendering

For the chart drawing I use Highcharts. The charts look good, and you only need to supply the data.

I use the basic line chart. It needs a few js files included on the front end: jquery.min.js, highcharts.js, exporting.js. Then add a div and mark it with an id; the example uses id="container".

[screenshot: js includes and the container div]

The js from the official example:

$(function () {
    $('#container').highcharts({
        title: {
            text: 'Monthly Average Temperature',
            x: -20 //center
        },
        subtitle: {
            text: 'Source: WorldClimate.com',
            x: -20
        },
        xAxis: {
            categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
        },
        yAxis: {
            title: {
                text: 'Temperature (°C)'
            },
            plotLines: [{
                value: 0,
                width: 1,
                color: '#808080'
            }]
        },
        tooltip: {
            valueSuffix: '°C'
        },
        legend: {
            layout: 'vertical',
            align: 'right',
            verticalAlign: 'middle',
            borderWidth: 0
        },
        series: [{
            name: 'Tokyo',
            data: [7.0, 6.9, 9.5, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3, 18.3, 13.9, 9.6]
        }, {
            name: 'New York',
            data: [-0.2, 0.8, 5.7, 11.3, 17.0, 22.0, 24.8, 24.1, 20.1, 14.1, 8.6, 2.5]
        }, {
            name: 'Berlin',
            data: [-0.9, 0.6, 3.5, 8.4, 13.5, 17.0, 18.6, 17.9, 14.3, 9.0, 3.9, 1.0]
        }, {
            name: 'London',
            data: [3.9, 4.2, 5.7, 8.5, 11.9, 15.2, 17.0, 16.6, 14.2, 10.3, 6.6, 4.8]
        }]
    });
});
Official example js

My work builds on this base, modifying the js to draw the charts I need.

The concrete changes are in the code on github; the finished chart looks like this:

[screenshot: the customized line chart]

 

2 The back end: fetching data and handing it to the front end

The charts basically want one- or two-dimensional arrays, for example an x-axis time array [time1, time2, time3] and a y-axis data array [data1, data2, data3].

A few things to note:

1) The Tornado back end returns data by rendering it into the target page with the render() function.

2) In the js, use {{ data_rendered }} to pick up the rendered data.

3) The times passed to the front end are raw timestamps, which need formatting for display, like so:

function formatDate(timestamp_v) {
    var now = new Date(parseFloat(timestamp_v) * 1000);
    var year = now.getFullYear();
    var month = now.getMonth() + 1;
    var date = now.getDate();
    var hour = now.getHours();
    var minute = now.getMinutes();
    var second = now.getSeconds();
    return year + "-" + month + "-" + date + "   " + hour + ":" + minute + ":" + second;
};
formatDate

4) Mind how the two-dimensional arrays are defined and handled on the js side (see the sketch below).
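For instance, area_on_sale_list is built on the Python side as a list of [timestamp, value] pairs. A small sketch of the expected shape (the rows here are made up; this is not code from the repo):

import datetime
import json
import time

# Stand-in for Online_Data rows with .date and .on_sale attributes
class Row:
    def __init__(self, date, on_sale):
        self.date, self.on_sale = date, on_sale

rows = [Row(datetime.datetime(2016, 11, 28, 16, 9), 49396),
        Row(datetime.datetime(2016, 11, 29, 16, 9), 49402)]
# Two-dimensional array: one [timestamp, value] pair per row
pairs = [[time.mktime(r.date.timetuple()), r.on_sale] for r in rows]
print(json.dumps(pairs))  # e.g. [[1480320540.0, 49396], [1480406940.0, 49402]]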

 

3 The front end passing parameters to the back end

Since the requirement is to be able to chart each district of Shanghai, the view URL can be designed as r'/view/(\w+)/(\w+)': the first group is the city (sh, bj, and so on) and the second is the specific district area. The back end receives the two parameters, looks the data up in the database, and returns it.

The Final Version

Once the database has data in it, what's left is the exchange between front end and back end: where on the page to draw each chart, what data it needs, and the back end returning it. The main code ends up as follows:

import re
from bs4 import BeautifulSoup
import datetime
import time
from tornado import httpclient, gen, ioloop, httpserver
from tornado import web
import tornado.options
import json

import os, sys
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(BASE_DIR)

from conf import settings
from database_init import Online_Data, SH_Total_city_dealed, SH_Area
from tornado.options import define, options

define("port", default=8888, type=int)


@gen.coroutine
def obtain_page_data(target_url):
    response = yield httpclient.AsyncHTTPClient().fetch(target_url)
    data = response.body.decode('utf8')
    print("start %s %s" % (target_url, time.time()))

    raise gen.Return(data)

@gen.coroutine
def get_total_dealed_house(target_url):
    # Get the total number of housing transactions
    page_data = yield obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    dealed_house = soup_obj.html.body.find('div', {'class': 'list-head'}).text
    dealed_house_num = re.findall(r'\d+', dealed_house)[0]

    raise gen.Return(int(dealed_house_num))

@gen.coroutine
def get_online_data(target_url):
    # Get the city's average listing price, the number on sale,
    # the 90-day transaction count, and yesterday's viewing count
    page_data = yield obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    online_data_str = soup_obj.html.body.find('div', {'class': 'secondcon'}).text
    online_data = online_data_str.replace('\n', '')
    avg_price, on_sale, _, sold_in_90, yesterday_check_num = re.findall(r'\d+', online_data)

    raise gen.Return({'avg_price': avg_price, 'on_sale': on_sale,
                      'sold_in_90': sold_in_90, 'yesterday_check_num': yesterday_check_num})

@gen.coroutine
def shanghai_data_process():
    '''
    Collect the data for every district of Shanghai and store it
    :return:
    '''
    start_time = time.time()
    chenjiao_page = "http://sh.lianjia.com/chengjiao/"
    ershoufang_page = "http://sh.lianjia.com/ershoufang/"
    dealed_house_num = yield get_total_dealed_house(chenjiao_page)
    sh_online_data = {}
    for key, value in settings.sh_area_dict.items():
        sh_online_data[key] = yield get_online_data(ershoufang_page + settings.sh_area_dict[key])
    print("dealed_house_num %s" % dealed_house_num)
    for key, value in sh_online_data.items():
        print(key, value)

    print("tornado time cost: %s" % (time.time() - start_time))

    # settings.session
    update_date = datetime.datetime.now()
    dealed_house_num_obj = SH_Total_city_dealed(dealed_house_num=dealed_house_num,
                                                date=update_date)
    settings.session.add(dealed_house_num_obj)

    for key, value in sh_online_data.items():
        area_obj = settings.session.query(SH_Area).filter_by(name=key).first()
        online_data_obj = Online_Data(sold_in_90=value['sold_in_90'],
                                      avg_price=value['avg_price'],
                                      yesterday_check_num=value['yesterday_check_num'],
                                      on_sale=value['on_sale'],
                                      date=update_date,
                                      belong_area=area_obj.id)
        settings.session.add(online_data_obj)
    settings.session.commit()

class IndexHandler(web.RequestHandler):
    def get(self, *args, **kwargs):
        total_dealed_house_num = settings.session.query(SH_Total_city_dealed).all()
        cata_list = []
        data_list = []
        for item in total_dealed_house_num:
            cata_list.append(time.mktime(item.date.timetuple()))
            data_list.append(item.dealed_house_num)

        area_id = settings.session.query(SH_Area).filter_by(name='all').first()
        area_avg_price = settings.session.query(Online_Data).filter_by(belong_area=area_id.id).all()
        area_date_list = []
        area_data_list = []
        area_on_sale_list = []
        area_sold_in_90_list = []
        area_yesterday_check_num = []
        for item in area_avg_price:
            area_date_list.append(time.mktime(item.date.timetuple()))
            area_data_list.append(item.avg_price)
            area_on_sale_list.append([time.mktime(item.date.timetuple()), item.on_sale])
            area_sold_in_90_list.append(item.sold_in_90)
            area_yesterday_check_num.append(item.yesterday_check_num)
        self.render("index.html", cata_list=cata_list,
                    data_list=data_list, area_date_list=area_date_list, area_data_list=area_data_list,
                    area_on_sale_list=area_on_sale_list, area_sold_in_90_list=area_sold_in_90_list,
                    area_yesterday_check_num=area_yesterday_check_num, city="sh", area="all")

class QueryHandler(web.RequestHandler):
    def get(self, city, area):

        if city == "sh":
            total_dealed_house_num = settings.session.query(SH_Total_city_dealed).all()

            cata_list = []
            data_list = []
            for item in total_dealed_house_num:
                cata_list.append(time.mktime(item.date.timetuple()))
                data_list.append(item.dealed_house_num)

            area_id = settings.session.query(SH_Area).filter_by(name=area).first()
            area_avg_price = settings.session.query(Online_Data).filter_by(belong_area=area_id.id).all()
            area_date_list = []
            area_data_list = []
            area_on_sale_list = []
            area_sold_in_90_list = []
            area_yesterday_check_num = []
            for item in area_avg_price:
                area_date_list.append(time.mktime(item.date.timetuple()))
                area_data_list.append(item.avg_price)
                area_on_sale_list.append([time.mktime(item.date.timetuple()), item.on_sale])
                area_sold_in_90_list.append(item.sold_in_90)
                area_yesterday_check_num.append(item.yesterday_check_num)

            self.render("index.html", cata_list=cata_list,
                        data_list=data_list, area_date_list=area_date_list, area_data_list=area_data_list,
                        area_on_sale_list=area_on_sale_list, area_sold_in_90_list=area_sold_in_90_list,
                        area_yesterday_check_num=area_yesterday_check_num, city=city, area=area)
        else:
            self.redirect("/")


class MyApplication(web.Application):
    def __init__(self):
        handlers = [
            (r'/', IndexHandler),
            (r'/view/(\w+)/(\w+)', QueryHandler),
        ]

        # note: this local 'settings' dict shadows the imported conf.settings
        # module inside __init__, which is fine since conf.settings is not
        # used in this method
        settings = {
            'static_path': os.path.join(os.path.dirname(os.path.dirname(__file__)), "static"),
            'template_path': os.path.join(os.path.dirname(os.path.dirname(__file__)), "templates"),
        }

        super(MyApplication, self).__init__(handlers, **settings)

# ioloop.PeriodicCallback(f2s, 2000).start()

if __name__ == '__main__':
    http_server = httpserver.HTTPServer(MyApplication())
    http_server.listen(options.port)
    ioloop.PeriodicCallback(shanghai_data_process, 86400000).start()  # milliseconds: 86400000 = 24 h
    ioloop.IOLoop.instance().start()
data_collect

A few notes:

1 Since the data has to be fetched from the site periodically, ioloop.PeriodicCallback() is used to schedule the job (86,400,000 ms = 24 hours).

 

Deploying Behind nginx

I have an AWS EC2 instance running CentOS 7, and the program is ultimately deployed there.

1 Installing and deploying nginx

  For lack of time I didn't study this in depth; I just looked up the basics online, as follows:

1. Download the nginx package with wget (nginx-1.11.6.tar.gz) and unpack it
2. cd nginx-1.11.6
3. ./configure
4. make
5. make install

Edit the configuration file /usr/local/nginx/conf/nginx.conf (a sketch follows).
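For reference, a minimal proxy block might look like the sketch below. This assumes Tornado listens on port 8888 as in the code above; it is illustrative, not the exact config I used:

server {
    listen 80;

    location / {
        # Forward everything to the local Tornado process
        proxy_pass http://127.0.0.1:8888;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}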

Reload nginx with /usr/local/nginx/sbin/nginx -s reload.

2 Adjust the instance's inbound firewall rules; I added port 80 (nginx listens on port 80 in its config)

1. Log in to the AWS console
2. In the left menu: INSTANCES > Instances
3. On the right: the security group
4. Below: inbound
5. Edit
6. Add the rule on the edit inbound rules page

3 Test access to nginx

If all is well, the "Welcome to nginx" page appears.

4 Run the tornado code, then reload nginx

 

Screenshots and Code

1 A few screenshots:

[screenshots: the rendered charts]

 

2 The code is on github

 

Fixing the SQLAlchemy Session Problem

A few days after the code started running, I found that about every half day, although the program itself didn't die, requests from the browser would return a 500 error, and the back-end log recorded the failed accesses.

Studying the log errors, it looked like requests were being served with old session state that had already expired inside the program. A code review confirmed it: a single session is created in the settings module, and every DB operation afterwards reuses that one session. Clearly a problem.

 

The fix is actually simple: align the lifetime of the database session with the lifetime of each HTTP request. That is, open a db session when each request starts and close it when the request finishes. Flask's documentation of this pattern is a useful reference.

 

1 The SQLAlchemy side

To implement this, a new SQLAlchemy object is needed: scoped_session. The official example:

>>> from sqlalchemy.orm import scoped_session
>>> from sqlalchemy.orm import sessionmaker

# create the session
>>> session_factory = sessionmaker(bind=some_engine)
>>> Session = scoped_session(session_factory)

# close the session
>>> Session.remove()

More details here.

2 The Tornado side

Override two methods of RequestHandler, initialize() and on_finish(): initialize() creates the db session and on_finish() closes it. BaseHandler is a base handler; every other request handler just inherits from BaseHandler.

class BaseHandler(web.RequestHandler):
    def initialize(self):
        self.db_session = scoped_session(sessionmaker(bind=settings.engine))
        self.db_query = self.db_session().query

    def on_finish(self):
        self.db_session.remove()
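A hypothetical handler built on BaseHandler (an illustration, not a handler from the repo) then looks like this; every request gets a fresh session from initialize(), and on_finish() releases it afterwards:

class TotalDealedHandler(BaseHandler):
    # Assumes SH_Total_city_dealed is imported from database_init as in data_collect
    def get(self):
        rows = self.db_query(SH_Total_city_dealed).all()
        data_list = [item.dealed_house_num for item in rows]
        self.write({"dealed_house_num": data_list})  # Tornado JSON-encodes dicts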

 
