Sina Weibo Data Mining Cookbook, Part 5: Saving Data (JSON to MongoDB)
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
Created on 2015-1-1
@author: beyondzhou
@name: json_data_mongodb.py
'''

# Raw string, so the backslashes in the Windows paths are not read as escapes
r'''
Configure a Windows service for MongoDB

The following procedure assumes you have installed MongoDB using the MSI
installer, with the default path C:\Program Files\MongoDB 2.6 Standard

1. Open an Administrator command prompt (Windows cmd)

2. Create directories
   cd c:
   mkdir c:\data\db
   mkdir c:\data\log

3. Create a configuration file
   echo logpath=c:\data\log\mongod.log> "C:\Program Files\MongoDB 2.6 Standard\mongod.cfg"
   echo dbpath=c:\data\db>> "C:\Program Files\MongoDB 2.6 Standard\mongod.cfg"

4. Create the MongoDB service
   sc.exe create MongoDB binPath= "\"C:\Program Files\MongoDB 2.6 Standard\bin\mongod.exe\" --service --config=\"C:\Program Files\MongoDB 2.6 Standard\mongod.cfg\"" DisplayName= "MongoDB 2.6 Standard" start= "auto"

5. Start the MongoDB service
   net start MongoDB

6. Stop or remove the MongoDB service as needed
   To stop the MongoDB service, use the following command:
   net stop MongoDB
   To remove the MongoDB service, first stop the service and then run the
   following command:
   sc.exe delete MongoDB
'''

# Get public timeline of sina weibo and save json response data into mongodb
def json_data_mongodb():

    # import
    from login import weibo_login
    import json
    from data import save_to_mongo, load_from_mongo
    from bson import json_util

    # Access to sina api
    weibo_api = weibo_login()

    # Get public timeline
    public_timeline = weibo_api.statuses.public_timeline.get(count=200)

    # Output the public timeline
    # print(json.dumps(public_timeline, indent=1))

    # Save the json data into mongodb
    save_to_mongo(public_timeline, 'public_timeline', 'publicTimeline')

    # Read the json data from mongodb
    results = load_from_mongo('public_timeline', 'publicTimeline')

    # json_util.default serializes BSON types such as the ObjectId in "_id"
    print(json.dumps(results, indent=1, default=json_util.default))

if __name__ == '__main__':
    json_data_mongodb()

# data.py

# Save json data into mongo
def save_to_mongo(data, mongo_db, mongo_db_coll, **mongo_conn_kw):

    import pymongo

    # Connect to the MongoDB server running on
    # localhost:27017 by default
    client = pymongo.MongoClient(**mongo_conn_kw)

    # Get a reference to a particular database
    db = client[mongo_db]

    # Reference a particular collection in the database
    coll = db[mongo_db_coll]

    # Perform a bulk insert and return IDs
    # (coll.insert() was removed in PyMongo 4; use insert_one() or
    # insert_many() with newer drivers)
    return coll.insert(data)

# Load json data from mongo
def load_from_mongo(mongo_db, mongo_db_coll, return_cursor=False,
                    criteria=None, projection=None, **mongo_conn_kw):

    import pymongo

    client = pymongo.MongoClient(**mongo_conn_kw)
    db = client[mongo_db]
    coll = db[mongo_db_coll]

    # An empty criteria dict matches every document in the collection
    if criteria is None:
        criteria = {}

    if projection is None:
        cursor = coll.find(criteria)
    else:
        cursor = coll.find(criteria, projection)

    # Returning a cursor is recommended for large amounts of data
    if return_cursor:
        return cursor
    else:
        return [item for item in cursor]
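
`coll.insert` accepts either a single document or a list of documents, but it is not available in PyMongo 4 and later. The following is a minimal sketch of how the insert step in `save_to_mongo` might dispatch to `insert_one` / `insert_many` instead. The helper name `insert_any` is hypothetical, and `FakeCollection` / `_Result` are stand-ins invented here only so that the example runs without a MongoDB server; they are not PyMongo classes and not part of the original recipe.

```python
def insert_any(coll, data):
    """Insert one document (dict) or several (iterable of dicts).

    `coll` is assumed to offer PyMongo-style insert_one / insert_many
    methods. Always returns a list of inserted IDs.
    """
    if isinstance(data, dict):
        return [coll.insert_one(data).inserted_id]
    return list(coll.insert_many(list(data)).inserted_ids)


class _Result(object):
    """Hypothetical stand-in for PyMongo's insert result objects."""

    def __init__(self, ids):
        self.inserted_ids = ids
        self.inserted_id = ids[0] if ids else None


class FakeCollection(object):
    """Hypothetical in-memory stand-in for a collection (demo only)."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)
        return _Result([len(self.docs)])

    def insert_many(self, docs):
        first = len(self.docs) + 1
        self.docs.extend(docs)
        return _Result(list(range(first, len(self.docs) + 1)))


# Demo with made-up sample documents
coll = FakeCollection()
single_ids = insert_any(coll, {"mid": "0"})
many_ids = insert_any(coll, [{"mid": "1"}, {"mid": "2"}])
print(single_ids, many_ids)
```

With a real server, `coll` would be the `db[mongo_db_coll]` object from `save_to_mongo`, and the returned IDs would be ObjectId values rather than the integers used by the stand-in.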
Result (excerpt of the output printed by json_data_mongodb.py):
"reposts_count": 0, "mid": "3794113072035799", "idstr": "3794113072035799", "geo": null, "source": "<a href=http://www.mamicode.com/"http://app.weibo.com/t/feed/380tOv/" rel=/"nofollow/">/u7c89/u4e1d/u7ea2/u5305", >新浪微博数据挖掘食谱之五: 保存篇 (json mongodb格式)