
Sina Weibo Data Mining Recipes, Part 5: Saving Data (JSON in MongoDB)

#!/usr/bin/python 
# -*- coding: utf-8 -*-

'''
Created on 2015-1-1
@author: beyondzhou
@name: json_data_mongodb.py
'''

# Raw string so Windows paths such as \bin are not read as escape sequences
r'''
Configure a Windows service for MongoDB

The following procedure assumes you have installed MongoDB using the MSI installer, with the default path C:\Program Files\MongoDB 2.6 Standard.

1. Open a command prompt (cmd.exe) as Administrator

2. Create the data and log directories (the paths are absolute, so the current directory does not matter)
mkdir c:\data\db
mkdir c:\data\log

3. Create a configuration file
echo logpath=c:\data\log\mongod.log> "C:\Program Files\MongoDB 2.6 Standard\mongod.cfg"
echo dbpath=c:\data\db>> "C:\Program Files\MongoDB 2.6 Standard\mongod.cfg"

4. Create the MongoDB service
sc.exe create MongoDB binPath= "\"C:\Program Files\MongoDB 2.6 Standard\bin\mongod.exe\" --service --config=\"C:\Program Files\MongoDB 2.6 Standard\mongod.cfg\"" DisplayName= "MongoDB 2.6 Standard" start= "auto"

5. Start the MongoDB service
net start MongoDB

6. Stop or remove the MongoDB service as needed
To stop the MongoDB service, use the following command:
net stop MongoDB

To remove the MongoDB service, first stop the service and then run the following command:
sc.exe delete MongoDB

'''

# Fetch the Sina Weibo public timeline, save the JSON response into MongoDB, then read it back
def json_data_mongodb():
    
    # Imports: login and data are the author's own modules;
    # bson is installed as part of PyMongo
    from login import weibo_login
    import json
    from data import save_to_mongo, load_from_mongo
    from bson import json_util

    # Log in and get a Sina Weibo API client
    weibo_api = weibo_login()
    
    # Get public timeline
    public_timeline = weibo_api.statuses.public_timeline.get(count=200)

    # Optionally inspect the raw API response
    # print(json.dumps(public_timeline, indent=1))
    
    # Save the json data into mongodb
    save_to_mongo(public_timeline, 'public_timeline', 'publicTimeline')
    
    # Read the json data from mongodb
    results = load_from_mongo('public_timeline', 'publicTimeline')
    print(json.dumps(results, indent=1, default=json_util.default))
    
if __name__ == '__main__':
    json_data_mongodb()
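The script saves the API response exactly as returned. Assuming (not verified here) that the public timeline response is a dict whose `statuses` key holds the list of posts, you may prefer one MongoDB document per post so each post can be queried on its own. The helper `split_statuses` below is hypothetical, a minimal sketch under that assumption, not part of the original code.

```python
# Minimal sketch, assuming the timeline response is a dict whose "statuses"
# key holds a list of post dicts. split_statuses is a hypothetical helper.
def split_statuses(response):
    """Return a list of per-post documents from a timeline response.

    Accepts a dict with a "statuses" list, a bare list of posts,
    or a single post dict, and always returns a list.
    """
    if isinstance(response, dict) and isinstance(response.get("statuses"), list):
        posts = response["statuses"]
    elif isinstance(response, list):
        posts = response
    else:
        posts = [response]
    # Shallow-copy each post: PyMongo's insert methods add an "_id" key to
    # the dicts they are given, and this keeps the original response clean
    return [dict(post) for post in posts]
```

Passing `split_statuses(public_timeline)` to `save_to_mongo()` would then store one document per post instead of one document for the whole response.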

# ---- data.py: helpers imported by json_data_mongodb.py ----

# Save JSON data into MongoDB
def save_to_mongo(data, mongo_db, mongo_db_coll, **mongo_conn_kw):
    
    import pymongo
      
    # Connect to the MongoDB server running on
    # localhost:27017 by default
    client = pymongo.MongoClient(**mongo_conn_kw)
    
    # Get a reference to a particular database
    db = client[mongo_db]
    
    # Reference a particular collection in the database
    coll = db[mongo_db_coll]
    
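    # Insert one document or many and return the new _id value(s).
    # Collection.insert() was removed in PyMongo 4.0, so use insert_many()
    # for a list and insert_one() for a single document.
    if isinstance(data, list):
        return coll.insert_many(data).inserted_ids
    return coll.insert_one(data).inserted_id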

# Load json data from mongo
def load_from_mongo(mongo_db, mongo_db_coll, return_cursor=False,
                    criteria=None, projection=None, **mongo_conn_kw):
    import pymongo
    
    client = pymongo.MongoClient(**mongo_conn_kw)
    db = client[mongo_db]
    coll = db[mongo_db_coll]
    
    if criteria is None:
        criteria = {}

    # find() treats projection=None as "return every field"
    cursor = coll.find(criteria, projection)

    # Return the lazy cursor for large result sets; otherwise build a list
    if return_cursor:
        return cursor
    return list(cursor)
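`load_from_mongo()` forwards `criteria` and `projection` to PyMongo's `find()`. As an illustration, the hypothetical helper `build_repost_query()` below builds a filter for posts reposted more than a given number of times and a projection that keeps only selected fields. The field names come from the sample output below; the sketch assumes each post is stored as its own top-level document, which does not hold if the whole API response was saved as a single document.

```python
# Hypothetical helper: builds (criteria, projection) dicts for load_from_mongo().
# Assumes each post is a top-level document with a numeric reposts_count field.
def build_repost_query(min_reposts, fields):
    """Return (criteria, projection) for load_from_mongo().

    criteria matches posts whose reposts_count is greater than min_reposts;
    projection keeps only the given fields and drops MongoDB's _id.
    """
    criteria = {"reposts_count": {"$gt": min_reposts}}
    # _id is the only field that may be excluded inside an inclusion projection;
    # with no fields listed, {"_id": 0} alone returns every other field
    projection = {"_id": 0}
    projection.update({field: 1 for field in fields})
    return criteria, projection
```

For example, `criteria, projection = build_repost_query(10, ["idstr", "reposts_count", "source"])` followed by `load_from_mongo('public_timeline', 'publicTimeline', criteria=criteria, projection=projection)`. Because `_id` is excluded, plain `json.dumps` can serialize the results without `json_util.default`, provided no other BSON-specific types (such as dates) are stored.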

Result:

    "reposts_count": 0, 
    "mid": "3794113072035799", 
    "idstr": "3794113072035799", 
    "geo": null, 
    "source": "<a href=\"http://app.weibo.com/t/feed/380tOv\" rel=\"nofollow\">\u7c89\u4e1d\u7ea2\u5305</a>",
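The output above comes from `json.dumps(..., default=json_util.default)`, which prints all documents as one array and escapes Chinese text as `\uXXXX`. For a larger collection, `load_from_mongo(..., return_cursor=True)` returns a lazy cursor instead of a list. The sketch below uses a hypothetical helper, `dump_json_lines`, and a JSON Lines output file; both are assumptions here, not part of the original post. It writes one document per line and passes `ensure_ascii=False` so text such as 粉丝红包 stays readable. It also swaps `json_util.default` for `default=str`, which renders an ObjectId as its hex string rather than as `{"$oid": ...}`; pass `default=json_util.default` instead to keep MongoDB Extended JSON.

```python
import json


def dump_json_lines(docs, fp, default=str):
    """Write each document as one JSON line and return the count written.

    docs may be a list or a lazy PyMongo cursor; it is consumed one
    document at a time, so the whole result set is never held in memory.
    default=str turns values json cannot encode (e.g. ObjectId) into strings.
    """
    count = 0
    for doc in docs:
        # ensure_ascii=False keeps Chinese characters instead of \u escapes
        fp.write(json.dumps(doc, ensure_ascii=False, default=default))
        fp.write("\n")
        count += 1
    return count
```

Because non-ASCII text is written as-is, open the output file with `encoding="utf-8"`, then pass it together with the cursor from `load_from_mongo('public_timeline', 'publicTimeline', return_cursor=True)`.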
