Python: Normalizing Company-Name Suffixes of a LinkedIn User's Connections (data normalization)
CODE:
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
Created on 2014-8-19
@author: guaguastd
@name: company_suffix_normalize.py
'''

# import json
import os
import csv
from collections import Counter
from operator import itemgetter
from prettytable import PrettyTable

# specify csv directory
CSV_FILE = os.path.join(r"E:", "\\", "eclipse", "LinkedIn", "dfile", "my_connections.csv")

# define a set of transforms that converts the first item
# to the second item
transforms = [(', Inc.', ''), (', Inc', ''), (', LLC', ''), (', LLP', ''),
              (' LLC', ''), (' Inc.', ''), (' Inc', '')]

# read every connection row from the exported CSV file
with open(CSV_FILE) as csvFile:
    csvReader = csv.DictReader(csvFile, delimiter=',', quotechar='"')
    contacts = [row for row in csvReader]

# keep only the non-empty company names
companies = [c['Company'].strip() for c in contacts if c['Company'].strip() != '']

# strip the known suffixes from each company name
for i, _ in enumerate(companies):
    for transform in transforms:
        companies[i] = companies[i].replace(*transform)

# tabulate the companies, most frequent first
pt = PrettyTable(field_names=['Company', 'Freq'])
pt.align = 'l'
c = Counter(companies)
for (company, freq) in sorted(c.items(), key=itemgetter(1), reverse=True):
    if freq > 0:
        pt.add_row([company, freq])

print(pt)
RESULT:
+---------------------------------------+------+
| Company                               | Freq |
+---------------------------------------+------+
| ??????????                            | 1    |
| ??                                    | 1    |
| SoftTalent Consulting ??????????????? | 1    |
| SJTU                                  | 1    |
| WatchGuard Technologies               | 1    |
| Hebei Meishen Chemical Group CO.,Ltd  | 1    |
| Bloomberg LP                          | 1    |
| DiHao trading Co.,Ltd                 | 1    |
| CET                                   | 1    |
| Pica8                                 | 1    |
| Microsoft                             | 1    |
+---------------------------------------+------+
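
The suffix-stripping and counting steps can also be tried without the CSV file or PrettyTable. The following is only a minimal sketch: the helper names `normalize_company` and `company_frequencies` are hypothetical (they are not in the original script), and the sample names are made up for illustration rather than taken from the data above.

```python
from collections import Counter

# Same (old, new) pairs as the `transforms` list in the script.
TRANSFORMS = [(', Inc.', ''), (', Inc', ''), (', LLC', ''), (', LLP', ''),
              (' LLC', ''), (' Inc.', ''), (' Inc', '')]


def normalize_company(name, transforms=TRANSFORMS):
    """Strip whitespace, then apply each (old, new) replacement in order."""
    name = name.strip()
    for old, new in transforms:
        name = name.replace(old, new)
    return name


def company_frequencies(names):
    """Return (company, count) pairs, most frequent first; blanks are skipped."""
    cleaned = [normalize_company(n) for n in names if n.strip()]
    return Counter(cleaned).most_common()


# Made-up sample input (an assumption, not real connection data).
sample = ['Pica8, Inc.', 'Pica8', 'Microsoft', ' Bloomberg LP ', '']
freqs = company_frequencies(sample)
for company, freq in freqs:
    print(company, freq)
```

Note that `str.replace` matches the pattern anywhere in the string, not only at the end, so a name such as 'Acme Incorporated' would lose the ' Inc' in the middle. Anchoring the match to the end of the name (for example with `str.endswith`) may be safer if such names appear in the data.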