Counting word frequencies
import re
from collections import Counter
string = """ Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Nunc ut elit id mi ultricies
adipiscing. Nulla facilisi. Praesent pulvinar,
sapien vel feugiat vestibulum, nulla dui pretium orci,
non ultricies elit lacus quis ante. Lorem ipsum dolor
sit amet, consectetur adipiscing elit. Aliquam
pretium ullamcorper urna quis iaculis. Etiam ac massa
sed turpis tempor luctus. Curabitur sed nibh eu elit
mollis congue. Praesent ipsum diam, consectetur vitae
ornare a, aliquam a nunc. In id magna pellentesque
tellus posuere adipiscing. Sed non mi metus, at lacinia
augue. Sed magna nisi, ornare in mollis in, mollis
sed nunc. Etiam at justo in leo congue mollis.
Nullam in neque eget metus hendrerit scelerisque
eu non enim. Ut malesuada lacus eu nulla bibendum
id euismod urna sodales. """
words = re.findall(r'\w+', string)  # extract every run of word characters from the text
lower_words = [word.lower() for word in words]  # lowercase each word so counting is case-insensitive
word_counts = Counter(lower_words)  # count how many times each word appears
print(word_counts)
# Counter({'elit': 5, 'sed': 5, 'in': 5, 'adipiscing': 4, 'mollis': 4, 'eu': 3,
# 'id': 3, 'nunc': 3, 'consectetur': 3, 'non': 3, 'ipsum': 3, 'nulla': 3, 'pretium': 2,
# 'lacus': 2, 'ornare': 2, 'at': 2, 'praesent': 2, 'quis': 2, 'sit': 2, 'congue': 2, 'amet': 2,
# 'etiam': 2, 'urna': 2, 'a': 2, 'magna': 2, 'lorem': 2, 'aliquam': 2, 'ut': 2, 'ultricies': 2, 'mi': 2,
# 'dolor': 2, 'metus': 2, 'ac': 1, 'bibendum': 1, 'posuere': 1, 'enim': 1, 'ante': 1, 'sodales': 1, 'tellus': 1,
# 'vitae': 1, 'dui': 1, 'diam': 1, 'pellentesque': 1, 'massa': 1, 'vel': 1, 'nullam': 1, 'feugiat': 1, 'luctus': 1,
# 'pulvinar': 1, 'iaculis': 1, 'hendrerit': 1, 'orci': 1, 'turpis': 1, 'nibh': 1, 'scelerisque': 1, 'ullamcorper': 1,
# 'eget': 1, 'neque': 1, 'euismod': 1, 'curabitur': 1, 'leo': 1, 'sapien': 1, 'facilisi': 1, 'vestibulum': 1, 'nisi': 1,
# 'justo': 1, 'augue': 1, 'tempor': 1, 'lacinia': 1, 'malesuada': 1})
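Printing the whole Counter is hard to read for longer texts. If only the most frequent words matter, the same steps can be wrapped in a small function and combined with Counter.most_common. This is a minimal sketch, not part of the original snippet: the helper name top_words and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent words in text as (word, count) pairs.

    top_words is a hypothetical helper name used only for this example.
    """
    # Lowercase first, then extract runs of word characters, as in the snippet above.
    words = re.findall(r'\w+', text.lower())
    # most_common(n) returns pairs sorted from highest to lowest count.
    return Counter(words).most_common(n)


sample = "the cat and the dog and the bird"  # illustrative input
print(top_words(sample, 2))
# [('the', 3), ('and', 2)]
```

Words with equal counts are not ordered in any way that is meaningful for the analysis, so it is safer not to rely on the order of ties.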