Python NLTK: word segmentation with simulated annealing
#!/usr/bin/python
import nltk  # imported in the original listing; not used by the code below
from random import randint


def segment(text, segs):
    # Segmentation: a '1' at position i marks a word boundary after text[i].
    words = []
    last = 0
    for i in range(len(segs)):
        if segs[i] == '1':
            words.append(text[last:i+1])
            last = i+1
    words.append(text[last:])
    return words


def evaluate(text, segs):
    # Scoring (lower is better): number of word tokens plus the lexicon size,
    # where each distinct word costs its length plus one delimiter character.
    words = segment(text, segs)
    text_size = len(words)
    lexicon_size = sum(len(word) + 1 for word in set(words))
    return text_size + lexicon_size


def flip(segs, pos):
    # Toggle the boundary bit at position pos.
    return segs[:pos] + str(1-int(segs[pos])) + segs[pos+1:]


def flip_n(segs, n):
    # Random perturbation: toggle n randomly chosen positions.
    for i in range(n):
        segs = flip(segs, randint(0, len(segs)-1))
    return segs


def anneal(text, segs, iterations, cooling_rate):
    temperature = float(len(segs))
    while temperature > 0.5:
        # Annealing: lower the score to improve the segmentation.
        best_segs, best = segs, evaluate(text, segs)
        for i in range(iterations):
            guess = flip_n(segs, int(round(temperature)))
            score = evaluate(text, guess)
            if score < best:
                best, best_segs = score, guess
        score, segs = best, best_segs
        temperature = temperature / cooling_rate
        print(evaluate(text, segs), segment(text, segs))
    print()
    return segs


if __name__ == '__main__':
    text = "doyouseethekittyseethedoggydoyoulikethekittylikethedoggy"
    seg1 = "0000000000000001000000000010000000000000000100000000000"
    anneal(text, seg1, 500, 1.2)
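To see what the search is minimizing, it can help to score a few hand-made segmentations with the same objective. The sketch below repeats segment and evaluate from the listing so that it runs on its own; segs_from_words is a hypothetical helper added here for illustration (it is not part of the original listing) that builds the boundary string from a list of words.

```python
def segment(text, segs):
    # Same as in the listing: '1' at position i ends a word after text[i].
    words = []
    last = 0
    for i in range(len(segs)):
        if segs[i] == '1':
            words.append(text[last:i+1])
            last = i+1
    words.append(text[last:])
    return words


def evaluate(text, segs):
    # Same as in the listing: token count plus lexicon size (lower is better).
    words = segment(text, segs)
    text_size = len(words)
    lexicon_size = sum(len(word) + 1 for word in set(words))
    return text_size + lexicon_size


def segs_from_words(words):
    # Hypothetical helper (an assumption for this example): encode a word list
    # as a boundary string, one bit per character except the last one.
    bits = ["0" * (len(word) - 1) + "1" for word in words[:-1]]
    bits.append("0" * (len(words[-1]) - 1))
    return "".join(bits)


text = "doyouseethekittyseethedoggydoyoulikethekittylikethedoggy"

candidates = {
    "unsegmented": [text],
    "true words": ("do you see the kitty see the doggy "
                   "do you like the kitty like the doggy").split(),
    "chunked": ("doyou see thekitt y see thedogg y "
                "doyou like thekitt y like thedogg y").split(),
}

scores = {}
for name, words in candidates.items():
    segs = segs_from_words(words)
    # Round-trip check: the encoded string must decode to the same words.
    assert segment(text, segs) == words
    scores[name] = evaluate(text, segs)
    print(name, scores[name])
```

With this objective the unsegmented text scores 58, the true word segmentation scores 48, and the "chunked" segmentation scores 47. The lowest score is therefore not necessarily the linguistically correct split, which is worth keeping in mind when reading the output of anneal.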