[PYTHON-TSNE] Visualizing Word Vectors
The following files are needed:
1. wordList.txt, the list of words you want to convert to vectors:
spring
maven
junit
ant
swing
xml
jre
jdk
jbutton
jpanel
swt
japplet
jdialog
jcheckbox
jlabel
jmenu
slf4j
testunit
2. label.txt, the labels shown in the plot; they may differ from the words in wordList.txt.
spring
maven
junit
ant
swing
xml
jre
jdk
jbutton
jpanel
swt
japplet
jdialog
jcheckbox
jlabel
jmenu
slf4j
testunit
3. model, a word2vec model generated with gensim;
4. Run buildWordVectorFromW2V.py to generate the word vector list:
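from gensim.models.word2vec import Word2Vec

modelpath = 'XXX/model'
model = Word2Vec.load(modelpath)

sentenceFilePath = 'wordList.txt'
vectorFilePath = 'word2vec.txt'

writeStr = ''
with open(sentenceFilePath, 'r') as f:
    for line in f:
        sentWordList = line.strip().split(' ')
        for word in sentWordList:
            if not word:
                continue  # ignore blank lines
            # model.wv holds the trained vectors; older gensim releases
            # also allowed model[word] directly.
            if word not in model.wv:
                # Skip unknown words; looking them up would raise KeyError.
                print('error! not in model: ' + word)
                continue
            vec = model.wv[word]
            # One vector per line, components separated by spaces.
            for vecTmp in vec:
                writeStr += (str(vecTmp) + ' ')
            writeStr += '\n'

with open(vectorFilePath, 'w') as f:
    f.write(writeStr.strip())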
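To sanity-check the output of step 4, the vector file can be read back and its dimensions verified. The following is a minimal sketch, assuming word2vec.txt holds one vector per line with space-separated components; the helper names `load_vectors` and `check_vectors` are made up for this illustration, and the demo runs on a small temporary file rather than on real model output.

```python
import os
import tempfile


def load_vectors(path):
    """Read one whitespace-separated float vector per non-empty line."""
    vectors = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if parts:
                vectors.append([float(p) for p in parts])
    return vectors


def check_vectors(vectors):
    """Return the shared dimension, or raise ValueError if lengths differ."""
    dims = {len(v) for v in vectors}
    if len(dims) > 1:
        raise ValueError("inconsistent vector lengths: %s" % sorted(dims))
    return dims.pop() if dims else 0


# Demo on a tiny temporary file in the assumed format (not real model output).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "word2vec.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("0.1 0.2 0.3\n-1.0 2.5 4.0")
    demo_vectors = load_vectors(demo_path)

print(len(demo_vectors), check_vectors(demo_vectors))  # 2 3
```

On a real run, `load_vectors('word2vec.txt')` should return as many vectors as there are in-vocabulary words in wordList.txt, all with the model's vector size.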
5. Run visualization.py to generate the image:
import numpy as np
from gensim.models.word2vec import Word2Vec
import matplotlib.pyplot as plt

modelpath = 'XXX/model'
model = Word2Vec.load(modelpath)

# File names follow steps 1 and 2: words come from wordList.txt,
# plot labels from label.txt (one entry per line, same order).
sentenceFilePath = 'wordList.txt'
labelFilePath = 'label.txt'

visualizeVecs = []
with open(sentenceFilePath, 'r') as f:
    for line in f:
        word = line.strip()
        vec = model.wv[word.lower()]
        visualizeVecs.append(vec)

visualizeWords = []
with open(labelFilePath, 'r') as f:
    for line in f:
        word = line.strip()
        visualizeWords.append(word.lower())

visualizeVecs = np.array(visualizeVecs).astype(np.float64)

# Disabled t-SNE variant (tsne is not defined in this script):
# Y = tsne(visualizeVecs, 2, 200, 20.0);
# # Plot.scatter(Y[:,0], Y[:,1], 20,labels);
# # ChineseFont1 = FontProperties('SimHei')
# for i in xrange(len(visualizeWords)):
#     # if i<len(visualizeWords)/2:
#     #     color='green'
#     # else:
#     #     color='red'
#     color = 'red'
#     plt.text(Y[i, 0], Y[i, 1], visualizeWords[i],bbox=dict(facecolor=color, alpha=0.1))
# plt.xlim((np.min(Y[:, 0]), np.max(Y[:, 0])))
# plt.ylim((np.min(Y[:, 1]), np.max(Y[:, 1])))
# plt.show()

# vis_norm = np.sqrt(np.sum(temp**2, axis=1, keepdims=True))
# temp = temp / vis_norm

# PCA: center the vectors, take the SVD of their covariance matrix,
# and project onto the first two principal directions.
temp = (visualizeVecs - np.mean(visualizeVecs, axis=0))
covariance = 1.0 / visualizeVecs.shape[0] * temp.T.dot(temp)
U, S, V = np.linalg.svd(covariance)
coord = temp.dot(U[:, 0:2])

for i in range(len(visualizeWords)):
    print(i)
    print(coord[i, 0])
    print(coord[i, 1])
    color = 'red'
    plt.text(coord[i, 0], coord[i, 1], visualizeWords[i],
             bbox=dict(facecolor=color, alpha=0.1), fontsize=22)  # fontproperties = ChineseFont1

# plt.text does not rescale the axes, so set the limits explicitly.
plt.xlim((np.min(coord[:, 0]), np.max(coord[:, 0])))
plt.ylim((np.min(coord[:, 1]), np.max(coord[:, 1])))
plt.show()
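Note that the active part of visualization.py projects the vectors with PCA (an SVD of the covariance matrix), while the t-SNE call is commented out. The projection step can be isolated as a small function; the sketch below is an illustration only, with the hypothetical name `pca_project` and random data standing in for real word vectors. It takes the SVD of the centered data directly, which should give the same coordinates as the covariance route up to a sign flip per axis.

```python
import numpy as np


def pca_project(vectors, n_components=2):
    """Project row vectors onto their top principal directions."""
    X = np.asarray(vectors, dtype=np.float64)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing
    # singular value; they match the eigenvectors of the covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered.dot(vt[:n_components].T)


# Random stand-in data: 18 "words" with 50-dimensional vectors.
rng = np.random.default_rng(0)
demo = rng.normal(size=(18, 50))
coords = pca_project(demo)
print(coords.shape)  # (18, 2)
```

Working on the centered data avoids forming the covariance matrix explicitly, which is a common choice when the vector dimension is large relative to the number of words.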
Result:
[Figure: the labels plotted at their 2-D coordinates]