
[PYTHON-TSNE] Visualizing Word Vectors

Files you need:

1. wordList.txt, the list of words you want to convert to vectors, one word per line:

spring
maven
junit
ant
swing
xml
jre
jdk
jbutton
jpanel
swt
japplet
jdialog
jcheckbox
jlabel
jmenu
slf4j
test
unit

2. label.txt, the labels shown in the plot. These may differ from the words in wordList.txt.

spring
maven
junit
ant
swing
xml
jre
jdk
jbutton
jpanel
swt
japplet
jdialog
jcheckbox
jlabel
jmenu
slf4j
test
unit
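Since the labels can differ from the words, the two files have to line up. The sketch below is a minimal check, assuming labels are matched to words by line position (which is what the indexing in visualization.py suggests). The function name `pair_words_with_labels` and the sample labels are hypothetical, not part of the original scripts.

```python
def pair_words_with_labels(words_text, labels_text):
    """Pair each word with the label on the same line number.

    Both arguments are the raw contents of the two files. Blank lines
    are ignored. Raises ValueError if the counts differ.
    """
    words = [w.strip() for w in words_text.splitlines() if w.strip()]
    labels = [l.strip() for l in labels_text.splitlines() if l.strip()]
    if len(words) != len(labels):
        raise ValueError(
            "wordList has %d entries but label has %d" % (len(words), len(labels))
        )
    return list(zip(words, labels))


# Hypothetical file contents, for illustration only.
words_text = "spring\njdk\nslf4j\n"
labels_text = "spring-framework\njava-dev-kit\nslf4j\n"
pairs = pair_words_with_labels(words_text, labels_text)
print(pairs)
```

Running a check like this before plotting makes a mismatch show up as a clear error rather than as shifted labels in the figure.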

3. model, a word2vec model trained with gensim.

4. Run buildWordVectorFromW2V.py to generate the word vector list:

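from gensim.models.word2vec import Word2Vec

modelpath = 'XXX/model'
model = Word2Vec.load(modelpath)
# Since gensim 1.0 the word vectors live on model.wv;
# older versions indexed the model object directly.
vectors = model.wv

sentenceFilePath = 'wordList.txt'
vectorFilePath = 'word2vec.txt'

outputLines = []
with open(sentenceFilePath, 'r') as f:
    for line in f:
        values = []
        for word in line.strip().split():
            if word not in vectors:
                # Report and skip unknown words instead of crashing on the lookup.
                print('error! %s is not in the model' % word)
                continue
            # Write each component of the vector, separated by spaces.
            values.extend(str(v) for v in vectors[word])
        outputLines.append(' '.join(values))

with open(vectorFilePath, 'w') as f:
    f.write('\n'.join(outputLines).strip())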
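To sanity-check the generated file, it can be read back into a list of vectors. This is a minimal sketch, assuming wordList.txt holds one word per line so that each line of word2vec.txt is one vector of space-separated numbers. The function name `parse_vector_file` and the sample text are hypothetical.

```python
def parse_vector_file(text):
    """Parse text in the word2vec.txt layout into a list of float lists.

    Each non-blank line becomes one vector; values are split on whitespace.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if parts:  # ignore blank lines
            rows.append([float(p) for p in parts])
    return rows


# Made-up sample standing in for the real file contents.
sample = "0.1 0.2 0.3\n-1.5 2.0 0.25"
rows = parse_vector_file(sample)
print(len(rows), len(rows[0]))
```

For the real file, pass the result of `open('word2vec.txt').read()` and check that every row has the same length as the model's vector size.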

5. Run visualization.py to generate the plot:

import numpy as np
import matplotlib.pyplot as plt
from gensim.models.word2vec import Word2Vec

modelpath = 'XXX/model'
model = Word2Vec.load(modelpath)
# Since gensim 1.0 the word vectors live on model.wv.
vectors = model.wv

sentenceFilePath = 'wordList.txt'
labelFilePath = 'label.txt'

# One word per line; look up the vector for each word.
visualizeVecs = []
with open(sentenceFilePath, 'r') as f:
    for line in f:
        word = line.strip()
        visualizeVecs.append(vectors[word.lower()])

# One label per line, in the same order as the words.
visualizeWords = []
with open(labelFilePath, 'r') as f:
    for line in f:
        word = line.strip()
        visualizeWords.append(word.lower())

visualizeVecs = np.array(visualizeVecs).astype(np.float64)

# Disabled t-SNE variant (tsne is not defined in this script):
# Y = tsne(visualizeVecs, 2, 200, 20.0)
# # Plot.scatter(Y[:,0], Y[:,1], 20, labels)
# # ChineseFont1 = FontProperties('SimHei')
# for i in range(len(visualizeWords)):
#     # if i < len(visualizeWords) / 2:
#     #     color = 'green'
#     # else:
#     #     color = 'red'
#     color = 'red'
#     plt.text(Y[i, 0], Y[i, 1], visualizeWords[i], bbox=dict(facecolor=color, alpha=0.1))
# plt.xlim((np.min(Y[:, 0]), np.max(Y[:, 0])))
# plt.ylim((np.min(Y[:, 1]), np.max(Y[:, 1])))
# plt.show()

# Optional row normalization, also disabled:
# vis_norm = np.sqrt(np.sum(temp**2, axis=1, keepdims=True))
# temp = temp / vis_norm

# PCA: center the vectors, then project onto the top two principal directions.
temp = visualizeVecs - np.mean(visualizeVecs, axis=0)
covariance = 1.0 / visualizeVecs.shape[0] * temp.T.dot(temp)
U, S, V = np.linalg.svd(covariance)
coord = temp.dot(U[:, 0:2])

for i in range(len(visualizeWords)):
    print(i, coord[i, 0], coord[i, 1])
    color = 'red'
    plt.text(coord[i, 0], coord[i, 1], visualizeWords[i],
             bbox=dict(facecolor=color, alpha=0.1),
             fontsize=22)  # fontproperties = ChineseFont1

plt.xlim((np.min(coord[:, 0]), np.max(coord[:, 0])))
plt.ylim((np.min(coord[:, 1]), np.max(coord[:, 1])))
plt.show()
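The script above leaves its t-SNE call commented out and plots a PCA projection instead. For an actual t-SNE embedding, one option is scikit-learn's `TSNE` class. That is a substitution: scikit-learn is not used in the original scripts, and the helper name `tsne_2d`, the perplexity of 5 and the random stand-in data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_2d(vectors, perplexity=5.0, seed=0):
    """Project an (n_words, dim) array to 2-D with scikit-learn's t-SNE."""
    vectors = np.asarray(vectors, dtype=np.float64)
    # Recent scikit-learn versions require perplexity < number of samples,
    # so fail early with a clear message for very small word lists.
    if perplexity >= vectors.shape[0]:
        raise ValueError("perplexity must be smaller than the number of words")
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(vectors)


# Stand-in data: 19 random 50-dimensional vectors instead of real word vectors.
rng = np.random.RandomState(0)
demo_vectors = rng.normal(size=(19, 50))
Y = tsne_2d(demo_vectors)
print(Y.shape)  # (19, 2)
```

In visualization.py, the returned array would take the place of `coord` in the plotting loop. With only a few dozen words the layout is sensitive to the perplexity and the seed, so it is worth trying a few values.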

  

 

Result:

[Image: plot produced by visualization.py]

 
