Python sentence similarity library: finding the similarity between two sentences using word2vec in Python

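I want to compute the similarity between two sentences using word2vec. I am trying to get the word vectors of each sentence so that I can average them into a single sentence vector and then compute the cosine similarity. I tried the code below, but it does not work: the output it gives is sentence vectors filled with ones. I want sentence_1_avg_vector and sentence_2_avg_vector to hold the actual vectors of the sentences.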
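The approach described above, averaging the word vectors of each sentence and comparing the two averages by cosine similarity, can be sketched on toy data. The 3-dimensional vectors and the helper names `sentence_vector` and `cosine_similarity` are made up for illustration; real vectors would come from a trained model.

```python
import numpy as np

# Hypothetical 3-dimensional word vectors, invented purely for illustration.
toy_vectors = {
    "invest": np.array([1.0, 0.0, 0.0]),
    "share": np.array([0.0, 1.0, 0.0]),
    "market": np.array([0.0, 0.0, 1.0]),
}

def sentence_vector(words, vectors):
    """Average the vectors of the words that have one; None if no word is known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    return np.mean(known, axis=0)

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = sentence_vector(["invest", "share", "market"], toy_vectors)
v2 = sentence_vector(["invest", "share"], toy_vectors)
similarity = cosine_similarity(v1, v2)
print(similarity)
```

Returning None when no word is known is one possible design choice; it keeps an empty sentence from silently producing a vector that looks valid.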

Code:

import gensim
import numpy as np
from scipy import spatial

# dataset
sent1 = [['What', 'step', 'step', 'guide', 'invest', 'share', 'market', 'india'], ['What', 'story', 'Kohinoor', 'KohiNoor', 'Diamond']]
sent2 = [['What', 'step', 'step', 'guide', 'invest', 'share', 'market'], ['What', 'would', 'happen', 'Indian', 'government', 'stole', 'Kohinoor', 'KohiNoor', 'diamond', 'back']]
sentences = sent1 + sent2

# Applying Word2vec
word2vec_model = gensim.models.Word2Vec(sentences, size=100, min_count=5)
bin_file = "vecmodel.csv"
word2vec_model.wv.save_word2vec_format(bin_file, binary=False)

# Making Sentence Vectors
def avg_feature_vector(words, model, num_features, index2word_set):
    # function to average all words vectors in a given paragraph
    featureVec = np.ones((num_features,), dtype="float32")
    # print(featureVec)
    nwords = 0
    # list containing names of words in the vocabulary
    index2word_set = set(model.wv.index2word)  # this is moved as input param for performance reasons
    for word in words:
        if word in index2word_set:
            nwords = nwords + 1
            featureVec = np.add(featureVec, model[word])
            print(featureVec)
    if nwords > 0:
        featureVec = np.divide(featureVec, nwords)
    return featureVec

i = 0
# The loop condition and the assignments to mylist1 and mylist2 are cut off in
# the source; pairing sent1[i] with sent2[i] is an assumed reconstruction.
while i < len(sent1):
    mylist1 = sent1[i]
    mylist2 = sent2[i]
    sentence_1_avg_vector = avg_feature_vector(mylist1, model=word2vec_model, num_features=300, index2word_set=set(word2vec_model.wv.index2word))
    print(sentence_1_avg_vector)
    sentence_2_avg_vector = avg_feature_vector(mylist2, model=word2vec_model, num_features=300, index2word_set=set(word2vec_model.wv.index2word))
    print(sentence_2_avg_vector)
    sen1_sen2_similarity = 1 - spatial.distance.cosine(sentence_1_avg_vector, sentence_2_avg_vector)
    print(sen1_sen2_similarity)
    i += 1

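Output given by this code: sentence vectors made up entirely of ones, as described above. The original output listing is not included.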
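A likely cause of the all-ones output, offered as a hedged reading and not a confirmed diagnosis: the accumulator starts from np.ones, so when no word of a sentence is in the model's vocabulary the function returns those initial ones unchanged. With min_count=5, Word2Vec ignores every word that occurs fewer than five times, and in the sample data no word occurs that often ('What' and 'step' appear four times each). The sketch below reproduces the effect without gensim. The plain dict `vectors` stands in for the model's word vectors, and `average_vectors` is a hypothetical helper, not part of any library.

```python
import numpy as np

def average_vectors(words, vectors, init):
    """Sum the vectors of known words onto a copy of `init`, then divide by their count."""
    total = init.copy()
    nwords = 0
    for word in words:
        if word in vectors:
            nwords += 1
            total = total + vectors[word]
    if nwords > 0:
        total = total / nwords
    return total

num_features = 3
# Made-up stand-in for trained word vectors (assumption for illustration).
vectors = {
    "step": np.array([2.0, 4.0, 6.0], dtype="float32"),
    "guide": np.array([4.0, 0.0, 2.0], dtype="float32"),
}

ones = np.ones((num_features,), dtype="float32")
zeros = np.zeros((num_features,), dtype="float32")

# No word is in the vocabulary: the initial ones come back unchanged.
empty_vocab_result = average_vectors(["step", "guide"], {}, ones)

# Known words: starting from ones shifts the mean by 1/nwords,
# while starting from zeros gives the true mean.
biased_mean = average_vectors(["step", "guide"], vectors, ones)
true_mean = average_vectors(["step", "guide"], vectors, zeros)

print(empty_vocab_result, biased_mean, true_mean)
```

Under this reading, two changes to the question's code would be worth trying: initialise featureVec with np.zeros, and lower min_count (for example to 1) so that the words of such a small dataset stay in the vocabulary. Separately, num_features=300 disagrees with size=100; if any word were found, adding a 100-dimensional vector to a 300-dimensional accumulator would raise a broadcasting error, so the two values should match.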
