Computing the distance between vectors with euclidean_distances

The scikit-learn package provides a euclidean_distances function (in sklearn.metrics.pairwise) that computes the Euclidean distance between vectors.
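For reference, the quantity being computed is the standard Euclidean (L2) distance. The symbols x, y, and n below are notation introduced here for illustration and do not appear in the original post:

$$
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
$$

For vectors that contain only 0s and 1s, this reduces to the square root of the number of positions in which the two vectors differ.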

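from sklearn.metrics.pairwise import euclidean_distances
from sklearn.feature_extraction.text import CountVectorizer

# The corpus: three short documents
corpus = ['UNC played Duke in basketball', 'Duke lost the basketball game', 'I ate a sandwich']
vectorizer = CountVectorizer()
# Build the term-count feature vectors for the corpus and convert them to a dense array
counts = vectorizer.fit_transform(corpus).toarray()
print(counts)
for x, y in [[0, 1], [0, 2], [1, 2]]:
    # euclidean_distances expects 2-D input, so select each row as a 1 x n array
    dist = euclidean_distances(counts[[x]], counts[[y]])
    print('Distance between document {} and document {}: {}'.format(x, y, dist))

Output:

[[0 1 1 0 1 0 1 0 0 1]
 [0 1 1 1 0 1 0 0 1 0]
 [1 0 0 0 0 0 0 1 0 0]]
Distance between document 0 and document 1: [[2.44948974]]
Distance between document 0 and document 2: [[2.64575131]]
Distance between document 1 and document 2: [[2.64575131]]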
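
The loop above calls euclidean_distances once per pair of documents. As a minimal sketch of an alternative, the function can also be given a single matrix, in which case it returns the distances between every pair of rows in one call. The variable names X, dist_matrix, dense, and manual are chosen here for illustration, and the NumPy cross-check is an addition that is not part of the original example:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances

corpus = [
    'UNC played Duke in basketball',
    'Duke lost the basketball game',
    'I ate a sandwich',
]

# Sparse document-term matrix; euclidean_distances accepts sparse input directly.
X = CountVectorizer().fit_transform(corpus)

# With a single argument, distances are computed between every pair of rows of X.
dist_matrix = euclidean_distances(X)
print(dist_matrix.shape)  # (3, 3)

# Cross-check one entry against a direct NumPy computation of the L2 norm.
dense = X.toarray()
manual = np.linalg.norm(dense[0] - dense[1])
print(np.isclose(dist_matrix[0, 1], manual))  # True
```

Entry [i, j] of dist_matrix is the distance between document i and document j, so the three values printed by the loop correspond to entries [0, 1], [0, 2], and [1, 2].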

