sklearn Text Feature Preprocessing 2: Similarity, Clustering, LDA, word2vec

Continued from the previous post.

V. Similarity Features

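# Cosine similarity between every pair of documents
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# tv_matrix is the document feature matrix built in the previous post
similarity_matrix = cosine_similarity(tv_matrix)
similarity_df = pd.DataFrame(similarity_matrix)
similarity_df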
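Because `tv_matrix` comes from the previous post, the snippet above does not run on its own. Here is a minimal self-contained sketch of the same step. The toy corpus, the use of `TfidfVectorizer` to stand in for `tv_matrix`, and all `toy_*` names are assumptions made for illustration, not the original data.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus (not from the original post).
toy_corpus = [
    "the sky is blue and beautiful",
    "love this blue and beautiful sky",
    "the quick brown fox jumps over the lazy dog",
    "the brown fox is quick and the blue dog is lazy",
]

# Assumed stand-in for tv_matrix: one TF-IDF row per document.
toy_tv_matrix = TfidfVectorizer().fit_transform(toy_corpus)

# Entry (i, j) is the cosine of the angle between document i and document j.
toy_similarity = cosine_similarity(toy_tv_matrix)
toy_similarity_df = pd.DataFrame(toy_similarity)
print(toy_similarity_df.round(2))
```

The result is a square, symmetric matrix with ones on the diagonal, since every document is identical to itself. Documents that share more of their distinctive words get values closer to 1.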


VI. Clustering Features

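from sklearn.cluster import KMeans

# Cluster the documents into two groups, using each document's row of
# pairwise similarities as its feature vector
km = KMeans(n_clusters=2)
km.fit(similarity_df)
cluster_labels = km.labels_
cluster_labels = pd.DataFrame(cluster_labels, columns=['ClusterLabel'])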
pd.concat
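The clustering step can be sketched end to end on a toy example. The corpus, the `toy_*` names, and the `n_init` and `random_state` settings below are assumptions added for illustration and reproducibility. The final `pd.concat` call is only one plausible way to attach the labels to a document table, since the original call is cut off and its arguments are not known.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus with two rough topics (not from the original post).
toy_docs = pd.DataFrame({"Document": [
    "the sky is blue and beautiful",
    "love this blue and beautiful sky",
    "the beautiful sky is so blue today",
    "the quick brown fox jumps over the lazy dog",
    "the brown fox is quick and the dog is lazy",
    "a lazy dog sleeps while the quick fox jumps",
]})

# Build the pairwise similarity matrix; each row is one document's features.
toy_features = TfidfVectorizer().fit_transform(toy_docs["Document"])
toy_sim = cosine_similarity(toy_features)

# n_init and random_state are assumptions, set so repeated runs agree.
toy_km = KMeans(n_clusters=2, n_init=10, random_state=42)
toy_labels = pd.DataFrame(toy_km.fit_predict(toy_sim), columns=["ClusterLabel"])

# Assumed usage: place each cluster label next to its document.
toy_result = pd.concat([toy_docs, toy_labels], axis=1)
print(toy_result)
```

The label numbers themselves (0 or 1) are arbitrary. Only which documents share a label is meaningful.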
