Custom initial cluster centers for k-means in Python: partially defining the initial centroids for scikit-learn KMeans clustering

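scikit-learn does not let you perform this kind of fine-grained operation: the init parameter of KMeans takes a method name, a callable, or a complete array of shape (n_clusters, n_features), so you cannot fix only some coordinates of the initial centroids.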
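
To illustrate the restriction, here is a minimal sketch on random data (the array sizes and the names `partial_init` and `init_rejected` are assumptions chosen for the example). It passes a 6-column init array for 7-feature data and records whether scikit-learn rejects it.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 7))            # data with 7 features
partial_init = rng.standard_normal((10, 6))   # centroids known for 6 features only

try:
    KMeans(n_clusters=10, init=partial_init, n_init=1).fit(X)
    init_rejected = False
except ValueError as err:
    # the init array must have as many columns as X has features
    init_rejected = True
    print(err)
```
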

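The only option is to supply the 7th feature value yourself, either at random or with an estimate similar to what k-means++ would have produced.

So, basically, you can estimate a good value as follows:

import numpy as np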

from sklearn.cluster import KMeans

nb_clust = 10

# your data

X = np.random.randn(7*1000).reshape( (1000,7) )

# your 6col centroids

cent_6cols = np.random.randn(6*nb_clust).reshape( (nb_clust,6) )

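# assign each point to its nearest partial centroid, using the first 6 columns only

# (setting km.cluster_centers_ by hand on an unfitted KMeans and calling predict()

# depends on private fitted state and may fail on recent scikit-learn versions)

from sklearn.metrics import pairwise_distances_argmin

initial_prediction = pairwise_distances_argmin(X[:,0:6], cent_6cols)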

# For the 7th column, use the average value

# of the points lying in each cluster defined by your partial centroids

cent_7cols = np.zeros( (nb_clust,7) )

cent_7cols[:,0:6] = cent_6cols

for i in range(nb_clust):

    members = X[initial_prediction == i, 6]

    # assumption: an empty cluster falls back to the overall mean of column 7

    cent_7cols[i,6] = members.mean() if members.size else X[:,6].mean()

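# the 7th column is now initialized from the data instead of at random

# so you can use cent_7cols as your initial centroids

# (n_init=1 because an explicit init array makes repeated restarts pointless)

truekm = KMeans( n_clusters=nb_clust, init=cent_7cols, n_init=1 )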
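
The steps above can also be wrapped in a small reusable function. This is only a sketch: `complete_partial_centroids` and its `known_cols` argument are hypothetical names, not part of scikit-learn, and the fallback to the global column mean for empty clusters is an assumption rather than something the original answer specifies.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin


def complete_partial_centroids(X, partial_centroids, known_cols):
    """Fill in the unknown coordinates of partially specified centroids.

    Hypothetical helper, not a scikit-learn function. The columns of
    `partial_centroids` must be in the same order as `known_cols`.
    """
    X = np.asarray(X, dtype=float)
    partial_centroids = np.asarray(partial_centroids, dtype=float)
    known_cols = np.asarray(known_cols)
    n_clusters = partial_centroids.shape[0]
    missing_cols = np.setdiff1d(np.arange(X.shape[1]), known_cols)

    # Assign each sample to its nearest centroid using the known columns only.
    labels = pairwise_distances_argmin(X[:, known_cols], partial_centroids)

    full = np.empty((n_clusters, X.shape[1]))
    full[:, known_cols] = partial_centroids
    for k in range(n_clusters):
        members = X[labels == k]
        # Assumption: an empty cluster falls back to the global column means.
        source = members if len(members) else X
        full[k, missing_cols] = source[:, missing_cols].mean(axis=0)
    return full


rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 7))
partial = rng.standard_normal((10, 6))

init = complete_partial_centroids(X, partial, known_cols=np.arange(6))
model = KMeans(n_clusters=10, init=init, n_init=1).fit(X)
```

Note that the supplied centroids are only a starting point: once fit runs, the k-means iterations update all 7 coordinates, including the 6 that were specified by hand.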
