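When you use the clustering algorithms in sklearn, what you usually get back is a list of cluster label values, for example [1,1,0,1,0,1,0,1,0,1,2,-1,-1,2,3,3,1,-1,-1]. The question this post addresses is how to map the labels_ values of a clustering result back to the original samples and print them out.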
The labels_to_original function reorders the samples in forclusterlist according to the label values in labels, and returns the samples grouped by cluster.
# labels: the label values returned by clustering
# forclusterlist: the sample set that was clustered
# Reorders the samples in forclusterlist by the label values in labels
# and returns them grouped by cluster.
def labels_to_original(labels, forclusterlist):
    assert len(labels) == len(forclusterlist)
    maxlabel = max(labels)
    # Cluster labels 0..maxlabel, followed by -1 for noise
    numberlabel = [i for i in range(0, maxlabel + 1, 1)]
    numberlabel.append(-1)
    # One empty bucket per label, in the same order as numberlabel
    result = [[] for i in range(len(numberlabel))]
    for i in range(len(labels)):
        index = numberlabel.index(labels[i])
        result[index].append(forclusterlist[i])
    return result
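As an alternative sketch (not the function above), the same grouping can be done with a dictionary keyed by label. This avoids the repeated list.index lookup and also works when the label values are not contiguous. The name group_by_label and the demo data are hypothetical, and placing noise last is a choice made here to mirror the function above.

```python
from collections import defaultdict


def group_by_label(labels, samples):
    """Group samples by cluster label (hypothetical helper, not part of sklearn)."""
    if len(labels) != len(samples):
        raise ValueError("labels and samples must have the same length")
    groups = defaultdict(list)
    for label, sample in zip(labels, samples):
        # int() also accepts NumPy integer labels
        groups[int(label)].append(sample)
    # Regular clusters first in ascending order, noise (-1) last
    ordered = sorted(groups, key=lambda label: (label == -1, label))
    return {label: groups[label] for label in ordered}


demo = group_by_label([0, 1, -1, 1, 0], ["a", "b", "c", "d", "e"])
print(demo)  # {0: ['a', 'e'], 1: ['b', 'd'], -1: ['c']}
```

Unlike labels_to_original, this sketch only creates a group for labels that actually occur, so there is no empty noise list when no sample is labelled -1.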
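Take the following sample set used for clustering (forclusterlist) and the label values returned for it (labels), and run the function above on them:
forclusterlist = ["class 0","class 1","class 2","noise","class 1","class 2","class 1","class 2","class 3","class 1","class 2","class 1","noise","class 2","class 2","noise"]
labels = [0,1,2,-1,1,2,1,2,3,1,2,1,-1,2,2,-1]
result = labels_to_original(labels, forclusterlist)
print(result)
Result:
[['class 0'], ['class 1', 'class 1', 'class 1', 'class 1', 'class 1'], ['class 2', 'class 2', 'class 2', 'class 2', 'class 2', 'class 2'], ['class 3'], ['noise', 'noise', 'noise']]
The sublists appear in the order cluster 0, 1, 2, 3, and the samples labelled -1 are collected in the last sublist.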
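To print the grouped result one cluster per line rather than as one nested list, a minimal sketch follows. It assumes the layout produced by labels_to_original, that is, index i holds cluster i and the final sublist holds the samples labelled -1. The name describe_clusters and the sample data are made up for illustration.

```python
def describe_clusters(result):
    """Return one printable line per group (hypothetical helper).

    Assumes `result` follows the layout returned by labels_to_original:
    result[i] holds cluster i, and the last sublist holds label -1.
    """
    lines = []
    noise_index = len(result) - 1
    for index, members in enumerate(result):
        name = "noise (-1)" if index == noise_index else f"cluster {index}"
        lines.append(f"{name}: {len(members)} sample(s) -> {members}")
    return lines


# Made-up grouped result with two clusters and one noise sample
grouped = [["a", "e"], ["b", "d"], ["c"]]
for line in describe_clusters(grouped):
    print(line)
```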