Mapping cluster labels back to the original samples in scikit-learn

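1 The problem

When you run a clustering algorithm from sklearn, what you usually get back is an array of cluster labels, for example [1,1,0,1,0,1,0,1,0,1,2,-1,-1,2,3,3,1,-1,-1]. The i-th label belongs to the i-th sample that was clustered, and -1 marks a sample treated as noise. The question addressed here is how to match these labels_ values back to the original samples and print the samples grouped by cluster.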
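For context, here is a minimal sketch of where such labels come from. It assumes DBSCAN as the clustering algorithm (the post does not say which one was used; DBSCAN is one of the sklearn estimators that reports noise as -1), and the toy points and the eps and min_samples values are made up for illustration.

```python
from sklearn.cluster import DBSCAN

# Toy 2-D points (hypothetical data): two tight pairs and one far outlier.
X = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1], [100.0, 100.0]]

# eps and min_samples are illustrative values, not recommendations.
model = DBSCAN(eps=0.5, min_samples=2).fit(X)

# labels_ holds one label per input row, in the same order as X.
labels = model.labels_
for sample, label in zip(X, labels):
    print(label, sample)
```

Because labels_ is aligned with the input order, pairing it with the samples by position is all that is needed to recover which sample landed in which cluster.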

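2 Implementation

The labels_to_original function regroups the samples in forclusterlist according to the label values in labels, so that the output lists the samples cluster by cluster.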

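# labels: cluster labels produced by the clustering algorithm (-1 marks noise)
# forclusterlist: the samples that were clustered, in the same order as labels
# Returns the samples regrouped by label: one list per cluster 0..max(labels),
# followed by a final list holding the noise samples (label -1).
def labels_to_original(labels, forclusterlist):
    assert len(labels) == len(forclusterlist)
    maxlabel = max(labels)
    # Bucket order: 0, 1, ..., maxlabel, then -1 (noise) last
    numberlabel = list(range(maxlabel + 1))
    numberlabel.append(-1)
    result = [[] for _ in numberlabel]
    for label, sample in zip(labels, forclusterlist):
        index = numberlabel.index(label)
        result[index].append(sample)
    return result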
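As an alternative to the list-of-lists layout above, the same grouping can be sketched with a dictionary keyed by label. This is not the function from this post: group_by_label is a hypothetical name introduced here, and the variant assumes you prefer to look clusters up by their label rather than by position.

```python
from collections import defaultdict


def group_by_label(labels, samples):
    """Group samples by cluster label and return {label: [samples, ...]}."""
    if len(labels) != len(samples):
        raise ValueError("labels and samples must have the same length")
    groups = defaultdict(list)
    for label, sample in zip(labels, samples):
        # int() turns NumPy integer labels into plain Python ints for the keys
        groups[int(label)].append(sample)
    return dict(groups)


groups = group_by_label([0, 1, -1, 1], ["a", "b", "c", "d"])
print(groups)  # {0: ['a'], 1: ['b', 'd'], -1: ['c']}
```

With this layout the noise samples are simply groups.get(-1, []), and labels that never occur produce no empty bucket.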

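3 Usage example

Samples used for clustering: forclusterlist = ["类0","类1","类2","噪声","类1","类2","类1","类2","类3","类1","类2","类1","噪声","类2","类2","噪声"] (here "类N" means "class N" and "噪声" means "noise")
Cluster labels: labels = [0,1,2,-1,1,2,1,2,3,1,2,1,-1,2,2,-1]
Running the function above on this data: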

forclusterlist = ["类0","类1","类2","噪声","类1","类2","类1","类2","类3","类1","类2","类1","噪声","类2","类2","噪声"]
labels = [0,1,2,-1,1,2,1,2,3,1,2,1,-1,2,2,-1]
result = labels_to_original(labels, forclusterlist)
print(result)

Result:

[['类0'], ['类1', '类1', '类1', '类1', '类1'], ['类2', '类2', '类2', '类2', '类2', '类2'], ['类3'], ['噪声', '噪声', '噪声']]
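To print a result like the one above cluster by cluster, something along these lines may help. It is only a sketch: describe_clusters is a hypothetical helper, the input is a shortened made-up list, and it assumes the layout produced by labels_to_original, where the last inner list holds the noise samples.

```python
def describe_clusters(result):
    """Return one summary line per cluster, with the noise bucket last."""
    # Assumed layout: [cluster 0, cluster 1, ..., noise]
    *clusters, noise = result
    lines = []
    for cluster_id, members in enumerate(clusters):
        lines.append(f"cluster {cluster_id}: {len(members)} sample(s) {members}")
    lines.append(f"noise (-1): {len(noise)} sample(s) {noise}")
    return lines


# Shortened, made-up data in the same shape as the result above
result = [["类0"], ["类1", "类1"], ["类2"], ["噪声"]]
lines = describe_clusters(result)
for line in lines:
    print(line)
```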
