What does ValueError mean in Python: python – ValueError: Number of labels is 1. Using silhouett...

I am trying to compute the silhouette score while searching for the optimal number of clusters to create, but I get an error stating:

ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive)
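The message describes a precondition of scikit-learn's `silhouette_score`: the labels must contain at least 2 and at most n_samples - 1 distinct values, because the score compares each sample's own cluster with its nearest other cluster. Below is a minimal pure-Python sketch of that check. `check_label_count` is a hypothetical helper written only for illustration. It is not scikit-learn's internal function.

```python
def check_label_count(labels, n_samples):
    """Hypothetical helper mirroring the precondition in the error message:
    the number of distinct labels must be between 2 and n_samples - 1."""
    n_labels = len(set(labels))
    if not 2 <= n_labels <= n_samples - 1:
        raise ValueError(
            f"Number of labels is {n_labels}. "
            "Valid values are 2 to n_samples - 1 (inclusive)"
        )
    return n_labels


# With n_clusters=1, K-Means assigns every sample to cluster 0,
# so there is only one distinct label and the check fails.
labels_k1 = [0, 0, 0, 0, 0]
try:
    check_label_count(labels_k1, n_samples=5)
except ValueError as err:
    print(err)  # Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive)

# Two distinct labels over five samples satisfy the precondition.
labels_k2 = [0, 0, 1, 1, 1]
print(check_label_count(labels_k2, n_samples=5))  # 2
```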

I can't work out why this happens. Below is the code I use to cluster the data and compute the silhouette score.

I read a CSV file containing the text to be clustered and run K-Means for each of n cluster counts. What could be causing this error?

```python
# Create cluster using K-Means
# Only creates graph
import matplotlib
# matplotlib.use('Agg')
import re
import os
import nltk, math, codecs
import csv
from nltk.corpus import stopwords
from gensim.models import Doc2Vec
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import silhouette_score

# checkpoint_save_path and test_filename are defined elsewhere (not shown)
model_name = checkpoint_save_path
loaded_model = Doc2Vec.load(model_name)

# Load the test csv file
data = pd.read_csv(test_filename)
overview = data['overview'].astype('str').tolist()
overview = filter(bool, overview)  # drop empty strings
vectors = []

def split_words(text):
    # Replace every non-alphanumeric, non-whitespace character with a space, then tokenize
    return ''.join([x if x.isalnum() or x.isspace() else " " for x in text]).split()

def preprocess_document(text):
    sp_words = split_words(text)
    return sp_words

# Infer one Doc2Vec vector per document
for i, t in enumerate(overview):
    vectors.append(loaded_model.infer_vector(preprocess_document(t)))

sse = {}
silhouette = {}
for k in range(1, 15):
    km = KMeans(n_clusters=k, max_iter=1000, verbose=0).fit(vectors)
    sse[k] = km.inertia_
    # FOLLOWING LINE CAUSES ERROR
    silhouette[k] = silhouette_score(vectors, km.labels_, metric='euclidean')

# Pick the cluster count with the lowest SSE
best_cluster_size = 1
min_error = float("inf")
for cluster_size in sse:
    if sse[cluster_size] < min_error:
        min_error = sse[cluster_size]
        best_cluster_size = cluster_size

print(sse)
print("====")
print(silhouette)
```
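The error comes from the first iteration of the loop: `range(1, 15)` starts at k=1, and `KMeans(n_clusters=1)` puts every document in the same cluster, so `km.labels_` contains a single label and `silhouette_score` rejects it. Below is a minimal sketch of the loop starting at k=2. It makes two assumptions so that it runs on its own. First, synthetic data from `make_blobs` stands in for the Doc2Vec vectors. Second, it picks k by the highest silhouette score, which is a swapped-in selection rule and not the question's lowest-SSE rule. Inertia tends to decrease as k grows, so picking the minimum SSE usually just selects the largest k tried.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumption: synthetic 2-D points stand in for the Doc2Vec document vectors.
vectors, _ = make_blobs(n_samples=150, centers=3, random_state=0)

sse = {}
silhouette = {}
# Start at k=2: with k=1 every label is 0 and silhouette_score raises ValueError.
for k in range(2, 15):
    km = KMeans(n_clusters=k, max_iter=1000, n_init=10, random_state=0).fit(vectors)
    sse[k] = km.inertia_
    silhouette[k] = silhouette_score(vectors, km.labels_, metric="euclidean")

# Silhouette is higher-is-better, so take the k with the largest score.
best_k = max(silhouette, key=silhouette.get)
print(best_k, silhouette[best_k])
```

If the SSE for k=1 is still wanted (for example, for an elbow plot), it can be recorded separately without calling `silhouette_score` for that k.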
