Fixing an sklearn error: ValueError: Class label 0 not present

Background

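Recently I have been using sklearn's SVM for a text classification task. While tuning hyperparameters with sklearn's built-in grid search (GridSearchCV), I hit this error: ValueError: Class label 0 not present. This note records what I found while debugging it.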

Code that raised the error

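from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

svr = SVC(kernel='rbf', probability=True, class_weight={0: 1.0, 1: 1.5})  # weights keyed by int labels
gammals = [i * 0.1 for i in range(1, 30)]  # skip gamma=0, which is not a useful RBF setting
clf = GridSearchCV(svr, {'gamma': gammals}, verbose=1)
clf.fit(X, Y)  # X: features, Y: labels loaded from the data file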

The sklearn error output

Traceback (most recent call last):
  File "svm_grid_search.py", line 115, in <module>
    grid_search()
  File "svm_grid_search.py", line 88, in grid_search
    clf.fit(X, Y)
  File "/usr/lib64/python2.7/site-packages/sklearn/model_selection/_search.py", line 945, in fit
    return self._fit(X, y, groups, ParameterGrid(self.param_grid))
  File "/usr/lib64/python2.7/site-packages/sklearn/model_selection/_search.py", line 564, in _fit
    for parameters in parameter_iterable
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 758, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 608, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 571, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 109, in apply_async
    result = ImmediateResult(func)
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 326, in __init__
    self.results = batch()
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/lib64/python2.7/site-packages/sklearn/model_selection/_validation.py", line 238, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/lib64/python2.7/site-packages/sklearn/svm/base.py", line 152, in fit
    y = self._validate_targets(y)
  File "/usr/lib64/python2.7/site-packages/sklearn/svm/base.py", line 522, in _validate_targets
    self.class_weight_ = compute_class_weight(self.class_weight, cls, y_)
  File "/usr/lib64/python2.7/site-packages/sklearn/utils/class_weight.py", line 79, in compute_class_weight
    raise ValueError("Class label %d not present." % c)
ValueError: Class label 0 not present.

What the error means

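A class label that appears in the weight dictionary cannot be found in the list of classes.
The list of classes is derived by sklearn automatically from the labels in your dataset. The weight dictionary is something you supply yourself as an input parameter of the estimator; in my code it is the class_weight below: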

svr = SVC(kernel='rbf', probability=True, class_weight={0: 1.0, 1: 1.5})
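To make the check concrete, here is a minimal pure-Python imitation of it. This is not sklearn's actual code: the real check lives in sklearn.utils.class_weight.compute_class_weight, and its wording differs between versions. The helper name missing_weight_labels is hypothetical. It collects the distinct labels found in y and reports every class_weight key that is not among them:

```python
def missing_weight_labels(y, class_weight):
    """Return the class_weight keys that do not occur among the labels in y.

    Hypothetical helper that imitates the idea behind sklearn's check;
    it is not sklearn's implementation.
    """
    classes = set(y)  # distinct labels, analogous to the classes sklearn derives from y
    # Python treats '0' and 0 as different values, so a str label
    # never matches an int key (and vice versa).
    return [k for k in class_weight if k not in classes]


y_str = ['0', '1', '1', '0']   # labels read from a text file, still strings
y_int = [0, 1, 1, 0]           # the same labels converted to int
weights = {0: 1.0, 1: 1.5}

print(missing_weight_labels(y_str, weights))  # [0, 1]
print(missing_weight_labels(y_int, weights))  # []
```

With string labels both int keys are reported as missing; older sklearn versions stop at the first one, which is why the message names label 0.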

The cause in my case

My list of classes was:

['0','1']

while my weight dictionary was:

{0:1.0, 1:1.5}

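The labels in the class list are strings, whereas the keys of the weight dictionary are integers.
Because '0' and 0 are different values in Python, none of the keys matches a label, hence the error.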
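The same mismatch can be reproduced without GridSearchCV by calling compute_class_weight, the function at the bottom of the traceback, directly. This is a sketch that assumes a reasonably recent sklearn, where classes and y are passed as keyword arguments. The exact error message depends on the version, and newer releases may word it differently from "Class label 0 not present.":

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

weights = {0: 1.0, 1: 1.5}
y_str = np.array(['0', '1', '1', '0'])  # string labels, as in my data
y_int = y_str.astype(int)               # the same labels converted to integers

# String labels vs. int keys: sklearn rejects the weight dictionary.
try:
    compute_class_weight(weights, classes=np.unique(y_str), y=y_str)
except ValueError as exc:
    print("ValueError:", exc)

# Matching types: one weight per class, in the order of np.unique(y_int).
print(compute_class_weight(weights, classes=np.unique(y_int), y=y_int))
```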

Solution

When loading the data, convert each class label you read in from a string to an integer:

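# labels.append(line_sp[0])      # before: the label stays a string such as '0'
labels.append(int(line_sp[0]))   # after: the label is an int such as 0, matching the class_weight keys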
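For context, here is how those two lines might sit inside a loader. This is a sketch under assumptions: the line format "<label><TAB><text>" and the names load_dataset and sep are hypothetical and not taken from the original script.

```python
def load_dataset(lines, sep='\t'):
    """Parse lines of the form '<label><sep><text>' into (texts, labels).

    Hypothetical loader: the line format and the names load_dataset/sep
    are assumptions for illustration, not the original script.
    """
    texts, labels = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        line_sp = line.rstrip('\n').split(sep, 1)
        # labels.append(line_sp[0])     # str label -> "Class label 0 not present."
        labels.append(int(line_sp[0]))  # int label matches class_weight={0: ..., 1: ...}
        texts.append(line_sp[1])
    return texts, labels


sample = ["0\tthis movie was dull\n", "1\ta great read\n", "\n"]
texts, labels = load_dataset(sample)
print(labels)  # [0, 1]
```

The opposite fix would also work: keep the labels as strings and key the weights by strings, e.g. class_weight={'0': 1.0, '1': 1.5}. What matters is that both sides use the same type.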

If you run into the same error, start by checking that the keys of class_weight have the same type as the labels in your y.
