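I have recently been using the SVM in sklearn for text classification. While tuning parameters with sklearn's built-in grid search (GridSearchCV), I hit this error: ValueError: Class label 0 not present. This post records what I found while debugging it.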
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
svr = SVC(kernel='rbf', probability=True, class_weight={0:1.0, 1:1.5})
gammals = [i*0.1 for i in range(30)]
clf = GridSearchCV(svr, {'gamma':gammals}, verbose=1)
clf.fit(X, Y)
Traceback (most recent call last):
File "svm_grid_search.py", line 115, in <module>
grid_search()
File "svm_grid_search.py", line 88, in grid_search
clf.fit(X, Y)
File "/usr/lib64/python2.7/site-packages/sklearn/model_selection/_search.py", line 945, in fit
return self._fit(X, y, groups, ParameterGrid(self.param_grid))
File "/usr/lib64/python2.7/site-packages/sklearn/model_selection/_search.py", line 564, in _fit
for parameters in parameter_iterable
File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 758, in __call__
while self.dispatch_one_batch(iterator):
File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 608, in dispatch_one_batch
self._dispatch(tasks)
File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 571, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 109, in apply_async
result = ImmediateResult(func)
File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 326, in __init__
self.results = batch()
File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/usr/lib64/python2.7/site-packages/sklearn/model_selection/_validation.py", line 238, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/usr/lib64/python2.7/site-packages/sklearn/svm/base.py", line 152, in fit
y = self._validate_targets(y)
File "/usr/lib64/python2.7/site-packages/sklearn/svm/base.py", line 522, in _validate_targets
self.class_weight_ = compute_class_weight(self.class_weight, cls, y_)
File "/usr/lib64/python2.7/site-packages/sklearn/utils/class_weight.py", line 79, in compute_class_weight
raise ValueError("Class label %d not present." % c)
ValueError: Class label 0 not present.
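The error means that a class label from the weight dictionary could not be found in the list of classes.
The class list is derived automatically by sklearn from the dataset you pass in, while the weights are supplied by you as a parameter of the estimator. In my code that is the class_weight argument below: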
svr = SVC(kernel='rbf', probability=True, class_weight={0:1.0, 1:1.5})
My class list was:
['0','1']
And my weight dictionary was:
{0:1.0, 1:1.5}
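The labels in the class list are strings, whereas the keys of the weight dictionary are integers.
Since the integer 0 is not equal to the string '0', the lookup fails and the error is raised.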
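This kind of mismatch can be checked before fitting. The following is a minimal sketch in plain Python, assuming the labels are available as a list. The helper name missing_weight_labels is hypothetical and not part of sklearn; it only mimics the idea of looking up each weight key among the labels.

```python
def missing_weight_labels(class_weight, y):
    """Return the class_weight keys that never occur in y.

    Comparison is by equality, so the integer 0 does not match the string '0'.
    (Hypothetical helper for illustration, not an sklearn function.)
    """
    present = set(y)
    return [label for label in class_weight if label not in present]


class_weight = {0: 1.0, 1: 1.5}

# Labels read from a text file are strings unless they are converted.
y_str = ['0', '1', '1', '0']
print(missing_weight_labels(class_weight, y_str))  # [0, 1]

# After converting to int, every weight key is found.
y_int = [int(label) for label in y_str]
print(missing_weight_labels(class_weight, y_int))  # []
```

A non-empty result means at least one key of class_weight does not match any label, which is the situation that produced the error here.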
The fix is to convert the class labels from strings to integers when loading the data:
#labels.append(line_sp[0])
labels.append(int(line_sp[0]))
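For context, a loader built around that conversion might look like the sketch below. The file format is an assumption (one sample per line, label first, tab-separated), and the function name load_labeled_lines is hypothetical; only the int(line_sp[0]) conversion comes from my actual code.

```python
def load_labeled_lines(lines, sep='\t'):
    """Split each line into (label, text) and convert the label to int.

    Assumes a 'label<sep>text' layout, which is an illustrative guess.
    """
    labels, texts = [], []
    for line in lines:
        line_sp = line.rstrip('\n').split(sep, 1)
        # int, so the labels match the integer keys of class_weight
        labels.append(int(line_sp[0]))
        texts.append(line_sp[1] if len(line_sp) > 1 else '')
    return labels, texts


sample = ['0\tfirst document\n', '1\tsecond document\n']
labels, texts = load_labeled_lines(sample)
print(labels)  # [0, 1]
```

An alternative that should work equally well is to keep the labels as strings and write the weights with string keys, for example class_weight={'0': 1.0, '1': 1.5}. What matters is that both sides use the same type.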
If you run into the same error, checking the label types along these lines is a good place to start.