My code is as follows:
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
def PolynomialLogisticRegression(degree=2, C=1.0):
    return Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('std', StandardScaler()),
        ('log_reg', LogisticRegression(C=C))
    ])
from sklearn.model_selection import GridSearchCV
param_grid = {
    'degree': [i for i in range(1, 21)],
    'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]
}
grid_search = GridSearchCV(PolynomialLogisticRegression(), param_grid)
grid_search.fit(X_train, y_train)
This raises the following error: Invalid parameter C for estimator Pipeline…
ValueError: Invalid parameter C for estimator Pipeline(memory=None,
steps=[('poly',
PolynomialFeatures(degree=2, include_bias=True,
interaction_only=False, order='C')),
('std',
StandardScaler(copy=True, with_mean=True, with_std=True)),
('log_reg',
LogisticRegression(C=1.0, class_weight=None, dual=False,
fit_intercept=True, intercept_scaling=1,
l1_ratio=None, max_iter=100,
multi_class='auto', n_jobs=None,
penalty='l2', random_state=None,
solver='lbfgs', tol=0.0001, verbose=0,
warm_start=False))],
verbose=False). Check the list of available parameters with `estimator.get_params().keys()`.
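The cause turned out to be the grid-search parameters: the Pipeline built by PolynomialLogisticRegression has no parameter with that name.
Calling PolynomialLogisticRegression().get_params() shows that each parameter name is prefixed with the alias given to its step in the pipeline (similar to a namespace), joined by a double underscore. For example, the degree parameter belongs to the step defined as ('poly', PolynomialFeatures(degree=degree)), so it should be passed as poly__degree.
The corrected GridSearch: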
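To see the naming rule for yourself, a minimal sketch like the one below lists the parameter names a pipeline exposes. It rebuilds the same three steps inline; the variable names pipe, param_names and nested are only illustrative and not part of the original code.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same three steps as PolynomialLogisticRegression() above.
pipe = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('std', StandardScaler()),
    ('log_reg', LogisticRegression(C=1.0)),
])

param_names = sorted(pipe.get_params().keys())

# Parameters that belong to a step are exposed as '<step alias>__<parameter>'.
nested = [name for name in param_names if '__' in name]
for name in nested:
    print(name)

# The bare names are not valid keys, the prefixed ones are.
print('C' in param_names, 'log_reg__C' in param_names)
print('degree' in param_names, 'poly__degree' in param_names)
```

Any key printed by the loop can be used directly as a key in param_grid.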
%%time
from sklearn.model_selection import GridSearchCV
param_grid = {
    'poly__degree': [i for i in range(1, 21)],
    'log_reg__C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]
}
grid_search = GridSearchCV(PolynomialLogisticRegression(), param_grid)
grid_search.fit(X_train, y_train)
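
The post does not show where X_train and y_train come from, so the following is only a self-contained sketch of the corrected search under stated assumptions: the data is a synthetic make_moons set, the grid is deliberately smaller so it runs quickly, and max_iter=1000 and cv=3 are choices made for this sketch rather than settings from the original code.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumption: synthetic two-class data standing in for the original dataset.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def PolynomialLogisticRegression(degree=2, C=1.0):
    return Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('std', StandardScaler()),
        # max_iter is raised here only to reduce the chance of convergence warnings.
        ('log_reg', LogisticRegression(C=C, max_iter=1000)),
    ])


# Keys use the '<step alias>__<parameter>' form; the grid is reduced for speed.
param_grid = {
    'poly__degree': [1, 2, 3],
    'log_reg__C': [0.1, 1, 10],
}

grid_search = GridSearchCV(PolynomialLogisticRegression(), param_grid, cv=3)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)            # keys come back in the prefixed form
print(grid_search.score(X_test, y_test))   # accuracy of the refitted best model
```

After fitting, grid_search.best_estimator_ holds the pipeline refitted with the best parameter combination.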