Datawhale Data Mining Task 5: Tuning the Models with Grid Search

Because I kept running into problems installing xgboost on my own machine, I only tuned four models. The full code is shown below.
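The snippets reuse names defined in the earlier tasks of this series: the estimators ltc, dtc, svc and rfc, the cross-validation settings n_fold and scoring, and the training split X_train, y_train. Those definitions are not part of this post, so the following setup is only a sketch with assumed values:

import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Assumed setup (sketch only): the real definitions come from the earlier tasks.
ltc = LogisticRegression(solver='liblinear')   # LR; liblinear supports both 'l1' and 'l2'
dtc = DecisionTreeClassifier()                 # decision tree
svc = SVC()                                    # SVM
rfc = RandomForestClassifier()                 # random forest
n_fold = 5                                     # number of CV folds (assumed)
scoring = 'roc_auc'                            # scoring metric (assumed)
# X_train, y_train (and later X_test, y_test) come from the earlier train/test split step.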

## Tuning the LR model
# Note: with newer scikit-learn, penalty='l1' requires solver='liblinear' or 'saga'.
ltc_param = {'penalty': ['l1', 'l2'],
             'C': [0.0001, 0.001, 0.01, 0.1, 1.0]}
ltc_grid = GridSearchCV(ltc, ltc_param, cv=n_fold, scoring=scoring, n_jobs=1)
ltc_grid.fit(X_train, y_train)
print("Best LR score:", ltc_grid.best_score_)       # best cross-validated score
print("Best LR parameters:", ltc_grid.best_params_) # best parameter combination
display(pd.DataFrame(ltc_grid.cv_results_).T)       # full CV results as a table (Jupyter display)
## Tuning the decision tree
dtc_param = {'max_depth': range(1, 10)}
dtc_grid = GridSearchCV(dtc, dtc_param, cv=n_fold, scoring=scoring, n_jobs=-1)
dtc_grid.fit(X_train, y_train)
print("Best decision tree score:", dtc_grid.best_score_)
print("Best decision tree parameters:", dtc_grid.best_params_)
## Tuning the SVM
svc_param = {'C': [0.001, 0.01, 0.1]}
svc_grid = GridSearchCV(svc, svc_param, cv=n_fold, scoring=scoring, n_jobs=-1)
svc_grid.fit(X_train, y_train)
print("Best SVM score:", svc_grid.best_score_)
print("Best SVM parameters:", svc_grid.best_params_)
## Tuning the random forest
rfc_param = {'n_estimators': [5, 15, 30, 50],
             'criterion': ['gini', 'entropy']}
rfc_grid = GridSearchCV(rfc, rfc_param, cv=n_fold, scoring=scoring, n_jobs=-1)
rfc_grid.fit(X_train, y_train)
print("Best random forest score:", rfc_grid.best_score_)
print("Best random forest parameters:", rfc_grid.best_params_)


The results are as follows:

Best LR score: 0.7935076645626691
Best LR parameters: {'C': 0.1, 'penalty': 'l1'}
Best decision tree score: 0.7697625488428014
Best decision tree parameters: {'max_depth': 4}
Best SVM score: 0.7493237150586114
Best SVM parameters: {'C': 0.001}
Best random forest score: 0.7917042380522994
Best random forest parameters: {'criterion': 'entropy', 'n_estimators': 50}
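
Logistic regression achieves the highest cross-validated score here. Assuming a held-out split X_test, y_test exists from the earlier tasks (it is not shown in this post), the refitted best estimator can be checked on it, for example:

from sklearn.metrics import roc_auc_score

# Assumes X_test, y_test come from the earlier train/test split (not shown here).
best_lr = ltc_grid.best_estimator_   # GridSearchCV refits the best model on all of X_train by default
test_auc = roc_auc_score(y_test, best_lr.predict_proba(X_test)[:, 1])
print("LR test AUC:", test_auc)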

