How to Use GridSearchCV

Reposted from: http://blog.csdn.net/u012897374/article/details/74999940

1. Grid search is used to find the best hyperparameters for a model

First, import the dependencies:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20
from sklearn import metrics
import numpy as np
import pandas as pd
```

2. Define the parameter grid to search

```python
# Candidate values for each GradientBoostingClassifier hyperparameter
params = {'learning_rate': np.linspace(0.05, 0.25, 5), 'max_depth': list(range(1, 8)),
          'min_samples_leaf': list(range(1, 5)), 'n_estimators': list(range(50, 100, 10))}
```
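As a rough cost check, the grid above can be expanded with `ParameterGrid` to count how many candidates GridSearchCV will try. This is a sketch, assuming scikit-learn 0.20 or later, where `ParameterGrid` lives in `sklearn.model_selection`:

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as in step 2
params = {
    "learning_rate": np.linspace(0.05, 0.25, 5),
    "max_depth": list(range(1, 8)),
    "min_samples_leaf": list(range(1, 5)),
    "n_estimators": list(range(50, 100, 10)),
}

grid_points = ParameterGrid(params)
n_candidates = len(grid_points)   # 5 * 7 * 4 * 5 combinations
n_fits = n_candidates * 10        # each combination is fit once per fold with cv=10
print(n_candidates, n_fits)       # 700 7000
```

With cv=10 that is 7000 fits plus one final refit, so shrinking the grid or passing `n_jobs=-1` to GridSearchCV is worth considering.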

3. Set up the model and the scoring metric, then train the model with every parameter combination

```python
clf = GradientBoostingClassifier()
# X, y: training features and labels; 700 combinations x 10 folds = 7000 fits
grid = GridSearchCV(clf, params, cv=10, scoring="f1")
grid.fit(X, y)
```
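The post does not show where X and y come from. Below is a small self-contained run of the same step, assuming synthetic binary data from `make_classification` and a deliberately tiny grid (both are stand-ins, not the author's setup) so it finishes quickly:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary data standing in for the post's X, y
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

small_params = {"max_depth": [1, 2], "n_estimators": [20, 40]}
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    small_params,
    cv=3,          # 3 folds instead of 10 to keep the example fast
    scoring="f1",  # binary F1, as in the post
)
grid.fit(X, y)
print(len(grid.cv_results_["params"]))  # 4 candidates
```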

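The predefined values for `scoring` are listed below (this is the set documented when the original post was written; newer scikit-learn releases add more). Loss metrics carry a `neg_` prefix because GridSearchCV always treats a higher score as better, so losses are negated.

Classification:

| scoring | function | comment |
|---|---|---|
| accuracy | metrics.accuracy_score | |
| average_precision | metrics.average_precision_score | |
| f1 | metrics.f1_score | for binary targets |
| f1_micro | metrics.f1_score | micro-averaged |
| f1_macro | metrics.f1_score | macro-averaged |
| f1_weighted | metrics.f1_score | weighted average |
| f1_samples | metrics.f1_score | by multilabel sample |
| neg_log_loss | metrics.log_loss | requires predict_proba support |
| precision etc. | metrics.precision_score | suffixes apply as with "f1" |
| recall etc. | metrics.recall_score | suffixes apply as with "f1" |
| roc_auc | metrics.roc_auc_score | |

Clustering:

| scoring | function | comment |
|---|---|---|
| adjusted_rand_score | metrics.adjusted_rand_score | |

Regression:

| scoring | function | comment |
|---|---|---|
| neg_mean_absolute_error | metrics.mean_absolute_error | |
| neg_mean_squared_error | metrics.mean_squared_error | |
| neg_median_absolute_error | metrics.median_absolute_error | |
| r2 | metrics.r2_score | |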
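To illustrate the `neg_` convention, here is a sketch using a `Ridge` regressor on synthetic data (both are assumptions, not from the post). The `neg_mean_squared_error` scores are all non-positive, and they match a scorer built from the metric function with `make_scorer(..., greater_is_better=False)`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
model = Ridge()

# Built-in scorer name: returns -MSE per fold, so values closer to 0 are better
neg_mse = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")

# Equivalent scorer built by hand; greater_is_better=False flips the sign
custom = make_scorer(mean_squared_error, greater_is_better=False)
same = cross_val_score(model, X, y, cv=5, scoring=custom)

print(neg_mse)
print(np.allclose(neg_mse, same))
```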

4. Inspect the best score and best parameters

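```python
grid.best_score_    # mean cross-validated score of the best parameter combination (F1 here)
grid.best_params_   # the best parameter combination, as a dict
```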

[Image 1 from the original post: GridSearchCV usage]
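Besides `best_score_` and `best_params_`, the full per-candidate results are stored in `cv_results_`. A sketch that loads them into a pandas DataFrame ranked by score, assuming synthetic data and a small grid in place of the post's setup:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    {"max_depth": [1, 2], "n_estimators": [20, 40]},
                    cv=3, scoring="f1")
grid.fit(X, y)

# One row per parameter combination
results = pd.DataFrame(grid.cv_results_)
cols = ["rank_test_score", "params", "mean_test_score", "std_test_score"]
top = results.sort_values("rank_test_score")[cols]
print(top.head())

# best_score_ / best_params_ are the row at best_index_
print(grid.best_score_, grid.best_params_)
```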

5. Get the best model

```python
grid.best_estimator_   # refit on all of X, y with best_params_ (requires refit=True, the default)
```

[Image 2 from the original post: GridSearchCV usage]
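With the default `refit=True`, `best_estimator_` is a fresh copy of the estimator refit on all of X, y using `best_params_`, and `grid.predict` delegates to it. A sketch checking both, again assuming synthetic data and a small grid:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    {"max_depth": [1, 2], "n_estimators": [20, 40]},
                    cv=3, scoring="f1")
grid.fit(X, y)

best_model = grid.best_estimator_
# The refit model carries the winning hyperparameters
chosen = {name: best_model.get_params()[name] for name in grid.best_params_}
print(type(best_model).__name__, chosen)

# grid.predict(X) gives the same result as best_model.predict(X)
same_predictions = (grid.predict(X) == best_model.predict(X)).all()
print(same_predictions)
```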

6. Use the best model to make predictions

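```python
best_model = grid.best_estimator_
predict_y = best_model.predict(Test_X)
# Score against the true labels of the test set (Test_y, a hypothetical name), not the training labels y
metrics.f1_score(Test_y, predict_y)
```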
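Evaluation should use labels the search never saw. A sketch of the whole workflow, assuming a held-out split made with `train_test_split` (the post's `Test_X` presumably plays this role) and synthetic data:

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
# Hold out 25% as a test set that the grid search never sees
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    {"max_depth": [1, 2, 3], "learning_rate": [0.05, 0.1]},
                    cv=5, scoring="f1")
grid.fit(X_train, y_train)

# Predict with the refit best model and compare with the held-out labels
y_pred = grid.best_estimator_.predict(X_test)
test_f1 = metrics.f1_score(y_test, y_pred)
print(f"best CV F1: {grid.best_score_:.3f}  test F1: {test_f1:.3f}")
```

The cross-validated `best_score_` is computed on training folds only, so the test F1 is the less optimistic estimate of how the chosen model generalizes.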
