Parameter Tuning

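Reference links:
Background and principles: https://www.jianshu.com/p/55b9f2ea283b
Random forest parameter tuning: https://www.cnblogs.com/pinard/p/6160412.html
GridSearchCV documentation: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
Other approaches to tuning: the TPOT library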
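A note on random_state when tuning: it is the random seed. Fixing it makes the dataset split identical on every run, so results are reproducible.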
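
The effect of the seed can be checked directly. The following is a minimal sketch, assuming the iris dataset and the seed value 10 used later in these notes; the second seed (11) and the variable names are only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Two splits with the same seed select exactly the same rows.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    iris.data, iris.target, random_state=10)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    iris.data, iris.target, random_state=10)
same_seed_identical = np.array_equal(X_train_a, X_train_b)

# A different seed (11 is an arbitrary choice) shuffles the rows differently.
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    iris.data, iris.target, random_state=11)
different_seed_identical = np.array_equal(X_train_a, X_train_c)

print("Same seed, same split:", same_seed_identical)
print("Different seed, same split:", different_seed_identical)
```

If random_state is left unset, each call draws a fresh shuffle, so scores can change from run to run even when nothing else does.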

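Grid search: exhaustively tries every combination of the candidate parameter values. If each combination is scored on a single train/validation split, though, the winner depends partly on how that split happened to fall.
Cross-validation: to reduce the influence of any one split, the model is trained and scored on several different splits and the scores are averaged, which makes the result less dependent on chance.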
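
The difference between the two can be sketched by hand. The example below is an illustration under assumptions rather than a recommended workflow: it runs the same SVC grid once against a single validation split and once with 5-fold cross-validation via cross_val_score. The inner split seed (1) and the names best_single and best_cv are made up for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

iris = load_iris()
# Hold out a test set that is never used for choosing parameters.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    iris.data, iris.target, random_state=10)

candidates = [0.001, 0.01, 0.1, 1, 10, 100]

# Grid search on ONE validation split: the winner depends on that split.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, random_state=1)
best_single = {"score": -1.0, "params": None}
for gamma in candidates:
    for C in candidates:
        score = SVC(gamma=gamma, C=C).fit(X_train, y_train).score(X_val, y_val)
        if score > best_single["score"]:
            best_single = {"score": score, "params": {"gamma": gamma, "C": C}}

# Same grid, but each combination is scored by the mean of 5 folds.
best_cv = {"score": -1.0, "params": None}
for gamma in candidates:
    for C in candidates:
        fold_scores = cross_val_score(
            SVC(gamma=gamma, C=C), X_trainval, y_trainval, cv=5)
        if fold_scores.mean() > best_cv["score"]:
            best_cv = {"score": fold_scores.mean(),
                       "params": {"gamma": gamma, "C": C}}

print("Single split :", best_single)
print("5-fold CV    :", best_cv)
```

Changing the inner split seed may change the single-split winner, while the cross-validated choice tends to be more stable because every sample is used for validation once.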

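Combining cross-validation with grid search to tune parameters is called grid search with cross-validation. scikit-learn provides the GridSearchCV class for this; it implements fit, predict, score, and other estimator methods.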

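from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()

# List the parameters to tune and their candidate values
param_grid = {"gamma": [0.001, 0.01, 0.1, 1, 10, 100],
              "C": [0.001, 0.01, 0.1, 1, 10, 100]}
print("Parameters:{}".format(param_grid))

grid_search = GridSearchCV(SVC(), param_grid, cv=5)  # instantiate GridSearchCV with 5-fold cross-validation
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=10)
# Fit: find the best parameters, then refit a new SVC estimator with them on the whole training set
grid_search.fit(X_train, y_train)
print("Test set score:{:.2f}".format(grid_search.score(X_test, y_test)))
print("Best parameters:{}".format(grid_search.best_params_))
# best_score_ is the mean cross-validated score of the best parameters, not a plain training-set score
print("Best cross-validation score:{:.2f}".format(grid_search.best_score_))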
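
After fitting, the search object keeps more than the best parameters. As a hedged sketch, the block below repeats the same setup and looks at best_estimator_ (the SVC refitted with the best parameters, available because refit defaults to True) and cv_results_ (per-combination scores). The variable names best_model, results and n_candidates are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=10)

param_grid = {"gamma": [0.001, 0.01, 0.1, 1, 10, 100],
              "C": [0.001, 0.01, 0.1, 1, 10, 100]}
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# The refitted model; grid_search.predict and grid_search.score delegate to it.
best_model = grid_search.best_estimator_

# cv_results_ holds one entry per parameter combination (6 x 6 here).
results = grid_search.cv_results_
n_candidates = len(results["params"])
best_index = grid_search.best_index_

print("Combinations tried:", n_candidates)
print("Best combination:", results["params"][best_index])
print("Its mean CV score: {:.2f}".format(results["mean_test_score"][best_index]))
print("Its CV score std : {:.2f}".format(results["std_test_score"][best_index]))
```

Looking at mean_test_score together with std_test_score gives a sense of whether the best combination is clearly ahead or only marginally better than its neighbours in the grid.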
