Machine Learning: Model Tuning

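Cross-Validation

Split the available training data into a training set and a validation set. Rotate the split so that each part of the data serves as the validation set once, then take the average of the resulting scores as the final result.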
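The rotation described above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the helper names `k_fold_indices` and `cross_validate` are hypothetical, and the folds are taken in order without shuffling or stratification, which real tools often add.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def cross_validate(score_fn, n_samples, k):
    """Average score_fn(train_indices, val_indices) over the k folds."""
    scores = [score_fn(train, val)
              for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Each of the 10 samples lands in the validation set exactly once.
folds = list(k_fold_indices(10, 5))
for train, val in folds:
    print("train:", train, "validation:", val)
```

Here `score_fn` stands for whatever trains a model on the training indices and scores it on the validation indices.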

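Hyperparameter Tuning: Grid Search

Parameters that usually have to be set by hand (for example, the value of K in the k-nearest neighbors algorithm) are called hyperparameters. Define several candidate hyperparameter combinations for the model in advance and evaluate each one with cross-validation. Then select the best-performing combination and build the model with it.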
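To make the procedure concrete, here is a minimal hand-written version of the search loop, using scikit-learn's `cross_val_score` to evaluate each candidate. The candidate list, the 5-fold setting, and the variable names are illustrative assumptions, and feature scaling is left out to keep the sketch short.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values of k (an assumption for this sketch)
candidates = [1, 3, 5, 7]

mean_scores = {}
for k in candidates:
    model = KNeighborsClassifier(n_neighbors=k)
    # One accuracy score per fold; their mean is the score for this k
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[k] = scores.mean()

# Keep the candidate with the highest mean cross-validation score
best_k = max(mean_scores, key=mean_scores.get)
print(best_k, mean_scores[best_k])
```

`GridSearchCV` automates this loop and, by default, also refits the best model on all of the data it was given.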

API

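  • sklearn.model_selection.GridSearchCV(estimator, param_grid, cv=None)
    • estimator: the estimator object
    • param_grid: the estimator parameters to search, given as a dict that maps each parameter name to a list of candidate values
      • {'n_neighbors': [1, 3, 5]}
    • cv: the number of cross-validation folds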
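When `param_grid` lists more than one parameter, every combination of the listed values is tried. The sketch below shows this with a second parameter, `weights`, which is added here purely for illustration; `ParameterGrid` is used only to enumerate the combinations that the search will evaluate.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Two parameters: 3 values x 2 values = 6 combinations
param_grid = {
    "n_neighbors": [1, 3, 5],
    "weights": ["uniform", "distance"],
}
combos = list(ParameterGrid(param_grid))
print(len(combos), "combinations")

# Every combination is scored with 5-fold cross-validation
search = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

The number of model fits grows with the product of the list lengths times the number of folds, so large grids become expensive quickly.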

Code example: using cross-validation and grid search to tune k in the KNN algorithm

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Load the data and hold out 20% as a test set
iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=6)

# Standardize features: fit the scaler on the training set only,
# then apply the same transformation to the test set
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

estimator = KNeighborsClassifier()

# Parameter dictionary: candidate values of k
params = {'n_neighbors': [1, 3, 5, 7]}

# Grid search with 10-fold cross-validation
estimator = GridSearchCV(estimator, param_grid=params, cv=10)

estimator.fit(x_train, y_train)
y_predict = estimator.predict(x_test)

# Accuracy on the test set
score = estimator.score(x_test, y_test)
print("score=", score)

# Best parameters, best mean cross-validation score, and the refitted best model
print(estimator.best_params_, estimator.best_score_, estimator.best_estimator_)
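Beyond the single best result, the fitted search object also stores the score of every candidate in `cv_results_`. The following sketch repeats the setup above in a self-contained form and prints one line per candidate; the variable names `search` and `rows` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=6)

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)

search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={'n_neighbors': [1, 3, 5, 7]}, cv=10)
search.fit(x_train, y_train)

# Pair each candidate with its mean and standard deviation across folds
results = search.cv_results_
rows = list(zip(results['params'],
                results['mean_test_score'],
                results['std_test_score']))
for params, mean, std in rows:
    print(params, round(mean, 4), round(std, 4))
```

Comparing the mean scores side by side shows how close the candidates are, which a single `best_params_` value hides.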
