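The optimizations discussed so far act on the model itself. Next, we can turn to the quantity being optimized, such as the penalty term, and tune the coefficient of that penalty term. The problem then becomes: from a set of candidate hyperparameter values, find the one that gives the best model performance and use it as the chosen hyperparameter. This is clearly an optimization problem in its own right.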
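As a minimal sketch of this idea, assume ridge regression (sklearn's `Ridge`, whose penalty coefficient is called `alpha`) on the same diabetes data used later, with an illustrative list of candidate values. Each candidate is scored by cross-validation and the best one is kept; the model and the candidate list are assumptions for illustration, not taken from the original.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# Candidate values of the penalty coefficient (alpha in sklearn's Ridge); illustrative only
alphas = [0.001, 0.01, 0.1, 1.0, 10.0]

# Mean 5-fold cross-validated R^2 for each candidate value
mean_scores = {a: cross_val_score(Ridge(alpha=a), X, y, cv=5, scoring="r2").mean()
               for a in alphas}

# The chosen hyperparameter is simply the argmax over the candidate set
best_alpha = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best alpha:", best_alpha)
```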
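In essence, the difference between a hyperparameter and a parameter is whether its value can be determined by an optimization algorithm: if an unknown quantity can be solved for by the optimization algorithm, it is a parameter; otherwise, it is a hyperparameter.
Model parameters are configuration variables internal to the model, and their values can be estimated from the data.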
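To make the distinction concrete, here is a small sketch that assumes sklearn's `Ridge` as the model (an illustrative choice, not from the original): `alpha` is fixed by us before training and `fit()` never changes it, while `coef_` and `intercept_` are estimated from the data by minimizing the penalized least-squares loss.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)

# Hyperparameter: chosen before training, not learned by fit()
model = Ridge(alpha=1.0)
model.fit(X, y)

# Parameters: estimated from the data during fit()
print("alpha (hyperparameter):", model.alpha)
print("coef_ (parameters):", model.coef_)
print("intercept_ (parameter):", model.intercept_)
```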
All earlier articles in this series optimized parameters; this one focuses on hyperparameters.
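Grid search is, in short, exhaustive enumeration: list every candidate value of each hyperparameter, then try every combination of them in turn and keep the one with the best validation score.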
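Before handing this to `GridSearchCV`, the loop it performs can be sketched by hand. The candidate lists below are illustrative assumptions, and 5-fold cross-validation is used only to keep the run short; `itertools.product` produces the Cartesian product of the candidate lists, which is exactly the set of combinations grid search tries.

```python
from itertools import product

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = load_diabetes(return_X_y=True)

# Candidate values for each hyperparameter (illustrative choices)
C_values = [0.1, 1.0, 10.0]
gamma_values = [0.01, 0.1, 1.0]

best_score, best_params = -np.inf, None
# product() yields every (C, gamma) pair: 3 x 3 = 9 combinations
for C, gamma in product(C_values, gamma_values):
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=C, gamma=gamma))
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    # Keep the combination with the highest mean validation R^2
    if score > best_score:
        best_score, best_params = score, {"C": C, "gamma": gamma}

print("best params:", best_params, "best mean R^2: %.3f" % best_score)
```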
import numpy as np
from scipy.stats import uniform
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
# Baseline: cross-validated score of SVR with default (untuned) hyperparameters
diabetes = datasets.load_diabetes()
x = diabetes.data
y = diabetes.target
features = diabetes.feature_names
pip_SVR = make_pipeline(StandardScaler(), SVR())
scores = cross_val_score(estimator=pip_SVR,
                         X=x,
                         y=y,
                         scoring='r2',  # coefficient of determination, not accuracy
                         cv=10)
print('CV R^2: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
# Tune the hyperparameters with grid search
pipe_svr = Pipeline([('StandardScaler', StandardScaler()), ('svr', SVR())])
param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
# gamma only affects the RBF kernel, so the linear kernel gets its own smaller grid
param_grid = [{"svr__C": param_range, "svr__kernel": ["linear"]},
              {"svr__C": param_range, "svr__gamma": param_range, "svr__kernel": ["rbf"]}]
gs = GridSearchCV(estimator=pipe_svr, param_grid=param_grid, scoring='r2', cv=10)
gs = gs.fit(x, y)
print("Grid search best score:", gs.best_score_)
print("Grid search best parameters:", gs.best_params_)
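The cost of grid search grows multiplicatively with every hyperparameter added. As a quick check, sklearn's `ParameterGrid` (which `GridSearchCV` uses to expand its `param_grid`) can count the candidates in the grid above: 8 linear-kernel settings plus 8 × 8 RBF settings, each fitted once per fold.

```python
from sklearn.model_selection import ParameterGrid

param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
param_grid = [{"svr__C": param_range, "svr__kernel": ["linear"]},
              {"svr__C": param_range, "svr__gamma": param_range, "svr__kernel": ["rbf"]}]

# ParameterGrid expands each dict into its Cartesian product and chains the results
n_candidates = len(ParameterGrid(param_grid))
n_folds = 10
print("candidates:", n_candidates)            # 72 (8 + 8 * 8)
print("model fits:", n_candidates * n_folds)  # 720, plus one final refit on all data
```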
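In random search, candidate combinations are sampled rather than exhaustively enumerated. Here the continuous hyperparameters C and gamma are drawn from uniform distributions; note that scipy.stats.uniform(loc, scale) covers the interval [loc, loc + scale], so C ranges over [1, 5] and gamma over [0, 4].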
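A quick way to confirm the ranges implied by `loc` and `scale` is to draw a few samples from each distribution. The sample size and `random_state` below are arbitrary choices for illustration.

```python
from scipy.stats import uniform

# scipy's uniform(loc, scale) is the uniform distribution on [loc, loc + scale]
C_dist = uniform(loc=1.0, scale=4)    # C in [1, 5]
gamma_dist = uniform(loc=0, scale=4)  # gamma in [0, 4]

C_samples = C_dist.rvs(size=5, random_state=0)
gamma_samples = gamma_dist.rvs(size=5, random_state=0)
print("C samples:", C_samples)
print("gamma samples:", gamma_samples)
print("C support:", C_dist.support())
```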
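# Tune the SVR with random search instead
pipe_svr = Pipeline([("StandardScaler", StandardScaler()), ("svr", SVR())])
# C ~ U[1, 5], gamma ~ U[0, 4]; the kernel is picked uniformly from the list
distributions = dict(svr__C=uniform(loc=1.0, scale=4),
                     svr__kernel=["linear", "rbf"],
                     svr__gamma=uniform(loc=0, scale=4))
rs = RandomizedSearchCV(estimator=pipe_svr,
                        param_distributions=distributions,
                        n_iter=10,        # number of sampled combinations (the default)
                        scoring='r2',
                        cv=10,
                        random_state=0)   # fix the sampling so results are reproducible
rs = rs.fit(x, y)
print("Random search best score:", rs.best_score_)
print("Random search best parameters:", rs.best_params_)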
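To inspect which combinations random search actually tries, the candidates can be drawn with sklearn's `ParameterSampler`, the helper `RandomizedSearchCV` uses to sample its candidates. This is a sketch; `random_state=0` is an arbitrary seed. With `n_iter=10` and `cv=10` the search costs 100 fits, compared with 72 × 10 = 720 for the grid search above.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

distributions = dict(svr__C=uniform(loc=1.0, scale=4),
                     svr__kernel=["linear", "rbf"],
                     svr__gamma=uniform(loc=0, scale=4))

# Draw 10 candidate settings; lists are sampled uniformly, distributions via rvs()
candidates = list(ParameterSampler(distributions, n_iter=10, random_state=0))
for params in candidates:
    print(params)

# Each candidate is fitted once per fold
n_folds = 10
print("model fits:", len(candidates) * n_folds)
```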