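This method suits small datasets; once the data volume grows, it becomes very hard to obtain a result in reasonable time.
How to tune parameters automatically: GridSearchCV systematically traverses combinations of parameter values and uses cross-validation to determine which combination performs best.
When the dataset is large, a faster tuning approach is coordinate descent. It is essentially a greedy algorithm: tune the parameter that currently has the greatest effect on the model until it is optimal, then tune the parameter with the next greatest effect, and so on until every parameter has been tuned. The drawback is that it may settle on a local optimum rather than the global one, but it saves a great deal of time and effort, which makes it well worth trying. The result can be refined afterwards with bagging.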
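The greedy, one-parameter-at-a-time idea can be sketched roughly as follows. This is only an illustration: the helper name coordinate_wise_search is hypothetical, the order of the parameters (most influential first) is the user's own assumption, and the iris data and KNN classifier are used only so the sketch runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier


def coordinate_wise_search(estimator, param_candidates, X, y, cv=5, scoring="accuracy"):
    """Tune one parameter at a time and freeze each winner before moving on.

    param_candidates maps parameter names to candidate values; list the
    parameter believed to matter most first (hypothetical helper).
    """
    fixed = {}          # parameters that have already been tuned
    best_score = None
    for name, candidates in param_candidates.items():
        estimator.set_params(**fixed)   # keep earlier winners fixed
        search = GridSearchCV(estimator, {name: list(candidates)}, cv=cv, scoring=scoring)
        search.fit(X, y)
        fixed[name] = search.best_params_[name]
        best_score = search.best_score_
    return fixed, best_score


X, y = load_iris(return_X_y=True)
candidates = {
    "n_neighbors": range(1, 31),
    "weights": ["uniform", "distance"],
}
best_params, best_score = coordinate_wise_search(KNeighborsClassifier(), candidates, X, y)
print(best_params, best_score)
```

Here only 30 + 2 = 32 candidates are evaluated instead of the 30 x 2 = 60 of a full grid, at the price of possibly missing the best joint combination.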
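grid.fit(): runs the grid search
best_params_: the parameter combination that achieved the best result
best_score_: the best score observed during the search (the mean cross-validated score of the best combination)
print('Grid search - CV results:',grid.cv_results_) # details of every training run
print('Grid search - best score:',grid.best_score_) # the best metric value
print('Grid search - best params:',grid.best_params_) # the values of the tuned parameters at the best score, as a dict
print('Grid search - best estimator:',grid.best_estimator_) # the classifier that achieved the best score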
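estimator: the classifier to use, e.g. estimator=RandomForestClassifier(min_samples_split=100,min_samples_leaf=20,max_depth=8,max_features='sqrt',random_state=10); pass in every parameter other than the ones being searched. The estimator must either provide a score method or be used together with the scoring parameter.
param_grid: a dict or a list of dicts giving the candidate values of the parameters to optimize, e.g. param_grid=param_test1 with param_test1 = {'n_estimators':range(10,71,10)}.
scoring: the evaluation metric. The default is None, in which case the estimator's own score method is used. It can also be a string such as scoring='roc_auc' (the appropriate metric depends on the model), or a callable with the signature scorer(estimator, X, y).
cv: the cross-validation strategy. The default is None, which meant 3-fold cross-validation in older scikit-learn releases and means 5-fold since version 0.22. It can be an integer number of folds, or an iterable that yields train/test splits.
refit: default True. After the search finishes, the estimator is fitted once more with the best parameters on the entire dataset passed to fit (training and validation folds together), and that refitted model is exposed as best_estimator_.
iid: default True in older releases. When True, the data is assumed to be identically distributed across folds, and the score is averaged over all samples (each fold weighted by its size) rather than taken as the plain mean of the per-fold scores. This parameter was deprecated in scikit-learn 0.22 and removed in 0.24.
verbose: int, log verbosity. 0: no output during training; 1: occasional output; >1: output for every sub-model.
n_jobs: int, number of parallel jobs. -1 uses all CPU cores; the default runs a single job (1 in older releases, None in newer ones).
pre_dispatch: controls how many jobs are dispatched during parallel execution. When n_jobs is greater than 1, the data is copied for every dispatched job, which can cause out-of-memory errors; setting pre_dispatch caps the number of dispatched jobs so that the data is copied at most pre_dispatch times.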
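As a rough sketch of how these parameters fit together, the call below reuses the RandomForestClassifier settings and the param_test1 grid quoted above. The synthetic dataset from make_classification and the choices cv=5 and n_jobs=1 are assumptions made only so the example runs on its own; iid is left out because recent scikit-learn releases no longer accept it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data (an assumption, not from the article).
X, y = make_classification(n_samples=1000, n_features=20, random_state=10)

# Only n_estimators is searched; every other parameter is fixed on the estimator.
param_test1 = {"n_estimators": range(10, 71, 10)}
grid = GridSearchCV(
    estimator=RandomForestClassifier(
        min_samples_split=100,
        min_samples_leaf=20,
        max_depth=8,
        max_features="sqrt",
        random_state=10,
    ),
    param_grid=param_test1,
    scoring="roc_auc",   # string name of the metric
    cv=5,                # number of folds
    refit=True,          # refit the best combination on all of X, y
    n_jobs=1,            # set to -1 to use every CPU core
    verbose=0,           # no progress output
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```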
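KNN is a supervised classification algorithm and one of the simplest machine learning algorithms. As the name suggests, its core idea is to decide the class of a sample from the classes of its nearest neighbors. The algorithm requires a training set whose samples are already labeled. The computation has the following three steps:
1. Compute the distance between the test sample and every sample in the training set. Euclidean distance, cosine distance and others can be used; the simple Euclidean distance is the most common choice.
2. Pick the K samples with the smallest distances from the previous step; these are the neighbors of the test sample.
3. Find the class that occurs most often among the K neighbors; that class is assigned to the test sample.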
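The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than how scikit-learn implements KNN; the function name knn_predict and the toy data are made up for the example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=3):
    # Step 1: Euclidean distance from x to every training sample.
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Step 2: indices of the k training samples with the smallest distances.
    nearest = np.argsort(distances)[:k]
    # Step 3: majority vote over the labels of those neighbors.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters, labeled 0 and 1.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # 0
print(knn_predict(X_train, y_train, np.array([5.0, 5.1]), k=3))  # 1
```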
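It is particularly well suited to multi-class problems.
It is a lazy learner: classification is computationally expensive because every training sample must be scanned to compute distances, memory overhead is high, and scoring is slow;
When the classes are imbalanced, for example when one class has far more samples than the others, the neighbors of a new sample may be dominated by the large class, which hurts classification;
Interpretability is poor: it cannot produce rules the way a decision tree does.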
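1. Setting K
Too small a K lowers classification accuracy. Too large a K, when the test sample belongs to a class with few training samples, brings in noise from other classes and also degrades the result.
K is usually chosen by cross-validation (starting from K=1 as the baseline).
Rule of thumb: K is generally below the square root of the number of training samples. K is the n_neighbors value we set.
2. Optimizations
Compress the training set;
When deciding the final class, use weighted voting instead of a simple majority vote: the closer a neighbor is, the higher its weight.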
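To make the difference between a plain majority vote and distance-weighted voting concrete, here is a small sketch. The function knn_vote, the eps guard against division by zero and the toy data are all assumptions made for illustration; scikit-learn handles zero distances in its own way.

```python
from collections import defaultdict

import numpy as np


def knn_vote(X_train, y_train, x, k=3, weighted=True, eps=1e-12):
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    scores = defaultdict(float)
    for i in nearest:
        # Inverse-distance weight when weighted, otherwise one vote per neighbor.
        weight = 1.0 / (distances[i] + eps) if weighted else 1.0
        scores[int(y_train[i])] += weight
    return max(scores, key=scores.get)


# One very close neighbor of class 0 and two farther neighbors of class 1.
X_train = np.array([[0.0, 0.0], [2.0, 0.0], [2.1, 0.0]])
y_train = np.array([0, 1, 1])
query = np.array([0.5, 0.0])

print(knn_vote(X_train, y_train, query, weighted=False))  # 1: two votes against one
print(knn_vote(X_train, y_train, query, weighted=True))   # 0: 1/0.5 outweighs 1/1.5 + 1/1.6
```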
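In the example below, weights usually takes one of only two values.
KNeighborsClassifier has 8 parameters (the first two below are the most commonly used):
n_neighbors : int, optional (default = 5): the value of K; the default number of neighbors is 5;
weights: how the neighbors are weighted. "uniform" gives every neighbor the same weight; "distance" weights each neighbor by the inverse of its distance. Equal weights are the default. A user-defined function can also be supplied to compute the weights;
algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, optional: the method used to find the nearest neighbors; choose according to your needs;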
leaf_size : int, optional (default = 30): Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
metric : string or DistanceMetric object (default = 'minkowski'): the distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics.
p : integer, optional (default = 2): Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
metric_params : dict, optional (default = None): additional keyword arguments for the metric function.
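A short sketch of how the less common parameters might be set: it builds one classifier with the Euclidean metric (p=2) and one with the Manhattan metric (p=1) and inspects the neighbors of the first iris sample. The particular combinations of algorithm, leaf_size and weights are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

models = {
    # Default 'minkowski' metric with p=2 is the Euclidean distance.
    "euclidean": KNeighborsClassifier(n_neighbors=5, weights="uniform",
                                      algorithm="kd_tree", leaf_size=30,
                                      metric="minkowski", p=2),
    # p=1 turns the Minkowski metric into the Manhattan (l1) distance.
    "manhattan": KNeighborsClassifier(n_neighbors=5, weights="distance",
                                      algorithm="ball_tree", p=1),
}

preds = {}
neighbor_distances = {}
for name, model in models.items():
    model.fit(X, y)
    # Distances to, and indices of, the 5 nearest training samples of X[0].
    distances, indices = model.kneighbors(X[:1])
    neighbor_distances[name] = distances
    preds[name] = int(model.predict(X[:1])[0])
    print(name, distances.round(3), indices, preds[name])
```

Because X[0] is itself part of the training data, its nearest neighbor is the sample itself at distance 0.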
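from sklearn.datasets import load_iris # built-in sample dataset
from sklearn.neighbors import KNeighborsClassifier # the parameters to tune belong to KNN: the value of k and the sample weighting scheme
from sklearn.model_selection import GridSearchCV # grid search with cross-validation
iris = load_iris()
X = iris.data # 150 samples, 4 features
y = iris.target # 150 class labels (3 classes)
k_range = range(1, 31) # candidate values for the parameter k
weight_options = ['uniform', 'distance'] # candidate weighting schemes: 'uniform' gives equal weights, 'distance' weights by inverse distance
# Build the parameter grid: a dict whose keys are parameter names and whose values are the lists of values to search
param_grid = {'n_neighbors':k_range,'weights':weight_options} # the dict keys must be parameter names of the classifier
print(param_grid)
# KNeighborsClassifier: the K-nearest-neighbors classifier
# n_neighbors: number of neighbors to query, 5 by default
knn = KNeighborsClassifier(n_neighbors=5) # the classifier; the parameter names n_neighbors and weights match the keys in param_grid
# ================================ grid search =======================================
# GridSearchCV takes arguments much like cross_val_score; param_grid is the parameter grid built above
# Setting n_jobs=-1 in GridSearchCV runs the search in parallel (if your machine supports it)
grid = GridSearchCV(estimator = knn, param_grid = param_grid, cv=10, scoring='accuracy') # 10-fold cross-validation for every parameter combination; scoring='accuracy' uses accuracy as the metric (several metrics can also be supplied)
grid.fit(X, y)
print('Grid search - CV results:',grid.cv_results_) # details of every training run
print('Grid search - best score:',grid.best_score_) # the best metric value
print('Grid search - best params:',grid.best_params_) # the values of the tuned parameters at the best score, as a dict
print('Grid search - best estimator:',grid.best_estimator_) # the classifier that achieved the best score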
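The output is as follows (it comes from an older scikit-learn release; newer releases omit the *_train_score entries unless return_train_score=True is passed):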
{'n_neighbors': range(1, 31), 'weights': ['uniform', 'distance']}
Grid search - CV results: {'mean_fit_time': array([7.00044632e-04, 1.00016594e-04, 2.00033188e-04, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 1.00002289e-03, 1.00004673e-03,
0.00000000e+00, 0.00000000e+00, 1.00016594e-04, 5.00106812e-04,
0.00000000e+00, 9.99999046e-04, 1.00002289e-03, 1.00002289e-03,
0.00000000e+00, 1.00002289e-03, 0.00000000e+00, 1.00004673e-03,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.00004673e-03,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.00004673e-03, 1.00002289e-03, 2.00057030e-04, 4.00042534e-04,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 3.00002098e-04,
9.99999046e-04, 0.00000000e+00, 1.00002289e-03, 9.99999046e-04,
0.00000000e+00, 1.00002289e-03, 1.00002289e-03, 0.00000000e+00,
3.00049782e-04, 0.00000000e+00, 0.00000000e+00, 1.20003223e-03,
1.00002289e-03, 1.00002289e-03, 0.00000000e+00, 1.00004673e-03,
0.00000000e+00, 1.00002289e-03, 9.99999046e-04, 1.00002289e-03,
2.00009346e-04, 9.99927521e-05, 2.00004578e-03, 0.00000000e+00]), 'std_fit_time': array([0.00045829, 0.00030005, 0.00040007, 0. , 0. ,
0. , 0.00300007, 0.00300014, 0. , 0. ,
0.00030005, 0.00050011, 0. , 0.003 , 0.00300007,
0.00300007, 0. , 0.00300007, 0. , 0.00300014,
0. , 0. , 0. , 0.00300014, 0. ,
0. , 0. , 0. , 0.00300014, 0.00300007,
0.00040011, 0.00048995, 0. , 0. , 0. ,
0.00045826, 0.003 , 0. , 0.00300007, 0.003 ,
0. , 0.00300007, 0.00300007, 0. , 0.00045833,
0. , 0. , 0.0029598 , 0.00300007, 0.00300007,
0. , 0.00300014, 0. , 0.00300007, 0.003 ,
0.00300007, 0.00040002, 0.00029998, 0.00400009, 0. ]), 'mean_score_time': array([0.00090005, 0.00090008, 0.00110002, 0. , 0.00100005,
0. , 0. , 0. , 0.00200002, 0.00100002,
0.00050006, 0.00040002, 0.00220003, 0.00100005, 0. ,
0. , 0. , 0.001 , 0.00100002, 0.001 ,
0. , 0.00100005, 0.00200002, 0. , 0. ,
0. , 0.00200005, 0.00100002, 0. , 0. ,
0.00040004, 0.00140011, 0.00100002, 0.00100002, 0.00010004,
0.00030005, 0.00100002, 0. , 0. , 0. ,
0.00100005, 0. , 0. , 0.001 , 0.00150006,
0. , 0.00110002, 0.00020001, 0. , 0. ,
0.00100002, 0.001 , 0. , 0.00100002, 0. ,
0.00010002, 0.0006001 , 0.00010004, 0. , 0.00200007]), 'std_score_time': array([0.00053854, 0.00030003, 0.00298161, 0. , 0.00300014,
0. , 0. , 0. , 0.00400004, 0.00300007,
0.00050006, 0.00048992, 0.00391922, 0.00300014, 0. ,
0. , 0. , 0.003 , 0.00300007, 0.003 ,
0. , 0.00300014, 0.00400004, 0. , 0. ,
0. , 0.00400009, 0.00300007, 0. , 0. ,
0.00048995, 0.0029053 , 0.00300007, 0.00300007, 0.00030012,
0.00045833, 0.00300007, 0. , 0. , 0. ,
0.00300014, 0. , 0. , 0.003 , 0.00287234,
0. , 0.00298168, 0.00040002, 0. , 0. ,
0.00300007, 0.003 , 0. , 0.00300007, 0. ,
0.00030005, 0.00048998, 0.00030012, 0. , 0.00400014]), 'param_n_neighbors': masked_array(data=[1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,
10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16,
17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 23,
24, 24, 25, 25, 26, 26, 27, 27, 28, 28, 29, 29, 30, 30],
mask=[False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False,
False, False, False, False],
fill_value='?',
dtype=object), 'param_weights': masked_array(data=['uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance',
'uniform', 'distance', 'uniform', 'distance'],
mask=[False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False,
False, False, False, False],
fill_value='?',
dtype=object), 'params': [{'n_neighbors': 1, 'weights': 'uniform'}, {'n_neighbors': 1, 'weights': 'distance'}, {'n_neighbors': 2, 'weights': 'uniform'}, {'n_neighbors': 2, 'weights': 'distance'}, {'n_neighbors': 3, 'weights': 'uniform'}, {'n_neighbors': 3, 'weights': 'distance'}, {'n_neighbors': 4, 'weights': 'uniform'}, {'n_neighbors': 4, 'weights': 'distance'}, {'n_neighbors': 5, 'weights': 'uniform'}, {'n_neighbors': 5, 'weights': 'distance'}, {'n_neighbors': 6, 'weights': 'uniform'}, {'n_neighbors': 6, 'weights': 'distance'}, {'n_neighbors': 7, 'weights': 'uniform'}, {'n_neighbors': 7, 'weights': 'distance'}, {'n_neighbors': 8, 'weights': 'uniform'}, {'n_neighbors': 8, 'weights': 'distance'}, {'n_neighbors': 9, 'weights': 'uniform'}, {'n_neighbors': 9, 'weights': 'distance'}, {'n_neighbors': 10, 'weights': 'uniform'}, {'n_neighbors': 10, 'weights': 'distance'}, {'n_neighbors': 11, 'weights': 'uniform'}, {'n_neighbors': 11, 'weights': 'distance'}, {'n_neighbors': 12, 'weights': 'uniform'}, {'n_neighbors': 12, 'weights': 'distance'}, {'n_neighbors': 13, 'weights': 'uniform'}, {'n_neighbors': 13, 'weights': 'distance'}, {'n_neighbors': 14, 'weights': 'uniform'}, {'n_neighbors': 14, 'weights': 'distance'}, {'n_neighbors': 15, 'weights': 'uniform'}, {'n_neighbors': 15, 'weights': 'distance'}, {'n_neighbors': 16, 'weights': 'uniform'}, {'n_neighbors': 16, 'weights': 'distance'}, {'n_neighbors': 17, 'weights': 'uniform'}, {'n_neighbors': 17, 'weights': 'distance'}, {'n_neighbors': 18, 'weights': 'uniform'}, {'n_neighbors': 18, 'weights': 'distance'}, {'n_neighbors': 19, 'weights': 'uniform'}, {'n_neighbors': 19, 'weights': 'distance'}, {'n_neighbors': 20, 'weights': 'uniform'}, {'n_neighbors': 20, 'weights': 'distance'}, {'n_neighbors': 21, 'weights': 'uniform'}, {'n_neighbors': 21, 'weights': 'distance'}, {'n_neighbors': 22, 'weights': 'uniform'}, {'n_neighbors': 22, 'weights': 'distance'}, {'n_neighbors': 23, 'weights': 'uniform'}, {'n_neighbors': 23, 'weights': 
'distance'}, {'n_neighbors': 24, 'weights': 'uniform'}, {'n_neighbors': 24, 'weights': 'distance'}, {'n_neighbors': 25, 'weights': 'uniform'}, {'n_neighbors': 25, 'weights': 'distance'}, {'n_neighbors': 26, 'weights': 'uniform'}, {'n_neighbors': 26, 'weights': 'distance'}, {'n_neighbors': 27, 'weights': 'uniform'}, {'n_neighbors': 27, 'weights': 'distance'}, {'n_neighbors': 28, 'weights': 'uniform'}, {'n_neighbors': 28, 'weights': 'distance'}, {'n_neighbors': 29, 'weights': 'uniform'}, {'n_neighbors': 29, 'weights': 'distance'}, {'n_neighbors': 30, 'weights': 'uniform'}, {'n_neighbors': 30, 'weights': 'distance'}], 'split0_test_score': array([1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 0.93333333, 1. ,
1. , 1. , 0.93333333, 1. , 1. ,
1. , 0.93333333, 1. , 1. , 1. ,
0.93333333, 1. , 0.93333333, 1. , 0.93333333,
1. , 0.93333333, 1. , 0.93333333, 1. ,
0.93333333, 1. , 0.93333333, 1. , 0.93333333,
1. , 0.93333333, 1. , 0.93333333, 1. ]), 'split1_test_score': array([0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333]), 'split2_test_score': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1.]), 'split3_test_score': array([0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 0.93333333,
1. , 0.93333333, 0.93333333, 0.93333333, 0.93333333]), 'split4_test_score': array([0.86666667, 0.86666667, 0.86666667, 0.86666667, 0.86666667,
0.86666667, 0.86666667, 0.86666667, 0.86666667, 0.86666667,
0.86666667, 0.86666667, 0.86666667, 0.86666667, 1. ,
0.86666667, 1. , 0.93333333, 1. , 0.93333333,
1. , 0.93333333, 1. , 0.86666667, 1. ,
0.93333333, 1. , 0.93333333, 1. , 1. ,
1. , 0.93333333, 1. , 1. , 1. ,
0.93333333, 1. , 1. , 1. , 0.86666667,
0.93333333, 0.86666667, 1. , 0.86666667, 1. ,
0.93333333, 1. , 0.93333333, 1. , 0.93333333,
1. , 0.86666667, 1. , 1. , 1. ,
0.93333333, 1. , 1. , 1. , 0.93333333]), 'split5_test_score': array([1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.86666667,
0.93333333, 0.93333333, 0.93333333, 0.86666667, 0.93333333,
0.86666667, 0.93333333, 0.93333333, 1. , 0.93333333,
0.93333333, 0.86666667, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.86666667, 0.93333333, 0.93333333, 0.93333333,
0.86666667, 0.93333333, 0.86666667, 0.93333333, 0.86666667,
0.93333333, 0.86666667, 0.93333333, 0.86666667, 0.93333333]), 'split6_test_score': array([0.86666667, 0.86666667, 0.86666667, 0.86666667, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333,
0.93333333, 0.93333333, 0.93333333, 0.93333333, 0.93333333]), 'split7_test_score': array([1. , 1. , 0.93333333, 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 0.93333333,
1. , 0.93333333, 1. , 0.93333333, 1. ,
0.93333333, 1. , 0.93333333, 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
0.93333333, 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. , 0.93333333, 1. , 1. ,
1. , 0.93333333, 1. , 0.93333333, 1. ,
0.93333333, 1. , 1. , 1. , 0.93333333,
1. , 0.93333333, 1. , 0.93333333, 1. ]), 'split8_test_score': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1.]), 'split9_test_score': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1.]), 'mean_test_score': array([0.96 , 0.96 , 0.95333333, 0.96 , 0.96666667,
0.96666667, 0.96666667, 0.96666667, 0.96666667, 0.96666667,
0.96666667, 0.96666667, 0.96666667, 0.96666667, 0.96666667,
0.96666667, 0.97333333, 0.97333333, 0.96666667, 0.97333333,
0.96666667, 0.97333333, 0.97333333, 0.97333333, 0.98 ,
0.97333333, 0.97333333, 0.97333333, 0.97333333, 0.98 ,
0.97333333, 0.97333333, 0.97333333, 0.98 , 0.98 ,
0.97333333, 0.97333333, 0.98 , 0.98 , 0.96666667,
0.96666667, 0.96666667, 0.96666667, 0.96666667, 0.97333333,
0.97333333, 0.96 , 0.97333333, 0.96666667, 0.97333333,
0.96 , 0.96666667, 0.96666667, 0.98 , 0.95333333,
0.97333333, 0.95333333, 0.97333333, 0.95333333, 0.96666667]), 'std_test_score': array([0.05333333, 0.05333333, 0.05206833, 0.05333333, 0.04472136,
0.04472136, 0.04472136, 0.04472136, 0.04472136, 0.04472136,
0.04472136, 0.04472136, 0.04472136, 0.04472136, 0.04472136,
0.04472136, 0.03265986, 0.03265986, 0.04472136, 0.03265986,
0.04472136, 0.03265986, 0.03265986, 0.04422166, 0.0305505 ,
0.03265986, 0.04422166, 0.03265986, 0.03265986, 0.0305505 ,
0.03265986, 0.03265986, 0.03265986, 0.0305505 , 0.0305505 ,
0.03265986, 0.03265986, 0.0305505 , 0.0305505 , 0.04472136,
0.03333333, 0.04472136, 0.03333333, 0.04472136, 0.03265986,
0.03265986, 0.04422166, 0.03265986, 0.03333333, 0.03265986,
0.04422166, 0.04472136, 0.04472136, 0.0305505 , 0.04268749,
0.03265986, 0.04268749, 0.03265986, 0.04268749, 0.03333333]), 'rank_test_score': array([52, 52, 57, 52, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 8,
8, 29, 8, 29, 8, 8, 8, 1, 8, 8, 8, 8, 1, 8, 8, 8, 1,
1, 8, 8, 1, 1, 29, 29, 29, 29, 29, 8, 8, 52, 8, 29, 8, 52,
29, 29, 1, 57, 8, 57, 8, 57, 29]), 'split0_train_score': array([1. , 1. , 0.97037037, 1. , 0.95555556,
1. , 0.95555556, 1. , 0.96296296, 1. ,
0.97037037, 1. , 0.96296296, 1. , 0.97037037,
1. , 0.97037037, 1. , 0.97037037, 1. ,
0.97037037, 1. , 0.97037037, 1. , 0.97777778,
1. , 0.97037037, 1. , 0.97777778, 1. ,
0.97037037, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.97777778, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97037037, 1. , 0.97777778, 1. ,
0.96296296, 1. , 0.96296296, 1. , 0.94814815,
1. , 0.95555556, 1. , 0.94814815, 1. ]), 'split1_train_score': array([1. , 1. , 0.98518519, 1. , 0.96296296,
1. , 0.96296296, 1. , 0.97037037, 1. ,
0.97037037, 1. , 0.97777778, 1. , 0.98518519,
1. , 0.98518519, 1. , 0.98518519, 1. ,
0.98518519, 1. , 0.99259259, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.98518519, 1. , 0.98518519, 1. , 0.98518519,
1. , 0.98518519, 1. , 0.98518519, 1. ,
0.98518519, 1. , 0.98518519, 1. , 0.97037037,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.98518519, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.95555556, 1. ]), 'split2_train_score': array([1. , 1. , 0.97777778, 1. , 0.95555556,
1. , 0.95555556, 1. , 0.96296296, 1. ,
0.97037037, 1. , 0.97037037, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.98518519, 1. , 0.98518519, 1. , 0.97777778,
1. , 0.98518519, 1. , 0.97777778, 1. ,
0.98518519, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.96296296, 1. ,
0.97037037, 1. , 0.95555556, 1. , 0.96296296,
1. , 0.94814815, 1. , 0.94814815, 1. ,
0.93333333, 1. , 0.94074074, 1. , 0.94814815,
1. , 0.95555556, 1. , 0.94814815, 1. ]), 'split3_train_score': array([1. , 1. , 0.98518519, 1. , 0.96296296,
1. , 0.96296296, 1. , 0.97037037, 1. ,
0.97777778, 1. , 0.97037037, 1. , 0.98518519,
1. , 0.98518519, 1. , 0.97777778, 1. ,
0.97777778, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.98518519, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.97777778, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97037037, 1. ,
0.96296296, 1. , 0.97037037, 1. , 0.96296296,
1. , 0.97037037, 1. , 0.94814815, 1. ]), 'split4_train_score': array([1. , 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.98518519, 1. ,
0.98518519, 1. , 0.98518519, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.98518519, 1. , 0.97777778, 1. , 0.98518519,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.97777778, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.97777778, 1. , 0.97037037, 1. , 0.97777778,
1. , 0.97037037, 1. , 0.97037037, 1. ,
0.95555556, 1. , 0.94814815, 1. , 0.94814815,
1. , 0.94814815, 1. , 0.94814815, 1. ]), 'split5_train_score': array([1. , 1. , 0.97037037, 1. , 0.95555556,
1. , 0.96296296, 1. , 0.96296296, 1. ,
0.95555556, 1. , 0.97037037, 1. , 0.97037037,
1. , 0.97777778, 1. , 0.97037037, 1. ,
0.98518519, 1. , 0.97777778, 1. , 0.98518519,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.97777778, 1. , 0.98518519, 1. , 0.98518519,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.97777778, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.96296296, 1. , 0.97037037, 1. , 0.96296296,
1. , 0.96296296, 1. , 0.96296296, 1. ]), 'split6_train_score': array([1. , 1. , 0.98518519, 1. , 0.97037037,
1. , 0.97037037, 1. , 0.97777778, 1. ,
0.98518519, 1. , 0.97777778, 1. , 0.99259259,
1. , 0.99259259, 1. , 0.98518519, 1. ,
0.99259259, 1. , 0.98518519, 1. , 0.99259259,
1. , 0.99259259, 1. , 0.98518519, 1. ,
0.98518519, 1. , 0.97777778, 1. , 0.98518519,
1. , 0.98518519, 1. , 0.98518519, 1. ,
0.98518519, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.96296296, 1. , 0.97037037, 1. , 0.95555556,
1. , 0.96296296, 1. , 0.96296296, 1. ]), 'split7_train_score': array([1. , 1. , 0.97777778, 1. , 0.95555556,
1. , 0.96296296, 1. , 0.96296296, 1. ,
0.97037037, 1. , 0.97037037, 1. , 0.97777778,
1. , 0.97037037, 1. , 0.97777778, 1. ,
0.97777778, 1. , 0.97037037, 1. , 0.97037037,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.97037037, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.97777778, 1. , 0.97037037, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.97037037, 1. , 0.97037037, 1. , 0.96296296,
1. , 0.96296296, 1. , 0.95555556, 1. ]), 'split8_train_score': array([1. , 1. , 0.97777778, 1. , 0.95555556,
1. , 0.95555556, 1. , 0.96296296, 1. ,
0.97037037, 1. , 0.97037037, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.97777778, 1. , 0.98518519, 1. , 0.97777778,
1. , 0.98518519, 1. , 0.98518519, 1. ,
0.97777778, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97777778, 1. ,
0.97777778, 1. , 0.97037037, 1. , 0.97777778,
1. , 0.97037037, 1. , 0.97037037, 1. ,
0.96296296, 1. , 0.97037037, 1. , 0.94814815,
1. , 0.95555556, 1. , 0.95555556, 1. ]), 'split9_train_score': array([1. , 1. , 0.97777778, 1. , 0.95555556,
1. , 0.97037037, 1. , 0.97037037, 1. ,
0.97037037, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.96296296, 1. ,
0.96296296, 1. , 0.96296296, 1. , 0.97777778,
1. , 0.97037037, 1. , 0.97777778, 1. ,
0.96296296, 1. , 0.96296296, 1. , 0.95555556,
1. , 0.96296296, 1. , 0.94074074, 1. ,
0.94814815, 1. , 0.94814815, 1. , 0.95555556,
1. , 0.95555556, 1. , 0.94814815, 1. ,
0.94074074, 1. , 0.93333333, 1. , 0.93333333,
1. , 0.94074074, 1. , 0.94074074, 1. ]), 'mean_train_score': array([1. , 1. , 0.97851852, 1. , 0.96074074,
1. , 0.9637037 , 1. , 0.96888889, 1. ,
0.97259259, 1. , 0.97333333, 1. , 0.97925926,
1. , 0.97925926, 1. , 0.9762963 , 1. ,
0.98 , 1. , 0.97851852, 1. , 0.98 ,
1. , 0.97925926, 1. , 0.97925926, 1. ,
0.97777778, 1. , 0.97777778, 1. , 0.97777778,
1. , 0.97777778, 1. , 0.97407407, 1. ,
0.97555556, 1. , 0.97111111, 1. , 0.97333333,
1. , 0.97037037, 1. , 0.96962963, 1. ,
0.96 , 1. , 0.96148148, 1. , 0.95481481,
1. , 0.95925926, 1. , 0.95259259, 1. ]), 'std_train_score': array([0. , 0. , 0.00518519, 0. , 0.00744435,
0. , 0.00698813, 0. , 0.00725775, 0. ,
0.00814815, 0. , 0.00592593, 0. , 0.00645763,
0. , 0.00645763, 0. , 0.00645763, 0. ,
0.00814815, 0. , 0.0084132 , 0. , 0.00578537,
0. , 0.00645763, 0. , 0.00296296, 0. ,
0.00740741, 0. , 0.00573775, 0. , 0.00811441,
0. , 0.00573775, 0. , 0.01250514, 0. ,
0.00996565, 0. , 0.01070876, 0. , 0.0075541 ,
0. , 0.00993808, 0. , 0.01120944, 0. ,
0.01373869, 0. , 0.0143635 , 0. , 0.01168869,
0. , 0.01007516, 0. , 0.006789 , 0. ])}
Grid search - best score: 0.98
Grid search - best params: {'n_neighbors': 13, 'weights': 'uniform'}
Grid search - best estimator: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=13, p=2,
weights='uniform')
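Printing cv_results_ in full is hard to read. One possible way to get an overview is to sort the candidates by rank_test_score and print only the top few, as sketched below. The search is rebuilt with the same grid so that the snippet runs on its own; the exact scores can differ between scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
param_grid = {"n_neighbors": range(1, 31), "weights": ["uniform", "distance"]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring="accuracy")
grid.fit(X, y)

results = grid.cv_results_
# Indices of the five best-ranked candidates (rank 1 is best; ties share a rank).
order = np.argsort(results["rank_test_score"], kind="stable")[:5]
for i in order:
    print(results["rank_test_score"][i],
          round(results["mean_test_score"][i], 4),
          round(results["std_test_score"][i], 4),
          results["params"][i])
```

Several combinations can tie for rank 1, as in the output above where a number of candidates reach 0.98; best_params_ reports only one of them.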
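Cross-validation is the method GridSearchCV uses internally:
K-fold cross-validation
I. Cross-validation
Purpose of cross-validation
In practice, a model usually fits its training data well but fits data outside the training set poorly. Cross-validation is used to evaluate a model's generalization ability and, on that basis, to select a model.
Basic idea of cross-validation
The original dataset is partitioned in some way: one part serves as the training set and the other as the validation set (or test set). The model is first trained on the training set, and the validation set is then used to estimate its generalization error. Because real-world data is always limited, k-fold cross-validation was introduced so that the data can be reused.
For a classification or regression problem, suppose the candidate models are M = {M1, M2, M3, ..., Md}. K-fold cross-validation holds out 1/k of the training set as the test set in turn; each model is trained k times and tested k times, its error rate is the average over the k runs, and the model Mi with the lowest average error rate is selected.
1. Split the whole training set S into k disjoint subsets. If S contains m training examples, each subset has m/k examples; call the subsets {S1, S2, S3, ..., Sk}.
2. Take one model Mi from the model set M at a time and choose k-1 of the subsets, {S1, ..., Sj-1, Sj+1, ..., Sk} (that is, leave out one subset Sj each time). Train Mi on these k-1 subsets to obtain the hypothesis hij, then test it on the held-out Sj to obtain an empirical error.
3. Since Sj is left out once for each j from 1 to k, this yields k empirical errors; the empirical error of Mi is the average of these k values.
4. Select the Mi with the smallest average empirical error, then train it once more on all of S to obtain the final hi.
I still don't quite understand the part above!!!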
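The four steps of k-fold model selection may be easier to follow in code. The sketch below is one possible reading of them: the candidate set M is assumed to be three KNN classifiers with different K, the data is iris, and the fold count of 10 is an arbitrary choice.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate model set M = {M1, M2, M3} (an assumption for illustration).
models = {f"knn_k={k}": KNeighborsClassifier(n_neighbors=k) for k in (1, 5, 13)}

# Step 1: split S into n_folds disjoint subsets S1..Sk.
n_folds = 10
folds = KFold(n_splits=n_folds, shuffle=True, random_state=0)

mean_errors = {}
for name, model in models.items():
    fold_errors = []
    for train_idx, test_idx in folds.split(X):
        # Step 2: train Mi on the k-1 remaining subsets, test on the held-out Sj.
        h_ij = clone(model).fit(X[train_idx], y[train_idx])
        fold_errors.append(1.0 - h_ij.score(X[test_idx], y[test_idx]))
    # Step 3: the empirical error of Mi is the mean of its k fold errors.
    mean_errors[name] = float(np.mean(fold_errors))

# Step 4: pick the model with the lowest mean error and retrain it on all of S.
best_name = min(mean_errors, key=mean_errors.get)
final_model = clone(models[best_name]).fit(X, y)
print(mean_errors, best_name)
```

GridSearchCV does the same thing, with each parameter combination playing the role of one model Mi and refit=True corresponding to step 4.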
K-fold cross-validation