KNN Cross-Validation

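Approach:

Split the training data evenly into K folds.
Use one fold as validation data and the remaining folds as training data.
Compute the validation accuracy.
Repeat the training with a different fold as the validation set, until every fold has been used once.
Average the accuracies; the mean serves as an estimate of the prediction accuracy on unseen data.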
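The steps above can be sketched as an explicit loop. This is a minimal illustration rather than the post's own code: it assumes the built-in iris dataset as stand-in data, and the helper name `manual_cross_val` is hypothetical. It uses a shuffled `KFold`, whereas `cross_val_score` with an integer `cv` uses unshuffled stratified folds for classifiers, so the numbers will not match exactly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier


def manual_cross_val(X, y, n_neighbors=5, n_splits=5, seed=0):
    """Return the validation accuracy of a KNN classifier on each fold."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_scores = []
    for train_idx, val_idx in kfold.split(X):
        # One fold is held out for validation, the rest is used for training.
        model = KNeighborsClassifier(n_neighbors=n_neighbors)
        model.fit(X[train_idx], y[train_idx])
        fold_scores.append(model.score(X[val_idx], y[val_idx]))
    return np.array(fold_scores)


# Stand-in data (assumption): iris instead of the dating dataset.
X, y = load_iris(return_X_y=True)
fold_scores = manual_cross_val(X, y)
print(fold_scores)         # one accuracy per fold
print(fold_scores.mean())  # the averaged estimate
```

Each sample is used for validation exactly once, so the mean is less dependent on a single lucky or unlucky split than one hold-out score would be.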
Required import:
from sklearn.model_selection import cross_val_score
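As a quick illustration of what this function returns, here is a small sketch on the built-in iris dataset (an assumption made so the snippet runs on its own). `cross_val_score` returns one score per fold, so the result is usually reduced with `.mean()`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): iris instead of the dating dataset.
X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=5)
# cv=5 splits the data into 5 folds; for a classifier the folds are stratified.
fold_scores = cross_val_score(knn, X, y, cv=5)

print(fold_scores)         # array with 5 accuracies, one per fold
print(fold_scores.mean())  # averaged accuracy estimate
```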

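Implementation:

1. Import the required packages.
2. Load the data to be processed.
3. Separate the data into features and labels.
4. Split the data into a training set and a test set. The split returns four values: training features, test features, training labels, and test labels.
5. Decide which hyperparameter to tune and build an array of candidate values. For each candidate, create a model with that value and pass the model, the training features, and the training labels to cross_val_score; cv is the number of folds the training set is split into.
6. Store the mean accuracy from step 5 in a score array, then take the index of the largest value.
7. Create a new model object using the parameter value that gave the highest accuracy.
8. Fit the model, pass the test features to it for prediction, and print the predictions.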
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler,StandardScaler  # scalers for normalization and standardization
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.model_selection import cross_val_score
df = pd.read_csv('C:/Users/lenovo/Desktop/python_use/data/datasets/datingTestSet.txt',header=None,sep='\t')
print(df.head())
# Extract the features
feature = df[[0, 1, 2]]
# std = StandardScaler()
# st_feature = std.fit_transform(feature)
# Extract the labels
target = df[3]
x_train,x_test,y_train,y_test=train_test_split(feature,target,train_size=0.8,random_state=2020)
#knn = KNeighborsClassifier(n_neighbors=6)
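# Cross-validation is applied to the training set only
#cross_val_score(estimator=knn,X=x_train,y=y_train,cv=5)  # arguments: 1. the model, 2. training features, 3. training labels, 4. cv, the number of folds the training set is split into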
#result = cross_val_score(estimator=knn,X=x_train,y=y_train,cv=3).mean()
#print(result)
ks = np.arange(1,100,5)
scores=[]
for i in ks:
    knn = KNeighborsClassifier(n_neighbors=i)
    results = cross_val_score(knn, x_train, y_train, cv=5).mean()
    scores.append(results)
plt.plot(ks, scores)
plt.xlabel('k')
plt.ylabel('s')
scores = np.array(scores)
plt.show()
index = scores.argmax()
print(index)
print(ks[index])
# The best parameter has been found; now train the model with that hyperparameter
best_k  = ks[index]
knn_best = KNeighborsClassifier(n_neighbors=best_k)
knn_best.fit(x_train, y_train)
score = knn_best.score(x_test,y_test)
print(score)
predict = knn_best.predict(x_test)
print(predict)
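
The script above leaves the StandardScaler lines commented out and searches k with a hand-written loop. As a hedged alternative sketch, not the post's method, the same search can be written with `Pipeline` and `GridSearchCV`, which refits the scaler inside every fold so the validation fold never influences the scaling. The built-in wine dataset and the candidate range for k are assumptions chosen only to keep the example self-contained.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): wine, whose features differ widely in scale.
X, y = load_wine(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=2020, stratify=y
)

# Scaling and KNN are chained so that each CV fold fits its own scaler.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate values of k (assumption: odd values from 1 to 29).
param_grid = {"knn__n_neighbors": list(range(1, 30, 2))}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(x_train, y_train)  # cross-validates every k, then refits the best one

best_k = search.best_params_["knn__n_neighbors"]
test_score = search.score(x_test, y_test)  # accuracy on the held-out test set
print(best_k, search.best_score_, test_score)
```

Because KNN relies on distances, a feature with a large numeric range (such as column 0 in the dating data) can dominate the result when left unscaled, which is why scaling is worth trying here.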

Output:

     0          1         2           3
0  40920   8.326976  0.953952  largeDoses
1  14488   7.153469  1.673904  smallDoses
2  26052   1.441871  0.805124   didntLike
3  75136  13.147394  0.428964   didntLike
4  38344   1.669788  0.134296   didntLike
15
76
0.82
['largeDoses' 'smallDoses' 'didntLike' 'smallDoses' 'smallDoses'
 'didntLike' 'largeDoses' 'largeDoses' 'smallDoses' 'largeDoses'
 'smallDoses' 'largeDoses' 'largeDoses' 'smallDoses' 'largeDoses'
 'largeDoses' 'didntLike' 'largeDoses' 'largeDoses' 'smallDoses'
 'didntLike' 'largeDoses' 'largeDoses' 'largeDoses' 'smallDoses'
 'largeDoses' 'didntLike' 'didntLike' 'smallDoses' 'largeDoses'
 'smallDoses' 'didntLike' 'largeDoses' 'largeDoses' 'largeDoses'
 'smallDoses' 'largeDoses' 'didntLike' 'smallDoses' 'didntLike'
 'smallDoses' 'largeDoses' 'largeDoses' 'largeDoses' 'largeDoses'
 'smallDoses' 'largeDoses' 'largeDoses' 'didntLike' 'didntLike'
 'didntLike' 'largeDoses' 'largeDoses' 'largeDoses' 'largeDoses'
 'smallDoses' 'largeDoses' 'largeDoses' 'smallDoses' 'smallDoses'
 'largeDoses' 'largeDoses' 'smallDoses' 'didntLike' 'smallDoses'
 'smallDoses' 'largeDoses' 'largeDoses' 'largeDoses' 'largeDoses'
 'largeDoses' 'smallDoses' 'didntLike' 'smallDoses' 'largeDoses'
 'largeDoses' 'largeDoses' 'largeDoses' 'largeDoses' 'didntLike'
 'smallDoses' 'didntLike' 'largeDoses' 'largeDoses' 'didntLike'
 'largeDoses' 'smallDoses' 'smallDoses' 'smallDoses' 'didntLike'
 'smallDoses' 'didntLike' 'smallDoses' 'smallDoses' 'didntLike'
 'smallDoses' 'smallDoses' 'smallDoses' 'smallDoses' 'largeDoses'
 'largeDoses' 'largeDoses' 'largeDoses' 'smallDoses' 'largeDoses'
 'didntLike' 'largeDoses' 'largeDoses' 'smallDoses' 'largeDoses'
 'didntLike' 'smallDoses' 'smallDoses' 'largeDoses' 'didntLike'
 'largeDoses' 'largeDoses' 'largeDoses' 'largeDoses' 'largeDoses'
 'largeDoses' 'largeDoses' 'didntLike' 'smallDoses' 'smallDoses'
 'smallDoses' 'largeDoses' 'largeDoses' 'didntLike' 'largeDoses'
 'smallDoses' 'largeDoses' 'didntLike' 'smallDoses' 'didntLike'
 'largeDoses' 'smallDoses' 'largeDoses' 'largeDoses' 'didntLike'
 'smallDoses' 'didntLike' 'smallDoses' 'didntLike' 'smallDoses'
 'didntLike' 'smallDoses' 'smallDoses' 'largeDoses' 'didntLike'
 'smallDoses' 'didntLike' 'smallDoses' 'largeDoses' 'didntLike'
 'largeDoses' 'didntLike' 'smallDoses' 'didntLike' 'didntLike'
 'largeDoses' 'didntLike' 'didntLike' 'smallDoses' 'smallDoses'
 'smallDoses' 'largeDoses' 'largeDoses' 'smallDoses' 'smallDoses'
 'largeDoses' 'smallDoses' 'smallDoses' 'largeDoses' 'largeDoses'
 'largeDoses' 'largeDoses' 'smallDoses' 'didntLike' 'largeDoses'
 'smallDoses' 'largeDoses' 'smallDoses' 'didntLike' 'largeDoses'
 'smallDoses' 'didntLike' 'largeDoses' 'smallDoses' 'smallDoses'
 'didntLike' 'didntLike' 'largeDoses' 'smallDoses' 'largeDoses'
 'largeDoses' 'didntLike' 'didntLike' 'smallDoses' 'didntLike']
