Machine Learning Case Study: KNN Algorithm in Practice

Case: Iris species classification
Steps:
1. Load the data

import pandas as pd
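# 1. Load the data (the CSV has no header row, so the column names are supplied)
# The r prefix keeps the Windows backslashes from being read as escape sequences
irisdata=pd.read_csv(r"D:\mlData\iris.csv",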
names=['sepal_length','sepal_width','petal_length',
'petal_width','iristype'])
irisdata

Replace "D:\mlData\iris.csv" with the path where you saved your own copy of the data file. The result is shown below.

     sepal_length  sepal_width  petal_length  petal_width        iristype
0             5.1          3.5           1.4          0.2     Iris-setosa
1             4.9          3.0           1.4          0.2     Iris-setosa
2             4.7          3.2           1.3          0.2     Iris-setosa
3             4.6          3.1           1.5          0.2     Iris-setosa
4             5.0          3.6           1.4          0.2     Iris-setosa
..            ...          ...           ...          ...             ...
145           6.7          3.0           5.2          2.3  Iris-virginica
146           6.3          2.5           5.0          1.9  Iris-virginica
147           6.5          3.0           5.2          2.0  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica
149           5.9          3.0           5.1          1.8  Iris-virginica

150 rows × 5 columns
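If iris.csv is not available locally, a DataFrame with the same five columns can also be built from the copy of the dataset that ships with scikit-learn. This is a minimal sketch of an alternative, not the method used in this case: the species strings are rebuilt with an assumed "Iris-" prefix so they match the CSV, and scikit-learn's copy may differ from the UCI file in a couple of values.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Alternative to reading iris.csv: scikit-learn's bundled copy of the dataset
iris = load_iris()
irisdata = pd.DataFrame(
    iris.data,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
)
# iris.target holds 0/1/2; iris.target_names holds 'setosa', 'versicolor', 'virginica'
# The "Iris-" prefix is added only to mimic the labels in the CSV file
irisdata['iristype'] = ['Iris-' + iris.target_names[t] for t in iris.target]

print(irisdata.shape)  # (150, 5)
print(irisdata['iristype'].value_counts())
```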
2. Split the dataset

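# 2. Split the dataset
# Features: the first four columns; label: the last column
xdata=irisdata.iloc[:,0:4]
ydata=irisdata.iloc[:,-1]
# Encode the species names as the integers 0, 1, 2 (as seen in the ytrain output)
ydata1=ydata.map({'Iris-setosa':0,'Iris-versicolor':1,'Iris-virginica':2})

from sklearn.model_selection import train_test_split
# Hold out 20% of the samples for testing; random_state makes the split reproducible
xtrain,xtest,ytrain,ytest=train_test_split(xdata,ydata1,test_size=0.2,random_state=33)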
ytrain
94     1
86     1
121    2
115    2
140    2
      ..
57     1
146    2
66     1
135    2
20     0
Name: iristype, Length: 120, dtype: int64
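With test_size=0.2, 20% of the 150 samples (30) are held out and the remaining 120 form the training set, which is why ytrain above has Length: 120; random_state=33 fixes the shuffle, so rerunning gives the same split. A minimal sketch on synthetic placeholder arrays (X and y here are made up; only their shapes matter):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 150 samples with 4 features, labels 0/1/2 (50 of each)
X = np.arange(150 * 4).reshape(150, 4)
y = np.repeat([0, 1, 2], 50)

xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=33)
print(xtrain.shape, xtest.shape)  # (120, 4) (30, 4)

# The same random_state reproduces exactly the same split
xtrain2, xtest2, ytrain2, ytest2 = train_test_split(X, y, test_size=0.2, random_state=33)
print(np.array_equal(xtrain, xtrain2))  # True
```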

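In iloc[:,0:4], the part before the comma selects rows (: means all rows) and the part after it selects columns. 0:4 selects the columns at positions 0 to 3, i.e. the first through fourth columns (the four measurements), because the end position of a slice is excluded. iloc[:,-1] selects the last column, the species label.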
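A small sketch on a toy DataFrame (the three rows and their values are illustrative only) shows that 0:4 keeps four columns, -1 picks the last one, and map turns the species names into the integer labels seen in ytrain:

```python
import pandas as pd

# Toy frame with the same column layout as irisdata (values are illustrative)
df = pd.DataFrame({
    'sepal_length': [5.1, 7.0, 6.3],
    'sepal_width': [3.5, 3.2, 3.3],
    'petal_length': [1.4, 4.7, 6.0],
    'petal_width': [0.2, 1.4, 2.5],
    'iristype': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],
})

x = df.iloc[:, 0:4]  # all rows; columns at positions 0, 1, 2, 3 (4 is excluded)
y = df.iloc[:, -1]   # all rows; the last column only
print(list(x.columns))  # ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

# Encode species names as integers
y_codes = y.map({'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2})
print(y_codes.tolist())  # [0, 1, 2]
```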

3. Feature engineering

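#3. Feature engineering: standardization
from sklearn.preprocessing import StandardScaler
# Create the standardization object
transfer=StandardScaler()
# Learn each feature's mean and standard deviation from the training set, then standardize it
xtrain=transfer.fit_transform(xtrain)
# Standardize the test set with the training-set statistics (transform, not fit_transform)
xtest=transfer.transform(xtest)
# Display the standardized training set
xtrain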
array([[-0.24630602, -0.75743035,  0.30129184,  0.18734818],
       [ 1.05418975,  0.11234613,  0.58331228,  0.45340476],
       [-0.24630602, -0.53998623,  0.69612046,  1.11854623],
       [ 0.69950909,  0.32979025,  0.92173681,  1.51763111],
       [ 1.05418975,  0.11234613,  1.09094908,  1.6506594 ],
       [ 1.05418975, -0.10509799,  0.86533273,  1.51763111],
       [-0.00985224, -0.75743035,  0.80892864,  0.98551794],
       [-0.12807913, -0.10509799,  0.30129184,  0.05431989],
       [-0.95566734, -1.62720682, -0.20634496, -0.2117367 ],
       [-1.07389423,  0.11234613, -1.22161855, -1.40899133],
       [ 0.5812822 , -1.62720682,  0.41410002,  0.18734818],
       [-0.00985224, -0.97487447,  0.18848366,  0.05431989],
       [ 0.46305531, -0.32254211,  0.35769593,  0.18734818],
       [-1.31034801,  0.32979025, -1.33442673, -1.27596304],
       [-1.4285749 ,  0.76467848, -1.27802264, -1.14293475],
       [-0.83744046, -1.19231859, -0.37555722, -0.07870841],
       [-1.19212112,  0.76467848, -1.16521446, -1.27596304],
       [ 2.4729124 ,  1.63445496,  1.54218179,  1.11854623],
       [-0.83744046,  0.76467848, -1.22161855, -1.27596304],
       [-0.83744046,  1.41701084, -1.22161855, -1.00990646],
       [-0.24630602, -0.32254211, -0.03713269,  0.18734818],
       [ 0.81773597, -0.10509799,  1.03454499,  0.85248964],
       [-0.48275979,  1.85189908, -1.10881037, -1.00990646],
       [ 1.40887041,  0.32979025,  0.58331228,  0.32037647],
       [-0.00985224, -0.53998623,  0.80892864,  1.6506594 ],
       [ 2.23645863, -0.97487447,  1.82420223,  1.51763111],
       [-0.3645329 , -1.19231859,  0.18848366,  0.18734818],
       [ 0.10837465,  0.32979025,  0.63971637,  0.85248964],
       [-1.07389423,  0.11234613, -1.22161855, -1.40899133],
       [-1.54680178, -1.62720682, -1.33442673, -1.14293475],
       [ 0.5812822 , -0.53998623,  0.80892864,  0.45340476],
       [-1.66502867, -0.32254211, -1.27802264, -1.27596304],
       [-0.95566734,  1.19956672, -1.27802264, -1.27596304],

Excerpt of the output (the first rows of the standardized training set).
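StandardScaler rescales every feature to z = (x - mean) / std, where the mean and the (population) standard deviation are learned by fit from the training set; transform then applies those same training statistics to later data, which is why the test set is transformed rather than refitted. A minimal check on made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: 4 training samples and 2 test samples, 2 features each
xtrain = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
xtest = np.array([[2.5, 25.0], [5.0, 50.0]])

scaler = StandardScaler()
xtrain_std = scaler.fit_transform(xtrain)  # learns mean_ and scale_ from xtrain
xtest_std = scaler.transform(xtest)        # reuses the training mean_ and scale_

# Same result by hand, using the population std (ddof=0) as StandardScaler does
mean = xtrain.mean(axis=0)
std = xtrain.std(axis=0)
print(np.allclose(xtrain_std, (xtrain - mean) / std))  # True
print(np.allclose(xtest_std, (xtest - mean) / std))    # True
```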

4. KNN model

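#4. KNN model
from sklearn.neighbors import KNeighborsClassifier
# Create a KNN classifier that votes among the 3 nearest neighbours
knn=KNeighborsClassifier(n_neighbors=3)
# Fit on the standardized training data
knn.fit(xtrain,ytrain)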

Output:

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=3, p=2,
                     weights='uniform')
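The parameters printed above (metric='minkowski' with p=2, i.e. Euclidean distance, and weights='uniform') mean that each prediction is a plain majority vote among the 3 nearest training samples. A minimal from-scratch sketch of that rule follows; knn_predict is a hypothetical helper, not part of scikit-learn, it assumes non-negative integer labels, and it breaks vote ties toward the smallest label, which may not match scikit-learn's tie handling.

```python
import numpy as np

def knn_predict(xtrain, ytrain, xquery, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    preds = []
    for q in xquery:
        # Euclidean distance from the query point to every training point
        dists = np.sqrt(((xtrain - q) ** 2).sum(axis=1))
        # Indices of the k closest training points
        nearest = np.argsort(dists)[:k]
        # Count labels among the neighbours; argmax picks the smallest label on ties
        counts = np.bincount(ytrain[nearest])
        preds.append(np.argmax(counts))
    return np.array(preds)

# Two well-separated toy clusters labelled 0 and 1
xtrain = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
ytrain = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(xtrain, ytrain, np.array([[0.3, 0.3], [4.8, 5.0]])))  # [0 1]
```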

5. Model evaluation

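#5. Model evaluation
# Predict labels for the test set
y_predict=knn.predict(xtest)
y_predict
# Model accuracy: fraction of test samples whose prediction matches ytest
s=knn.score(xtest,ytest)
s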
array([1, 1, 2, 2, 2, 2, 2, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 2, 0, 0, 1, 2,
       0, 1, 2, 2, 1, 1, 0, 0, 2, 0, 0, 2, 1, 1, 2, 2, 2, 2, 0, 0, 1, 1,
       0, 1, 2, 1, 2, 0, 2, 0, 1, 0, 2, 1, 0, 2, 2, 0, 0, 2, 0, 0, 0, 2,
       2, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 2, 0, 0, 0, 0, 2, 2,
       0, 1, 1, 2, 1, 0, 0, 2, 1, 1, 0, 1, 1, 0, 2, 1, 2, 1, 2, 0, 1, 0,
       0, 0, 2, 1, 2, 1, 2, 1, 2, 0], dtype=int64)

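The array above contains 120 predicted labels, the size of the training set rather than of the 30-sample test set, so these predictions and the accuracy below were computed on the standardized training data: 0.9833 (118 of 120 correct) is a training accuracy, and the test-set score from knn.score(xtest,ytest) may differ. As a rough rule of thumb, an accuracy below 50% suggests the model is not suitable for this problem; for comparison, guessing at random among three roughly balanced classes scores about 33%.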

0.9833333333333333
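For a classifier, knn.score(X, y) returns the mean accuracy, i.e. the fraction of samples whose predicted label equals the true label; it gives the same value as sklearn.metrics.accuracy_score(y, knn.predict(X)). A short sketch with hypothetical label arrays computes it both by hand and with accuracy_score:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels for 6 samples (one mistake, at index 3)
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

# Accuracy = number of correct predictions / number of samples
manual = (y_pred == y_true).mean()
print(manual)                          # 5 of 6 correct
print(accuracy_score(y_true, y_pred))  # same value
```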
