Case study: iris species classification
Steps:
1. Load the data
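import pandas as pd
# 1. Load the data
# The raw string (r"...") keeps the backslashes in the Windows path literal
irisdata=pd.read_csv(r"D:\mlData\iris.csv",
    names=['sepal_length','sepal_width','petal_length',
           'petal_width','iristype'])
irisdata
Replace "D:\mlData\iris.csv" with the path where you saved the data file. The result is shown below.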
 | sepal_length | sepal_width | petal_length | petal_width | iristype |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
... | ... | ... | ... | ... | ... |
145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
149 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
150 rows × 5 columns
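If the CSV file is not at hand, a similar DataFrame can probably be built from the copy of the iris data that ships with scikit-learn. The sketch below is an assumption rather than part of the original workflow: the name `irisdata_alt` is hypothetical, the `Iris-` prefix is added only to mimic the CSV labels, and a few values in the bundled data may not be identical to those in the CSV.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load scikit-learn's bundled copy of the iris data (no CSV file needed).
iris = load_iris()

# Column names follow the ones used in this tutorial, not scikit-learn's own.
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
irisdata_alt = pd.DataFrame(iris.data, columns=columns)

# scikit-learn names the classes 'setosa', 'versicolor', 'virginica';
# the 'Iris-' prefix is added here only to mimic the CSV labels.
irisdata_alt['iristype'] = ['Iris-' + str(iris.target_names[t]) for t in iris.target]

print(irisdata_alt.shape)
print(irisdata_alt.head())
```

The resulting frame has the same five columns as the one read from the CSV, so the later steps can be tried on it unchanged.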
2. Split the dataset
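# 2. Split the dataset
# Features: the first four columns; label: the last column
xdata=irisdata.iloc[:,0:4]
ydata=irisdata.iloc[:,-1]
# ydata1 is not defined in the original listing; it is assumed here to be the
# labels encoded as integers, matching the 0/1/2 values in the output below
ydata1=ydata.map({'Iris-setosa':0,'Iris-versicolor':1,'Iris-virginica':2})
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(xdata,ydata1,test_size=0.2,random_state=33)
ytrain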
94     1
86     1
121    2
115    2
140    2
      ..
57     1
146    2
66     1
135    2
20     0
Name: iristype, Length: 120, dtype: int64
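In iloc[:,0:4], the part before the comma selects rows and the part after it selects columns. A bare ":" selects every row, and 0:4 selects the first through fourth columns (positions 0 to 3), because the end position of a slice is excluded.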
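The slicing rule can be checked on a small frame. This is a minimal sketch; the two-row `demo` frame and the names `features` and `label` are hypothetical and only mirror the column layout of the iris data.

```python
import pandas as pd

# A tiny hypothetical frame with the same column layout as the iris data.
demo = pd.DataFrame({
    'sepal_length': [5.1, 6.7],
    'sepal_width': [3.5, 3.0],
    'petal_length': [1.4, 5.2],
    'petal_width': [0.2, 2.3],
    'iristype': ['Iris-setosa', 'Iris-virginica'],
})

features = demo.iloc[:, 0:4]   # all rows, column positions 0, 1, 2, 3
label = demo.iloc[:, -1]       # all rows, last column only

print(list(features.columns))  # the four measurement columns
print(label.name)              # iristype
```

Position 4 (the `iristype` column) is left out of `features` because the slice stops before its end position.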
3. Feature engineering
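# 3. Feature engineering
from sklearn.preprocessing import StandardScaler
# Create the standardization object
transfer=StandardScaler()
# Fit the scaler on the training set only, then reuse its mean and standard
# deviation on the test set so that no test-set statistics leak into training
xtrain=transfer.fit_transform(xtrain)
xtest=transfer.transform(xtest)
xtrain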
array([[-0.24630602, -0.75743035, 0.30129184, 0.18734818],
       [ 1.05418975, 0.11234613, 0.58331228, 0.45340476],
       [-0.24630602, -0.53998623, 0.69612046, 1.11854623],
       [ 0.69950909, 0.32979025, 0.92173681, 1.51763111],
       [ 1.05418975, 0.11234613, 1.09094908, 1.6506594 ],
       [ 1.05418975, -0.10509799, 0.86533273, 1.51763111],
       [-0.00985224, -0.75743035, 0.80892864, 0.98551794],
       [-0.12807913, -0.10509799, 0.30129184, 0.05431989],
       [-0.95566734, -1.62720682, -0.20634496, -0.2117367 ],
       [-1.07389423, 0.11234613, -1.22161855, -1.40899133],
       [ 0.5812822 , -1.62720682, 0.41410002, 0.18734818],
       [-0.00985224, -0.97487447, 0.18848366, 0.05431989],
       [ 0.46305531, -0.32254211, 0.35769593, 0.18734818],
       [-1.31034801, 0.32979025, -1.33442673, -1.27596304],
       [-1.4285749 , 0.76467848, -1.27802264, -1.14293475],
       [-0.83744046, -1.19231859, -0.37555722, -0.07870841],
       [-1.19212112, 0.76467848, -1.16521446, -1.27596304],
       [ 2.4729124 , 1.63445496, 1.54218179, 1.11854623],
       [-0.83744046, 0.76467848, -1.22161855, -1.27596304],
       [-0.83744046, 1.41701084, -1.22161855, -1.00990646],
       [-0.24630602, -0.32254211, -0.03713269, 0.18734818],
       [ 0.81773597, -0.10509799, 1.03454499, 0.85248964],
       [-0.48275979, 1.85189908, -1.10881037, -1.00990646],
       [ 1.40887041, 0.32979025, 0.58331228, 0.32037647],
       [-0.00985224, -0.53998623, 0.80892864, 1.6506594 ],
       [ 2.23645863, -0.97487447, 1.82420223, 1.51763111],
       [-0.3645329 , -1.19231859, 0.18848366, 0.18734818],
       [ 0.10837465, 0.32979025, 0.63971637, 0.85248964],
       [-1.07389423, 0.11234613, -1.22161855, -1.40899133],
       [-1.54680178, -1.62720682, -1.33442673, -1.14293475],
       [ 0.5812822 , -0.53998623, 0.80892864, 0.45340476],
       [-1.66502867, -0.32254211, -1.27802264, -1.27596304],
       [-0.95566734, 1.19956672, -1.27802264, -1.27596304],
Only part of the result is shown.
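Standardization rescales each feature to zero mean and unit variance using statistics learned from the training data. The sketch below illustrates this on hypothetical toy arrays (`train`, `test`, and the other names are assumptions, not part of the original code) and compares the scaler's output with the formula (x - mean) / std computed by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 4 training rows and 2 test rows, 2 features each.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
test = np.array([[2.5, 25.0], [5.0, 50.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)   # learns mean_ and scale_ from train
test_scaled = scaler.transform(test)         # reuses the training statistics

# The same result computed by hand: (x - mean) / std, with the population std.
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print(np.allclose(test_scaled, manual))
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

Calling `fit_transform` on the test set instead would recompute the mean and standard deviation from the test data, so the two sets would no longer be on the same scale.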
4. KNN model
# 4. KNN model
from sklearn.neighbors import KNeighborsClassifier
# Create the KNN model object (each sample is classified by its 3 nearest neighbours)
knn=KNeighborsClassifier(n_neighbors=3)
knn.fit(xtrain,ytrain)
The result is:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=3, p=2, weights='uniform')
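With the parameters shown above (`n_neighbors=3`, Minkowski metric with `p=2`, uniform weights), a prediction is in essence a majority vote among the three training points closest in Euclidean distance. The following is a simplified NumPy sketch of that idea, not scikit-learn's actual implementation; `knn_predict` and the toy arrays are hypothetical, and tie handling may differ from the library.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of one query point by majority vote of its k nearest neighbours."""
    distances = np.sqrt(((train_x - query) ** 2).sum(axis=1))  # Euclidean distance
    nearest = np.argsort(distances)[:k]                        # indices of the k closest points
    votes = Counter(train_y[nearest].tolist())                 # count labels among neighbours
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters labelled 0 and 1.
toy_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(toy_x, toy_y, np.array([0.15, 0.15])))  # prints 0
print(knn_predict(toy_x, toy_y, np.array([5.0, 5.1])))    # prints 1
```

Because the vote depends on raw distances, features on very different scales would dominate the result, which is one reason the features are standardized before fitting.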
5. Model evaluation
# 5. Model evaluation
y_predict=knn.predict(xtest)
y_predict
# Model accuracy: compare the test-set predictions with the test-set labels
s=knn.score(xtest,ytest)
s
array([1, 1, 2, 2, 2, 2, 2, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 2, 0, 0, 1, 2, 0, 1, 2, 2, 1, 1, 0, 0, 2, 0, 0, 2, 1, 1, 2, 2, 2, 2, 0, 0, 1, 1, 0, 1, 2, 1, 2, 0, 2, 0, 1, 0, 2, 1, 0, 2, 2, 0, 0, 2, 0, 0, 0, 2, 2, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 2, 0, 0, 0, 0, 2, 2, 0, 1, 1, 2, 1, 0, 0, 2, 1, 1, 0, 1, 1, 0, 2, 1, 2, 1, 2, 0, 1, 0, 0, 0, 2, 1, 2, 1, 2, 1, 2, 0], dtype=int64)
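If the model's accuracy is below 50%, the model is not suitable for this case study. Note that the prediction array above has 120 entries, one per training sample (a 0.2 test split of 150 rows leaves only 30 test samples), so the score below reflects accuracy on the training data rather than on the held-out test set.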
0.9833333333333333
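The five steps can be sketched end to end with the score computed on the held-out test set. This is a minimal sketch under stated assumptions: scikit-learn's bundled iris data stands in for the CSV file, all variable names are hypothetical, and the resulting accuracy is not claimed to match the figure reported above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled iris data stands in for the CSV file.
x, y = load_iris(return_X_y=True)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=33)

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)  # fit on the training set only
x_test = scaler.transform(x_test)        # apply the same scaling to the test set

model = KNeighborsClassifier(n_neighbors=3)
model.fit(x_train, y_train)

y_pred = model.predict(x_test)
test_accuracy = model.score(x_test, y_test)

# score() is the fraction of test samples whose prediction matches the label.
print(len(y_pred), test_accuracy, (y_pred == y_test).mean())
```

With 150 rows and `test_size=0.2`, `y_pred` has 30 entries, one per test sample, and `score` equals the mean of the element-wise comparison between predictions and true labels.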