sklearn Machine Learning Library Examples

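sklearn (scikit-learn) is a very popular machine learning library that implements a wide range of machine learning models. Official site: http://scikit-learn.org/stable/ . The site has comprehensive examples and explanations of model parameters, so consult the official documentation for whichever model you use.
Its functionality falls into six main areas: classification, regression, clustering, dimensionality reduction, model selection, and data preprocessing.
The basic usage pattern of the Estimator framework:
     model = EstimatorObject()  # create the model
     model.fit(dataset.data, dataset.target)   # train the model
     model.predict(dataset.data)    # predict
This article demonstrates the main machine learning models with worked examples; set each model's parameters according to your own needs.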
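
As a concrete instance of the pattern above, here is a minimal sketch. The iris dataset and `KNeighborsClassifier` are arbitrary choices made for illustration and are not used elsewhere in this post; any estimator follows the same three steps.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Assumption: a small bundled dataset stands in for "dataset";
# anything exposing .data and .target works the same way.
dataset = load_iris()

model = KNeighborsClassifier()              # create the model
model.fit(dataset.data, dataset.target)     # train the model
predictions = model.predict(dataset.data)   # predict labels

print(predictions[:5])
```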
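1. Classification
The dataset is Car Evaluation, which rates each car based on several of its attributes. Download: http://archive.ics.uci.edu/ml/datasets/Car+Evaluation
Preprocessing: after saving the dataset, change the file extension to csv and convert the string-valued levels into numbers, for example small, low, unacc to 1, 2, 3. The examples below select columns by the names safety and class, so the CSV needs a header row containing those column names.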
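
The preprocessing step can also be done in pandas instead of by hand. The sketch below is only an illustration: the three sample rows, the column names other than `safety` and `class`, and every numeric code in `mappings` are assumptions, since the post does not spell out the full mapping.

```python
import io
import pandas as pd

# Three sample rows in the layout of the raw file (no header row).
raw = io.StringIO(
    "vhigh,vhigh,2,2,small,low,unacc\n"
    "low,med,4,4,big,high,vgood\n"
    "med,low,5more,more,med,med,acc\n"
)
# Assumed column names; only 'safety' and 'class' appear in the post's code.
columns = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]
car_data = pd.read_csv(raw, names=columns, dtype=str)

# Hypothetical ordinal codes for each string level.
level = {"low": 1, "med": 2, "high": 3, "vhigh": 4}
mappings = {
    "buying": level,
    "maint": level,
    "doors": {"2": 2, "3": 3, "4": 4, "5more": 5},
    "persons": {"2": 2, "4": 4, "more": 6},
    "lug_boot": {"small": 1, "med": 2, "big": 3},
    "safety": level,
    "class": {"unacc": 1, "acc": 2, "good": 3, "vgood": 4},
}
for column, mapping in mappings.items():
    car_data[column] = car_data[column].map(mapping)

print(car_data)
```

Saving the result with `car_data.to_csv('car.csv', index=False)` would produce a file with the header row that the later examples rely on.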
1.1 SVM (support vector machine) model

    from sklearn import svm
    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split

    # Car evaluation with an SVM model
    car_data = pd.read_csv(r'D:\pyproject\sklearn\car.csv')
    car_data = car_data.dropna()  # drop rows with missing values
    # Extract the features and the class labels
    X = car_data.loc[:, :'safety']
    y = car_data.loc[:, 'class']
    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    # Build the model. kernel can be 'linear', 'poly', 'rbf' or 'sigmoid';
    # the penalty parameter C is 1 here and is usually chosen as a power of 10
    svc_model = svm.SVC(kernel='rbf', C=1)
    svc_model.fit(X_train, y_train)
    predict_data = svc_model.predict(X_test)
    accuracy = np.mean(predict_data == y_test)
    print(accuracy)

Output:

![SVM model output](https://img-blog.csdn.net/20180502175729629?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzI3MTUwODkz/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
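
Rather than fixing the kernel and C by hand, the candidate kernels and a few powers of 10 for C can be compared with cross-validation. This is a sketch under assumptions: it uses the bundled iris data instead of the car CSV so that it runs anywhere, and the grid values are illustrative.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: iris stands in for the car dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate kernels and penalty values (powers of 10).
param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": [0.1, 1, 10, 100],
}
search = GridSearchCV(svm.SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)           # best combination found on the training folds
print(search.score(X_test, y_test))  # accuracy of the refitted best model on the test set
```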

1.2 MLP neural network model

    from sklearn.neural_network import MLPClassifier
    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split

    # Car evaluation with an MLP neural network model
    car_data = pd.read_csv(r'D:\pyproject\sklearn\car.csv')
    car_data = car_data.dropna()  # drop rows with missing values
    # Extract the features and the class labels
    X = car_data.loc[:, :'safety']
    y = car_data.loc[:, 'class']
    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    # Build the MLP model. The solver is adam (lbfgs and sgd are also available);
    # the regularization penalty is alpha = 0.01
    mlp_model = MLPClassifier(solver='adam', learning_rate='constant', learning_rate_init=0.01, max_iter=500, alpha=0.01)
    mlp_model.fit(X_train, y_train)
    predict_data = mlp_model.predict(X_test)
    accuracy = np.mean(predict_data == y_test)
    print(accuracy)

Output:

![MLP model output](https://img-blog.csdn.net/20180502175832180?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzI3MTUwODkz/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
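
To see how the three available solvers behave, they can be tried side by side. The sketch below assumes the bundled iris data in place of the car CSV, and fixes `random_state` only to make the runs repeatable; scikit-learn may print convergence warnings for some solvers.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Assumption: iris stands in for the car dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for solver in ["adam", "lbfgs", "sgd"]:
    model = MLPClassifier(solver=solver, max_iter=500, alpha=0.01, random_state=0)
    model.fit(X_train, y_train)
    scores[solver] = model.score(X_test, y_test)  # test-set accuracy

for solver, accuracy in scores.items():
    print(solver, accuracy)
```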

1.3 Logistic regression model

    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    # Car evaluation with a logistic regression model
    car_data = pd.read_csv(r'D:\pyproject\sklearn\car.csv')
    car_data = car_data.dropna()  # drop rows with missing values
    # Extract the features and the class labels
    X = car_data.loc[:, :'safety']
    y = car_data.loc[:, 'class']
    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    # Build the logistic regression model with C = 100
    # (C is the inverse of the regularization strength)
    lr_model = LogisticRegression(C=100, max_iter=1000)
    lr_model.fit(X_train, y_train)
    predict_data = lr_model.predict(X_test)
    accuracy = np.mean(predict_data == y_test)
    print(accuracy)

Output:

![Logistic regression model output](https://img-blog.csdn.net/20180502180001759?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzI3MTUwODkz/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
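
In `LogisticRegression`, C is the inverse of the regularization strength, so a larger C means a weaker penalty on the coefficients. The sketch below illustrates this by printing the coefficient norm for a few values of C; the iris data and the chosen C values are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumption: iris stands in for the car dataset.
X, y = load_iris(return_X_y=True)

coef_norms = {}
for C in [0.01, 1, 100]:
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X, y)
    # Overall size of the learned coefficients for this C.
    coef_norms[C] = np.linalg.norm(model.coef_)
    print(C, coef_norms[C])
```

With strong regularization (small C) the coefficients are pulled toward zero; with C = 100 they are allowed to grow much larger.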

1.4 Decision tree model

    from sklearn import tree
    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split

    # Car evaluation with a decision tree model
    car_data = pd.read_csv(r'D:\pyproject\sklearn\car.csv')
    car_data = car_data.dropna()  # drop rows with missing values
    # Extract the features and the class labels
    X = car_data.loc[:, :'safety']
    y = car_data.loc[:, 'class']
    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    # Build the decision tree model. The split criterion here is Gini impurity;
    # the options are 'gini' and 'entropy', and 'gini' is the default
    tree_model = tree.DecisionTreeClassifier(criterion='gini')
    tree_model.fit(X_train, y_train)
    predict_data = tree_model.predict(X_test)
    accuracy = np.mean(predict_data == y_test)
    print(accuracy)

Output:

![Decision tree model output](https://img-blog.csdn.net/20180502180111950?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzI3MTUwODkz/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
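
The two split criteria can be compared directly. This is a minimal sketch, assuming the bundled iris data in place of the car CSV; `random_state=0` is added only so that repeated runs build the same trees.

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: iris stands in for the car dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for criterion in ["gini", "entropy"]:
    model = tree.DecisionTreeClassifier(criterion=criterion, random_state=0)
    model.fit(X_train, y_train)
    # Record the depth of the fitted tree and its test-set accuracy.
    results[criterion] = (model.get_depth(), model.score(X_test, y_test))

for criterion, (depth, accuracy) in results.items():
    print(criterion, depth, accuracy)
```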

1.5 KNN (k-nearest neighbors) model

    from sklearn import neighbors
    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split

    # Car evaluation with a k-nearest neighbors model
    car_data = pd.read_csv(r'D:\pyproject\sklearn\car.csv')
    car_data = car_data.dropna()  # drop rows with missing values
    # Extract the features and the class labels
    X = car_data.loc[:, :'safety']
    y = car_data.loc[:, 'class']
    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    # Build the KNN model with 7 neighbors (the default is 5)
    knn_model = neighbors.KNeighborsClassifier(n_neighbors=7)
    knn_model.fit(X_train, y_train)
    # Predict on the test set
    predict_data = knn_model.predict(X_test)
    accuracy = np.mean(predict_data == y_test)
    print(accuracy)

Output:

![KNN model output](https://img-blog.csdn.net/20180502180212995?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzI3MTUwODkz/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
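
The number of neighbors is a free parameter, and one common way to pick it is cross-validation. The sketch below assumes the bundled iris data in place of the car CSV; the candidate values of k are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris stands in for the car dataset.
X, y = load_iris(return_X_y=True)

mean_scores = {}
for k in [1, 3, 5, 7, 9]:
    # 5-fold cross-validated accuracy for this neighbor count.
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    mean_scores[k] = scores.mean()

best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print(best_k)
```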

2. Regression
This section uses a dataset bundled with sklearn, the Boston housing dataset, and predicts house prices from several indicators of the Boston area.
2.1 Linear regression model

    from sklearn.linear_model import LinearRegression
    # Note: load_boston was removed in scikit-learn 1.2, so this import requires an older version
    from sklearn.datasets import load_boston
    from sklearn.model_selection import train_test_split
    # Import the evaluation metric
    from sklearn.metrics import mean_absolute_error

    # Predict Boston house prices with a linear regression model
    # Load the dataset bundled with sklearn
    data = load_boston()
    # Build the linear regression model
    clf = LinearRegression()
    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3, random_state=0)
    clf.fit(X_train, y_train)
    predict_data = clf.predict(X_test)
    print(predict_data)
    # Evaluate the predictions with the mean absolute error
    appraise = mean_absolute_error(y_test, predict_data)
    print(appraise)

Output:

![Linear regression model output](https://img-blog.csdn.net/20180502180307872?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzI3MTUwODkz/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
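
The mean absolute error is simply the average of |y - ŷ| over the test samples. The sketch below checks that against `mean_absolute_error`. Because `load_boston` is not available in scikit-learn 1.2 and later, it substitutes the bundled diabetes dataset; that substitution is an assumption made so the example runs on current versions, and its numbers are not comparable to the Boston results.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Assumption: the diabetes dataset stands in for the Boston housing data.
data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0
)

reg = LinearRegression()
reg.fit(X_train, y_train)
predict_data = reg.predict(X_test)

# Mean absolute error computed by hand and with the library function.
manual_mae = np.mean(np.abs(y_test - predict_data))
library_mae = mean_absolute_error(y_test, predict_data)
print(manual_mae, library_mae)
```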
