Machine Learning Workflow: Summary of Core Library-Call Code (Part 1)

Algorithm summary

  1. Linear regression
  2. Logistic regression
  3. Neural networks
  4. KNN
  5. Decision trees
  6. PCA
  7. K-means
  8. SVM
  9. Random forest
  10. AdaBoost
  11. Naive Bayes

Core library-call code

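# 1.
# Linear regression
# Ordinary least-squares linear regression
from sklearn.linear_model import LinearRegression

# Lasso regression (adds L1 regularization)
from sklearn.linear_model import Lasso

# Ridge regression (adds L2 regularization)
from sklearn.linear_model import Ridge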

# 2.
# Logistic regression
from sklearn.linear_model import LogisticRegression

# 3.
# Neural networks
from sklearn.neural_network import MLPClassifier
from sklearn.neural_network import MLPRegressor

# 4.
# KNN
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neighbors import KNeighborsRegressor

# 5.
# Decision trees
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import DecisionTreeRegressor

# 6.
# PCA
from sklearn.decomposition import PCA

# 7.
# K-means
from sklearn.cluster import KMeans

# 8.
# SVM
from sklearn.svm import SVC
from sklearn.svm import SVR

# 9.
# Random forest
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor

# 10.
# AdaBoost
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import AdaBoostRegressor

# 11.
# Naive Bayes
# Multinomial naive Bayes
from sklearn.naive_bayes import MultinomialNB
# Gaussian naive Bayes
from sklearn.naive_bayes import GaussianNB
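
The supervised estimators imported above share the same fit / predict / score interface, so they can be swapped in and out of one loop. The following is a minimal sketch of that idea, not a recommended setup: the synthetic data from make_classification, the choice of four classifiers, and every hyperparameter value are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (illustrative only, not from the original post)
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Hyperparameter values here are arbitrary examples
models = {
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'KNeighborsClassifier': KNeighborsClassifier(n_neighbors=5),
    'DecisionTreeClassifier': DecisionTreeClassifier(max_depth=3, random_state=0),
    'GaussianNB': GaussianNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # train on the training split
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the held-out split
    print(f'{name}: {scores[name]:.3f}')
```

The regressors (for example LinearRegression, Ridge, Lasso) follow the same pattern; for them, score returns R2 instead of accuracy.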

Data loading and preprocessing summary

  1. Loading data
  2. Feature scaling
  3. Splitting data
  4. Random sampling

Core code

# 1.
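# Load data (each reader needs a file path as its first argument, omitted here)
import numpy as np
import pandas as pd
data = np.loadtxt()      # plain-text numeric file -> NumPy array
data = pd.read_excel()   # Excel workbook -> DataFrame
pd.read_csv()            # comma-separated text -> DataFrame
pd.read_table()          # tab-separated text by default -> DataFrame
pd.read_json()           # JSON -> DataFrame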

# 2.
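# Feature scaling
# StandardScaler: zero mean, unit variance per feature
from sklearn.preprocessing import StandardScaler
# MinMaxScaler: rescale each feature to [0, 1] by default
from sklearn.preprocessing import MinMaxScaler

# 3.
# Splitting data into training and test sets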
from sklearn.model_selection import train_test_split

# 4.
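# Random sampling (data must be a pandas DataFrame; frac=1 with replace=True
# draws a same-size sample with replacement, i.e. a bootstrap sample)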
data_s = data.sample(frac=1,replace=True)
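
The four steps above can be chained into one small pipeline. The sketch below is only an illustration under stated assumptions: the in-memory CSV, its column names (height, weight, label), and the 25% test size are all made up so that the example runs without any file on disk.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# A tiny in-memory CSV stands in for a real file (hypothetical columns and values)
csv_text = """height,weight,label
1.60,55,0
1.75,70,1
1.82,80,1
1.68,60,0
1.90,95,1
1.55,50,0
1.78,77,1
1.63,58,0
"""
data = pd.read_csv(io.StringIO(csv_text))

X = data[['height', 'weight']].to_numpy()
y = data['label'].to_numpy()

# Split first, so that scaling statistics come from the training rows only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # learn mean and std on the training split
X_test_s = scaler.transform(X_test)        # reuse those statistics on the test split

# Sampling with replacement: same number of rows, some rows may repeat
data_s = data.sample(frac=1, replace=True, random_state=0)

print(X_train_s.mean(axis=0))  # approximately zero for each feature
print(data_s.shape)
```

Fitting the scaler on the training split and only transforming the test split keeps test-set information out of preprocessing.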

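Model selection and hyperparameter selection summary

  1. Choosing the best model with cross-validation
  2. Choosing the best hyperparameters with grid search

Core code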

# 1.
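# Choosing the best model with cross-validation
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
# x, y: feature matrix and binary labels, assumed to be loaded already
model = RandomForestClassifier(n_estimators=200,max_depth=2,max_features=5,oob_score=True)
f1_scores = cross_val_score(model,x,y,scoring='f1',cv=5)
# Prints one F1 value per fold; compare the mean with other classifiers and pick the best model
print(f1_scores)
print(f1_scores.mean())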

# 2.
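# Choosing the best hyperparameters with grid search
from sklearn.model_selection import GridSearchCV
# Fill each list with candidate values; GridSearchCV rejects empty lists
params = {
    'n_estimators':[],
    'max_depth':[],
    'max_features':[],
    'oob_score':[]
}
G_model = GridSearchCV(model,param_grid=params,cv=5)
# Cross-validate every parameter combination, then refit on the best one
G_model.fit(x,y)
# Print the best parameters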
print(G_model.best_params_)
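
As a runnable illustration of both steps, the sketch below compares two classifiers by their mean cross-validated F1 and then grid-searches a random forest. It assumes synthetic data from make_classification; the candidate models and the values in param_grid are arbitrary examples, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic binary-classification data (illustrative only)
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=0)

# Step 1: compare candidate models by their mean F1 over 5 folds
candidates = {
    'random_forest': RandomForestClassifier(n_estimators=50, random_state=0),
    'logistic_regression': LogisticRegression(max_iter=1000),
}
mean_f1 = {}
for name, model in candidates.items():
    fold_scores = cross_val_score(model, X, y, scoring='f1', cv=5)
    mean_f1[name] = fold_scores.mean()
    print(name, round(mean_f1[name], 3))

# Step 2: grid search over example hyperparameter values (2 x 2 x 2 = 8 combinations)
param_grid = {
    'n_estimators': [20, 50],
    'max_depth': [2, 4],
    'max_features': [3, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring='f1',
    cv=5,
)
search.fit(X, y)  # cross-validates every combination, then refits the best one
print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, search.best_estimator_ holds the model refit on all of X and y with the best parameters found.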

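Evaluation metrics summary

  1. MSE (regression)
  2. RMSE (regression)
  3. MAE (regression)
  4. R2 (regression)
  5. Accuracy (classification)
  6. Precision (classification)
  7. Recall (classification)
  8. F1 score (classification)
  9. AUC (classification)
  10. ROC curve (classification)
  11. Confusion matrix (classification)
  12. Classification report (classification)

Core code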

# 1.
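# MSE (regression)
from sklearn.metrics import mean_squared_error
# 2.
# RMSE (regression): take the square root of MSE, e.g. np.sqrt(mean_squared_error(y_true, y_pred));
# scikit-learn 1.4+ also provides root_mean_squared_error.
# (mean_squared_log_error computes MSLE, which is a different metric.)
# 3.
# MAE (regression)
from sklearn.metrics import mean_absolute_error
# 4.
# R2 (regression)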
from sklearn.metrics import r2_score
# 5.
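# Accuracy (classification)
from sklearn.metrics import accuracy_score
# 6.
# Precision (classification)
from sklearn.metrics import precision_score
# 7.
# Recall (classification)
from sklearn.metrics import recall_score
# 8.
# F1 score (classification)
from sklearn.metrics import f1_score
# 9.
# AUC (classification)
from sklearn.metrics import roc_auc_score
# 10.
# ROC curve (classification)
from sklearn.metrics import roc_curve
# y_true: true binary labels; y_score: predicted scores or probabilities for the positive class
FPR,TPR,TH = roc_curve(y_true,y_score)
# 11.
# Confusion matrix (classification)
from sklearn.metrics import confusion_matrix
# 12.
# Classification report (classification)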
from sklearn.metrics import classification_report
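
To show how these metric functions are called, here is a small sketch on hand-made arrays. All values (yr_true, yr_pred, yc_true, yc_score) are invented for illustration, and the 0.5 threshold used to turn scores into labels is an assumption. RMSE is computed as the square root of MSE so that the code does not depend on a particular scikit-learn version.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix, f1_score,
    mean_absolute_error, mean_squared_error, precision_score, r2_score,
    recall_score, roc_auc_score, roc_curve,
)

# Regression metrics on made-up values
yr_true = np.array([3.0, 5.0, 2.0, 7.0])
yr_pred = np.array([2.0, 5.0, 4.0, 8.0])
mse = mean_squared_error(yr_true, yr_pred)   # mean of squared errors
rmse = np.sqrt(mse)                          # RMSE is the square root of MSE
mae = mean_absolute_error(yr_true, yr_pred)  # mean of absolute errors
r2 = r2_score(yr_true, yr_pred)
print(mse, rmse, mae, r2)

# Classification metrics on made-up labels and scores
yc_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
yc_score = np.array([0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3])
yc_pred = (yc_score >= 0.5).astype(int)      # assumed decision threshold of 0.5

acc = accuracy_score(yc_true, yc_pred)
prec = precision_score(yc_true, yc_pred)
rec = recall_score(yc_true, yc_pred)
f1 = f1_score(yc_true, yc_pred)
auc = roc_auc_score(yc_true, yc_score)       # AUC uses scores, not hard labels
fpr, tpr, thresholds = roc_curve(yc_true, yc_score)
cm = confusion_matrix(yc_true, yc_pred)      # rows: true class, columns: predicted class

print(acc, prec, rec, f1, auc)
print(cm)
print(classification_report(yc_true, yc_pred))
```

Note that roc_auc_score and roc_curve take continuous scores or probabilities, while accuracy, precision, recall, F1, and the confusion matrix take predicted labels.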
