Regression models in sklearn: a quick code record

All of the models below are tested on the Boston housing dataset.

Contents

    • Preparation
      • Loading the dataset
      • Splitting the dataset
      • Evaluation metrics function
    • Models
      • Linear Models
      • KNN
      • SVM
      • DecisionTree
      • Random forest
      • Bagging
      • Xgboost
      • Lightgbm
      • Catboost
      • GradientBoosting
    • Reference

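Preparation

Loading the dataset

from sklearn import datasets  # import the datasets module

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this call needs scikit-learn < 1.2.
boston = datasets.load_boston()  # load the Boston housing data
print(boston.keys())  # list the keys (attributes): ['data', 'target', 'feature_names', 'DESCR', 'filename']
print(boston.data.shape, boston.target.shape)  # check the data shapes: (506, 13) (506,)
print(boston.feature_names)  # list the features; there are 13 here
print(boston.DESCR)  # description of the dataset
print(boston.filename)  # path of the data file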

Splitting the dataset

from sklearn.model_selection import train_test_split
# check data shape
print("boston.data.shape %s , boston.target.shape %s" % (boston.data.shape, boston.target.shape))
train = boston.data  # samples (features)
target = boston.target  # target values
# split the samples into a training set and a test set
X_train, x_test, y_train, y_true = train_test_split(train, target, test_size=0.2)  # 20% test set; 80% training set
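
To make the 80/20 split concrete, here is a minimal pure-Python sketch of the idea behind a shuffled split. It is not scikit-learn's implementation; `simple_split` and its `seed` argument are hypothetical names used only for illustration.

```python
import random

def simple_split(samples, targets, test_size=0.2, seed=None):
    """Shuffle the row indices, then use the first `test_size` fraction as the test set."""
    if len(samples) != len(targets):
        raise ValueError("samples and targets must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)          # seeded shuffle, so the split is repeatable
    n_test = int(round(len(samples) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_tr = [samples[i] for i in train_idx]
    X_te = [samples[i] for i in test_idx]
    y_tr = [targets[i] for i in train_idx]
    y_te = [targets[i] for i in test_idx]
    return X_tr, X_te, y_tr, y_te

# Toy data (assumed for the demo): 10 one-feature samples with target = 2 * feature.
X = [[i] for i in range(10)]
y = [2 * i for i in range(10)]
X_tr, X_te, y_tr, y_te = simple_split(X, y, test_size=0.2, seed=0)
print(len(X_tr), len(X_te))  # 8 2
```

In the real call above, passing `random_state=<some integer>` to `train_test_split` plays the same role as `seed` here: without it, every run produces a different split, so the metrics reported below will not reproduce exactly.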

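Evaluation metrics function

Used to judge how good a regression model is.

from sklearn import metrics
import numpy as np

def reg_calculate(true, prediction):
    mse = metrics.mean_squared_error(true, prediction)  # mean squared error
    rmse = np.sqrt(mse)  # root mean squared error
    mae = metrics.mean_absolute_error(true, prediction)  # mean absolute error
    mape = np.mean(np.abs((true - prediction) / true)) * 100  # mean absolute percentage error, in %; breaks if any true value is 0
    r2 = metrics.r2_score(true, prediction)  # coefficient of determination
    rmsle = np.sqrt(metrics.mean_squared_log_error(true, prediction))  # root mean squared log error; needs non-negative values
    print("mse: {}, rmse: {}, mae: {}, mape: {}, r2: {}, rmsle: {}".format(mse, rmse, mae, mape, r2, rmsle))
    # return mse, rmse, mae, mape, r2, rmsle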
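
To show what each number in `reg_calculate` means, the sketch below recomputes the same six metrics by hand on three made-up values. `regression_metrics` is a hypothetical helper written for this illustration, not part of scikit-learn; the log error uses `log(1 + y)`, which as far as I know matches `mean_squared_log_error`.

```python
import math

def regression_metrics(true, pred):
    """Compute the six metrics from plain Python lists (illustrative only)."""
    n = len(true)
    errors = [t - p for t, p in zip(true, pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    # Percentage error relative to the true value; undefined when a true value is 0.
    mape = sum(abs(e / t) for e, t in zip(errors, true)) / n * 100
    mean_true = sum(true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    # Squared difference of log(1 + y); only valid for non-negative values.
    msle = sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(true, pred)) / n
    return {"mse": mse, "rmse": rmse, "mae": mae, "mape": mape, "r2": r2, "rmsle": math.sqrt(msle)}

# Made-up values: errors are -2, 2, -3.
m = regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
print(m)
```

Here the MSE is 17/3, the MAE is 7/3, the MAPE is about 13.33% and R² is 1 - 17/200 = 0.915. Lower is better for every metric except R², where values closer to 1 are better.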
  

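Models

Linear Models

Linear models are usually used as the baseline: however weak your other, more complex models turn out to be, they should not do worse than this.

from sklearn.linear_model import LinearRegression  # ordinary multiple linear regression
from sklearn.linear_model import Ridge  # ridge regression (linear regression with an L2 penalty)
from sklearn.linear_model import Lasso  # lasso regression (L1 penalty); can be used for feature selection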


linear = LinearRegression()  
ridge = Ridge()
lasso = Lasso()

linear.fit(X_train, y_train)
ridge.fit(X_train, y_train)
lasso.fit(X_train, y_train)

y_pre_linear = linear.predict(x_test)
y_pre_ridge = ridge.predict(x_test)
y_pre_lasso = lasso.predict(x_test)

# evaluation metrics
print("linear")
reg_calculate(y_true, y_pre_linear)
print("ridge")
reg_calculate(y_true, y_pre_ridge)
print("lasso")
reg_calculate(y_true, y_pre_lasso)

Output:

linear
mse: 31.240513455848852, rmse: 5.589321377041121, mae: 3.53633733472426, mape: 16.6595950646398, r2: 0.6614175896322294, rmsle: 0.21890383040918562
ridge
mse: 31.39335760236521, rmse: 5.602977565756016, mae: 3.5334602249253697, mape: 16.63728623401629, r2: 0.6597610759001087, rmsle: 0.21899426397078484
lasso
mse: 33.51784488799414, rmse: 5.789459809688132, mae: 3.7127882956101144, mape: 16.61875404887328, r2: 0.6367360373718366, rmsle: 0.2135345753220661
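
The fit / predict / evaluate pattern above repeats for every model in this post, so it can be folded into one loop. The sketch below shows the idea with a stand-in model so that it runs without any dependencies; `MeanRegressor` and `evaluate_models` are hypothetical names, and only MAE is collected to keep it short.

```python
class MeanRegressor:
    """Hypothetical stand-in model: always predicts the mean of the training targets."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

def mean_absolute_error(true, pred):
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def evaluate_models(models, X_train, y_train, X_test, y_test):
    """Fit each model, predict on the test set, and collect its MAE under its name."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = mean_absolute_error(y_test, model.predict(X_test))
    return results

# Made-up data: the training mean is 20, so the test errors are 2 and 6.
results = evaluate_models(
    {"mean baseline": MeanRegressor()},
    [[1], [2], [3]], [10.0, 20.0, 30.0],
    [[4], [5]], [18.0, 26.0],
)
print(results)  # {'mean baseline': 4.0}
```

Any object with `fit` and `predict` methods works here, so with the real data one could presumably pass something like `{"linear": LinearRegression(), "ridge": Ridge(), "lasso": Lasso()}` and swap in a version of `reg_calculate` that returns its values instead of printing them.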

KNN

K-nearest neighbors (KNN) regression

from sklearn.neighbors import KNeighborsRegressor 
knn = KNeighborsRegressor()
knn.fit(X_train, y_train)
y_pre_knn = knn.predict(x_test)
# evaluation metrics
print("KNN")
reg_calculate(y_true, y_pre_knn)

Output:

KNN
mse: 43.8670431372549, rmse: 6.623219997648795, mae: 4.411764705882353, mape: 18.437381551851896, r2: 0.5245721802201037, rmsle: 0.23223282023582129
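
For intuition about what `KNeighborsRegressor` does with its defaults (5 neighbors, uniform weights, Euclidean distance), here is a minimal pure-Python sketch: the prediction is simply the average target of the k closest training points. `knn_predict` is a hypothetical helper and uses a brute-force search, which the library does not necessarily do; `math.dist` needs Python 3.8 or later.

```python
import math

def knn_predict(X_train, y_train, query, k=5):
    """Average the targets of the k training points closest to `query`."""
    # Pair each training target with its Euclidean distance to the query point.
    distances = [(math.dist(x, query), y) for x, y in zip(X_train, y_train)]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Made-up one-feature data; the 3 points nearest to 1.2 are 1.0, 2.0 and 0.0.
X_demo = [[0.0], [1.0], [2.0], [10.0]]
y_demo = [0.0, 1.0, 2.0, 10.0]
pred = knn_predict(X_demo, y_demo, [1.2], k=3)
print(pred)  # 1.0
```

Because the distance mixes all features, features with large numeric ranges dominate it. Standardizing the features first may therefore improve the KNN result above, though that was not tried here.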

SVM

Support Vector Machine (SVM)

from sklearn import svm
regr = svm.SVR()
regr.fit(X_train, y_train)
y_pre_svm = regr.predict(x_test)
print("SVM")
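reg_calculate(y_true, y_pre_svm)

Output (these figures are identical to the KNN results above because the original run passed y_pre_knn to reg_calculate, so they are not the SVR metrics):

SVM
mse: 43.8670431372549, rmse: 6.623219997648795, mae: 4.411764705882353, mape: 18.437381551851896, r2: 0.5245721802201037, rmsle: 0.23223282023582129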

DecisionTree

Decision tree

from sklearn.tree import DecisionTreeRegressor
DT = DecisionTreeRegressor()
DT.fit(X_train, y_train)

y_pre_DT = DT.predict(x_test)
print("Decision Tree")
reg_calculate(y_true, y_pre_DT)

Decision Tree
mse: 26.693823529411766, rmse: 5.166606577765697, mae: 3.1813725490196085, mape: 15.562811056549139, r2: 0.710694284033029, rmsle: 0.1975892237980368
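
A regression tree is built by repeatedly choosing the split that most reduces the squared error of the targets on each side. The sketch below finds one such split on a single feature (a "stump"); `best_split` is a hypothetical helper for illustration and leaves out everything a real tree adds, such as multiple features, recursion and stopping rules.

```python
def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing the summed squared error."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    unique = sorted(set(xs))
    # Candidate thresholds are the midpoints between consecutive distinct feature values.
    for lo, hi in zip(unique, unique[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

# Made-up data with two obvious groups.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, left_mean, right_mean = best_split(xs, ys)
print(threshold, left_mean, right_mean)  # 6.5 6.0 21.0
```

Each leaf predicts the mean target of the training samples that fall into it. A tree grown without limits, as with the default `DecisionTreeRegressor()` above, keeps splitting until the leaves are very small, which tends to overfit.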

Random forest

from sklearn.ensemble import RandomForestRegressor
# from sklearn.pipeline import Pipeline
regr = RandomForestRegressor()
regr.fit(X_train, y_train)
y_pre_regr = regr.predict(x_test)
print("Random Forest")
reg_calculate(y_true, y_pre_regr)

Output:

Random Forest
mse: 11.207001999999997, rmse: 3.347686066524159, mae: 2.2008235294117653, mape: 10.90976874926566, r2: 0.8785393282501885, rmsle: 0.14116871196392253
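
Random forests, and the `BaggingRegressor` used further down, both rest on bagging: train many copies of a base model on bootstrap resamples of the training set and average their predictions. The sketch below shows only that mechanism, with a 1-nearest-neighbor base learner in place of a decision tree to keep it short. `fit_nearest`, `bagging_fit` and `bagging_predict` are hypothetical names.

```python
import math
import random

def fit_nearest(X, y):
    """Hypothetical base learner: predict the target of the single closest training point."""
    pairs = list(zip(X, y))
    return lambda x: min(pairs, key=lambda p: math.dist(p[0], x))[1]

def bagging_fit(X, y, fit_base, n_estimators=10, seed=0):
    """Train `n_estimators` base models, each on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Average the predictions of all base models."""
    return sum(model(x) for model in models) / len(models)

# Made-up data: target equals the feature value.
X_demo = [[float(i)] for i in range(20)]
y_demo = [float(i) for i in range(20)]
models = bagging_fit(X_demo, y_demo, fit_nearest, n_estimators=10, seed=0)
print(bagging_predict(models, [7.3]))
```

Averaging over resamples reduces the variance of an unstable base learner, which is one plausible reason the random forest above scores so much better than the single decision tree. A random forest can additionally restrict each split to a random subset of the features.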

Bagging

Xgboost

Lightgbm

Catboost

GradientBoosting

I'm getting hungry, so I'll just write these all together.

import xgboost as xg
import lightgbm as lgm
import catboost as cb
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import GradientBoostingRegressor

CB_Regressor=cb.CatBoostRegressor()
xg_Regressor=xg.XGBRegressor()
lgm_Regressor=lgm.LGBMRegressor()
bag_Regressor=BaggingRegressor()
gbd_Regressor=GradientBoostingRegressor()

CB_Regressor.fit(X_train, y_train)
xg_Regressor.fit(X_train, y_train)
lgm_Regressor.fit(X_train, y_train)
bag_Regressor.fit(X_train, y_train)
gbd_Regressor.fit(X_train, y_train)

y_pre_CB = CB_Regressor.predict(x_test)
y_pre_xg = xg_Regressor.predict(x_test)
y_pre_lgm = lgm_Regressor.predict(x_test)
y_pre_bag = bag_Regressor.predict(x_test)
y_pre_gbd = gbd_Regressor.predict(x_test)

print("CB")
reg_calculate(y_true, y_pre_CB)
print("XGBoost")
reg_calculate(y_true, y_pre_xg)
print("LGBM")
reg_calculate(y_true, y_pre_lgm)
print("Bagging")
reg_calculate(y_true, y_pre_bag)
print("GradientBoosting")
reg_calculate(y_true, y_pre_gbd)

Output:

CB
mse: 11.791834718420176, rmse: 3.433924099105887, mae: 2.119751585666254, mape: 9.993857387303114, r2: 0.872200953826718, rmsle: 0.13299161406349444
XGBoost
mse: 13.447662535973214, rmse: 3.667105471072957, mae: 2.3307185677921067, mape: 10.9129019218805, r2: 0.8542552124926828, rmsle: 0.14476785768682296
LGBM
mse: 11.641081825640502, rmse: 3.4119029625182047, mae: 2.1377168555798383, mape: 10.464024191664098, r2: 0.8738348027030942, rmsle: 0.1396198526477457
Bagging
mse: 12.581160784313727, rmse: 3.546993203308082, mae: 2.420392156862745, mape: 11.830690090356958, r2: 0.8636462953914766, rmsle: 0.15213713275055957
GradientBoosting
mse: 8.489139804224163, rmse: 2.9136128439146067, mae: 2.287601690996847, mape: 11.586313247266613, r2: 0.907995320853951, rmsle: 0.1372025274842797
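
The boosting libraries above share one core idea: add small models one at a time, each fitted to the errors left by the ensemble so far. The following is a minimal sketch of that idea for squared-error loss, using one-split stumps on a single feature. `fit_stump` and `boost` are hypothetical names, and XGBoost, LightGBM and CatBoost each add a good deal on top of this, so treat it as an illustration rather than any library's algorithm.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump on a single feature and return it as a predictor."""
    best = None
    unique = sorted(set(xs))
    for lo, hi in zip(unique, unique[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        cost = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or cost < best[0]:
            best = (cost, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of squared loss
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Made-up data with a step between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = boost(xs, ys)
mean_y = sum(ys) / len(ys)
mse_base = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boost = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_base, mse_boost)
```

The learning rate shrinks each stump's contribution, so more rounds are needed but each step is less likely to overfit. All five models above were run with their default settings, so the ranking in the output could change after tuning or with a different train/test split.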

Reference

Scikit-learn documentation
Random forest
refer