Model Fusion: Boosting Methods




Table of Contents

  • Model Fusion: Boosting Methods
    • 1. GBDT Model
    • 2. XGB Model
    • 3. Random Forest


1. GBDT Model

  • Use grid search to find a GBDT model with the optimal hyperparameters, predict on the data, and evaluate the model with the MSE metric. The relevant code is as follows:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.metrics import mean_squared_error

# no. k-fold splits
splits = 5
repeats = 1

gbdt = GradientBoostingRegressor()
rkfold = RepeatedKFold(n_splits=splits, n_repeats=repeats)

param_grid = {
    'n_estimators': [150, 250, 350],
    'max_depth': [1, 2, 3],
    'min_samples_split': [5, 6, 7]
}

gsearch = GridSearchCV(gbdt, param_grid, cv=rkfold, scoring='neg_mean_squared_error', verbose=1, return_train_score=True)
# X, y: training features and target, assumed to be prepared beforehand
gsearch.fit(X, y)

model = gsearch.best_estimator_

y_pred = model.predict(X)

print('mse: ', mean_squared_error(y, y_pred))

"""
0.029349091720938178
"""

2. XGB Model

  • Use grid search to find an XGB model with the optimal hyperparameters, predict on the data, and evaluate the model with the MSE metric. The relevant code is as follows:
from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.metrics import mean_squared_error

# no. k-fold splits
splits = 5
repeats = 1

xgb = XGBRegressor(objective='reg:squarederror')
rkfold = RepeatedKFold(n_splits=splits, n_repeats=repeats)

param_grid = {
    'n_estimators': [100, 200, 300, 400, 500],
    'max_depth': [1, 2, 3]
}

gsearch = GridSearchCV(xgb, param_grid, cv=rkfold, scoring='neg_mean_squared_error', verbose=1, return_train_score=True)
gsearch.fit(X, y)

model = gsearch.best_estimator_

y_pred = model.predict(X)

print('mse: ', mean_squared_error(y, y_pred))

"""
mse:  0.028142921912724782
"""

3. Random Forest

  • Use grid search to find a random forest model with the optimal hyperparameters, predict on the data, and evaluate the model with the MSE metric. The relevant code is as follows:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.metrics import mean_squared_error

# no. k-fold splits
splits = 5
repeats = 1

# X, y = get_training_data_omitoutliers()  # required here; see this function's definition in the Alibaba Cloud Tianchi competition walkthrough

rfr = RandomForestRegressor()
rkfold = RepeatedKFold(n_splits=splits, n_repeats=repeats)

param_grid = {
    'n_estimators': [100, 150, 200],
    'max_features': [8, 12, 16, 20, 24],  # assumes the data has at least 24 features
    'min_samples_split': [2, 4, 6]
}

gsearch = GridSearchCV(rfr, param_grid, cv=rkfold, scoring='neg_mean_squared_error', verbose=1, return_train_score=True)
gsearch.fit(X, y)

model = gsearch.best_estimator_

y_pred = model.predict(X)

print('mse: ', mean_squared_error(y, y_pred))

"""
mse:  0.013881205786668335
"""
