Model Fusion: Boosting Methods
Table of Contents
- Model Fusion: Boosting Methods
  - 1. GBDT Model
  - 2. XGB Model
  - 3. Random Forest
1. GBDT Model
- Use grid search to find a GBDT model with the optimal hyperparameters, make predictions on the data, and evaluate the model with the MSE metric. The relevant code is as follows:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.metrics import mean_squared_error

# X, y are the feature matrix and target from the preceding data preparation steps
splits = 5
repeats = 1

gbdt = GradientBoostingRegressor()
rkfold = RepeatedKFold(n_splits=splits, n_repeats=repeats)

# hyperparameter grid for the GBDT model
param_grid = {
    'n_estimators': [150, 250, 350],
    'max_depth': [1, 2, 3],
    'min_samples_split': [5, 6, 7]
}

gsearch = GridSearchCV(gbdt, param_grid, cv=rkfold,
                       scoring='neg_mean_squared_error',
                       verbose=1, return_train_score=True)
gsearch.fit(X, y)

# evaluate the best estimator on the training data with MSE
model = gsearch.best_estimator_
y_pred = model.predict(X)
print('mse: ', mean_squared_error(y, y_pred))
"""
0.029349091720938178
"""
2. XGB Model
- Use grid search to find an XGBoost model with the optimal hyperparameters, make predictions on the data, and evaluate the model with the MSE metric. The relevant code is as follows:
from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.metrics import mean_squared_error

splits = 5
repeats = 1

# name the estimator xgb_model so it does not shadow the xgboost module
xgb_model = XGBRegressor(objective='reg:squarederror')
rkfold = RepeatedKFold(n_splits=splits, n_repeats=repeats)

# hyperparameter grid for the XGBoost model
param_grid = {
    'n_estimators': [100, 200, 300, 400, 500],
    'max_depth': [1, 2, 3]
}

gsearch = GridSearchCV(xgb_model, param_grid, cv=rkfold,
                       scoring='neg_mean_squared_error',
                       verbose=1, return_train_score=True)
gsearch.fit(X, y)

# evaluate the best estimator on the training data with MSE
model = gsearch.best_estimator_
y_pred = model.predict(X)
print('mse: ', mean_squared_error(y, y_pred))
"""
mse: 0.028142921912724782
"""
3. Random Forest
- Use grid search to find a random forest model with the optimal hyperparameters, make predictions on the data, and evaluate the model with the MSE metric. The relevant code is as follows:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.metrics import mean_squared_error

splits = 5
repeats = 1

rfr = RandomForestRegressor()
rkfold = RepeatedKFold(n_splits=splits, n_repeats=repeats)

# hyperparameter grid for the random forest model
param_grid = {
    'n_estimators': [100, 150, 200],
    'max_features': [8, 12, 16, 20, 24],
    'min_samples_split': [2, 4, 6]
}

gsearch = GridSearchCV(rfr, param_grid, cv=rkfold,
                       scoring='neg_mean_squared_error',
                       verbose=1, return_train_score=True)
gsearch.fit(X, y)

# evaluate the best estimator on the training data with MSE
model = gsearch.best_estimator_
y_pred = model.predict(X)
print('mse: ', mean_squared_error(y, y_pred))
"""
mse: 0.013881205786668335
"""