For a tutorial on Bayesian hyperparameter tuning, see: https://blog.csdn.net/weixin_35757704/article/details/118480135
Install the Bayesian optimization package:
pip install bayesian-optimization
Paper: http://papers.nips.cc/paper/4522-practical-bayesian%20-optimization-of-machine-learning-algorithms.pdf
Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. "Practical Bayesian optimization of machine learning algorithms." Advances in Neural Information Processing Systems 25 (2012).
A random forest is a Bagging ensemble of tree models (for Bagging ensembles, see: https://blog.csdn.net/weixin_35757704/article/details/119848453)
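To make the Bagging idea concrete, here is a minimal sketch in plain Python. It is not how scikit-learn builds a forest: the base learner `fit_mean` is a hypothetical stand-in for a decision tree (it simply predicts the mean of its training targets), and the names `bagging_predict`, `n_models` and `seed` are assumptions made for illustration. What it does show is the two steps of Bagging: draw a bootstrap sample for each model, then average the predictions.

```python
import random


def fit_mean(sample_y):
    """Hypothetical base learner: always predicts the mean of its training targets."""
    mean = sum(sample_y) / len(sample_y)
    return lambda: mean


def bagging_predict(ys, n_models=50, seed=0):
    """Train n_models base learners on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(ys) targets with replacement
        sample = [rng.choice(ys) for _ in ys]
        models.append(fit_mean(sample))
    # Aggregate: for regression, average the individual predictions
    return sum(m() for m in models) / n_models


targets = [1.0, 2.0, 3.0, 4.0, 10.0]
prediction = bagging_predict(targets)
print(prediction)
```

In a real random forest each base learner is a decision tree, and a random subset of features is also considered at each split.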
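For classification, splits are chosen using the Gini index (Gini impurity): the larger the Gini index of a node, the greater its uncertainty; the smaller the Gini index, the lower its uncertainty.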
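As a small illustration of the Gini index, the sketch below computes it with the usual formula, 1 minus the sum of squared class proportions. The function name `gini_impurity` and the sample labels are made up for this example.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


pure = gini_impurity(["a", "a", "a", "a"])   # a single class: no uncertainty
mixed = gini_impurity(["a", "a", "b", "b"])  # an even two-class mix: most uncertain
print(pure, mixed)
```

A node containing only one class scores 0, while an even mix of two classes scores 0.5, which matches the statement that a larger Gini index means more uncertainty.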
For regression, splits are chosen using the squared error (SE, the last two letters of MSE).
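The squared-error criterion can be sketched for a single feature as follows. This is a simplified illustration, not the library's implementation: the helper names `squared_error` and `best_split` are assumptions, thresholds are taken directly from the observed feature values, and the split that minimizes the summed squared error of the two children is chosen.

```python
def squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_se) minimizing the children's summed squared error."""
    best = (None, squared_error(ys))  # baseline: do not split
    for t in sorted(set(xs))[:-1]:    # samples with x <= t go to the left child
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        total = squared_error(left) + squared_error(right)
        if total < best[1]:
            best = (t, total)
    return best


xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
threshold, total_se = best_split(xs, ys)
print(threshold, total_se)
```

Here the split at `x <= 3` separates the two groups of targets perfectly, so the remaining squared error is zero.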
Here we tune three parameters: n_estimators, max_depth, and max_leaf_nodes:
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from bayes_opt import BayesianOptimization
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor


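def train_model(n_estimators, max_depth, max_leaf_nodes):
    # Objective function: train a model and return the negative MSE (to be maximized).
    # Relies on the globals x_train, x_test, y_train, y_test and param_save_file set in __main__.
    try:
        # The optimizer passes floats, so cast the integer-valued hyperparameters
        model = RandomForestRegressor(
            n_estimators=int(n_estimators),
            max_depth=int(max_depth),
            max_leaf_nodes=int(max_leaf_nodes),
            n_jobs=4,  # use multiple cores
        )
        model.fit(x_train, y_train)
        score = -mean_squared_error(y_test, model.predict(x_test))
        # Log the score together with the values actually used by the model
        with open(param_save_file, 'a') as file:
            file.write("neg_mse:{},n_estimators:{},max_depth:{},max_leaf_nodes:{}".format(
                score, int(n_estimators), int(max_depth), int(max_leaf_nodes)
            ) + '\n')
        return score
    except Exception as e:
        # On failure, report the error and return a very low score so this point is never chosen
        print("train_model failed:", e)
        return -1000000

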
if __name__ == '__main__':
    # Build a synthetic regression dataset
    x, y = make_regression(n_samples=1000, n_features=5)
    param_save_file = "random_forest_param.txt"
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    # Search space: (lower bound, upper bound) for each parameter
    pbounds = {
        'n_estimators': (100, 1000),
        'max_depth': (18, 40),
        'max_leaf_nodes': (20, 200),
    }
    # Start tuning
    optimizer = BayesianOptimization(
        f=train_model,  # black-box objective function
        pbounds=pbounds,  # search space
        verbose=2,  # 2: print every step; 1: print only when a new maximum is found; 0: print nothing
        random_state=1,
    )
    optimizer.maximize(  # run the search
        init_points=10,  # number of initial random-search steps
        n_iter=30,  # number of Bayesian optimization iterations
    )
    # Append the best parameters and the best score to the log file
    with open(param_save_file, 'a') as file:
        file.write("optimizer_params: " + str(optimizer.max['params']) + " optimizer_target: " + str(
            optimizer.max['target']) + '\n')
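
Because the search space above is continuous, the values stored in `optimizer.max['params']` are typically floats, while `RandomForestRegressor` expects integers for these three parameters. The sketch below shows one way to convert them before refitting a final model, using the same `int()` truncation as `train_model`. The helper name `to_model_params` and the values in `example_best` are made up for illustration; they are not results from an actual run.

```python
def to_model_params(best_params):
    """Cast the float values found by the optimizer to ints, as train_model does."""
    return {name: int(value) for name, value in best_params.items()}


# Hypothetical stand-in for optimizer.max['params'] (illustrative values only)
example_best = {"n_estimators": 475.3, "max_depth": 33.9, "max_leaf_nodes": 120.0}
model_params = to_model_params(example_best)
print(model_params)
```

The resulting dictionary could then be passed on as `RandomForestRegressor(**model_params)`. Note that `int()` truncates rather than rounds, which keeps the final model consistent with what was evaluated during the search.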