XGBoost官方文档
XGBoost本质上还是一个GBDT,是一个优化的分布式梯度增强库,旨在实现高效,灵活和便携。Xgboost以CART决策树为子模型,通过Gradient Tree Boosting实现多棵CART树的集成学习,得到最终模型。
XGBoost的最终模型构建:
引用陈天奇的论文,我们的数据为: D = { ( x i , y i ) } ( ∣ D ∣ = n , x i ∈ R m , y i ∈ R ) \mathcal{D}=\left\{\left(\mathbf{x}_{i}, y_{i}\right)\right\}\left(|\mathcal{D}|=n, \mathbf{x}_{i} \in \mathbb{R}^{m}, y_{i} \in \mathbb{R}\right) D={(xi,yi)}(∣D∣=n,xi∈Rm,yi∈R)
(1) 构造目标函数:
假设有K棵树,则第i个样本的输出为 y ^ i = ϕ ( x i ) = ∑ k = 1 K f k ( x i ) , f k ∈ F \hat{y}_{i}=\phi\left(\mathrm{x}_{i}\right)=\sum_{k=1}^{K} f_{k}\left(\mathrm{x}_{i}\right), \quad f_{k} \in \mathcal{F} y^i=ϕ(xi)=∑k=1Kfk(xi),fk∈F,其中, F = { f ( x ) = w q ( x ) } ( q : R m → T , w ∈ R T ) \mathcal{F}=\left\{f(\mathbf{x})=w_{q(\mathbf{x})}\right\}\left(q: \mathbb{R}^{m} \rightarrow T, w \in \mathbb{R}^{T}\right) F={f(x)=wq(x)}(q:Rm→T,w∈RT)
因此,目标函数的构建为:
L ( ϕ ) = ∑ i l ( y ^ i , y i ) + ∑ k Ω ( f k ) \mathcal{L}(\phi)=\sum_{i} l\left(\hat{y}_{i}, y_{i}\right)+\sum_{k} \Omega\left(f_{k}\right) L(ϕ)=i∑l(y^i,yi)+k∑Ω(fk)
其中, ∑ i l ( y ^ i , y i ) \sum_{i} l\left(\hat{y}_{i}, y_{i}\right) ∑il(y^i,yi)为loss function, ∑ k Ω ( f k ) \sum_{k} \Omega\left(f_{k}\right) ∑kΩ(fk)为正则化项。
(2) 叠加式的训练(Additive Training):
给定样本 x i x_i xi, y ^ i ( 0 ) = 0 \hat{y}_i^{(0)} = 0 y^i(0)=0(初始预测), y ^ i ( 1 ) = y ^ i ( 0 ) + f 1 ( x i ) \hat{y}_i^{(1)} = \hat{y}_i^{(0)} + f_1(x_i) y^i(1)=y^i(0)+f1(xi), y ^ i ( 2 ) = y ^ i ( 0 ) + f 1 ( x i ) + f 2 ( x i ) = y ^ i ( 1 ) + f 2 ( x i ) \hat{y}_i^{(2)} = \hat{y}_i^{(0)} + f_1(x_i) + f_2(x_i) = \hat{y}_i^{(1)} + f_2(x_i) y^i(2)=y^i(0)+f1(xi)+f2(xi)=y^i(1)+f2(xi)…以此类推,可以得到: y ^ i ( K ) = y ^ i ( K − 1 ) + f K ( x i ) \hat{y}_i^{(K)} = \hat{y}_i^{(K-1)} + f_K(x_i) y^i(K)=y^i(K−1)+fK(xi) 其中, y ^ i ( K − 1 ) \hat{y}_i^{(K-1)} y^i(K−1) 为前K-1棵树的预测结果, f K ( x i ) f_K(x_i) fK(xi) 为第K棵树的预测结果。
因此,目标函数可以分解为:
L ( K ) = ∑ i = 1 n l ( y i , y ^ i ( K − 1 ) + f K ( x i ) ) + ∑ k Ω ( f k ) \mathcal{L}^{(K)}=\sum_{i=1}^{n} l\left(y_{i}, \hat{y}_{i}^{(K-1)}+f_{K}\left(\mathrm{x}_{i}\right)\right)+\sum_{k} \Omega\left(f_{k}\right) L(K)=i=1∑nl(yi,y^i(K−1)+fK(xi))+k∑Ω(fk)
由于正则化项也可以分解为前K-1棵树的复杂度加第K棵树的复杂度,因此: L ( K ) = ∑ i = 1 n l ( y i , y ^ i ( K − 1 ) + f K ( x i ) ) + ∑ k = 1 K − 1 Ω ( f k ) + Ω ( f K ) \mathcal{L}^{(K)}=\sum_{i=1}^{n} l\left(y_{i}, \hat{y}_{i}^{(K-1)}+f_{K}\left(\mathrm{x}_{i}\right)\right)+\sum_{k=1} ^{K-1}\Omega\left(f_{k}\right)+\Omega\left(f_{K}\right) L(K)=i=1∑nl(yi,y^i(K−1)+fK(xi))+k=1∑K−1Ω(fk)+Ω(fK)由于 ∑ k = 1 K − 1 Ω ( f k ) \sum_{k=1} ^{K-1}\Omega\left(f_{k}\right) ∑k=1K−1Ω(fk)在模型构建到第K棵树的时候已经固定,无法改变,因此是一个已知的常数,可以在最优化的时候省去,故:
L ( K ) = ∑ i = 1 n l ( y i , y ^ i ( K − 1 ) + f K ( x i ) ) + Ω ( f K ) \mathcal{L}^{(K)}=\sum_{i=1}^{n} l\left(y_{i}, \hat{y}_{i}^{(K-1)}+f_{K}\left(\mathrm{x}_{i}\right)\right)+\Omega\left(f_{K}\right) L(K)=i=1∑nl(yi,y^i(K−1)+fK(xi))+Ω(fK)
(3) 使用泰勒级数近似目标函数:
L ( K ) ≃ ∑ i = 1 n [ l ( y i , y ^ ( K − 1 ) ) + g i f K ( x i ) + 1 2 h i f K 2 ( x i ) ] + Ω ( f K ) \mathcal{L}^{(K)} \simeq \sum_{i=1}^{n}\left[l\left(y_{i}, \hat{y}^{(K-1)}\right)+g_{i} f_{K}\left(\mathrm{x}_{i}\right)+\frac{1}{2} h_{i} f_{K}^{2}\left(\mathrm{x}_{i}\right)\right]+\Omega\left(f_{K}\right) L(K)≃i=1∑n[l(yi,y^(K−1))+gifK(xi)+21hifK2(xi)]+Ω(fK)
其中, g i = ∂ y ^ ( t − 1 ) l ( y i , y ^ ( t − 1 ) ) g_{i}=\partial_{\hat{y}(t-1)} l\left(y_{i}, \hat{y}^{(t-1)}\right) gi=∂y^(t−1)l(yi,y^(t−1))和 h i = ∂ y ^ ( t − 1 ) 2 l ( y i , y ^ ( t − 1 ) ) h_{i}=\partial_{\hat{y}^{(t-1)}}^{2} l\left(y_{i}, \hat{y}^{(t-1)}\right) hi=∂y^(t−1)2l(yi,y^(t−1))
在这里,我们补充下泰勒级数的相关知识:
在数学中,泰勒级数(英语:Taylor series)用无限项连加式——级数来表示一个函数,这些相加的项由函数在某一点的导数求得。具体的形式如下:
f ( x ) = f ( x 0 ) 0 ! + f ′ ( x 0 ) 1 ! ( x − x 0 ) + f ′ ′ ( x 0 ) 2 ! ( x − x 0 ) 2 + … + f ( n ) ( x 0 ) n ! ( x − x 0 ) n + . . . . . . f(x)=\frac{f\left(x_{0}\right)}{0 !}+\frac{f^{\prime}\left(x_{0}\right)}{1 !}\left(x-x_{0}\right)+\frac{f^{\prime \prime}\left(x_{0}\right)}{2 !}\left(x-x_{0}\right)^{2}+\ldots+\frac{f^{(n)}\left(x_{0}\right)}{n !}\left(x-x_{0}\right)^{n}+...... f(x)=0!f(x0)+1!f′(x0)(x−x0)+2!f′′(x0)(x−x0)2+…+n!f(n)(x0)(x−x0)n+......
由于 ∑ i = 1 n l ( y i , y ^ ( K − 1 ) ) \sum_{i=1}^{n}l\left(y_{i}, \hat{y}^{(K-1)}\right) ∑i=1nl(yi,y^(K−1))在模型构建到第K棵树的时候已经固定,无法改变,因此是一个已知的常数,可以在最优化的时候省去,故:
L ~ ( K ) = ∑ i = 1 n [ g i f K ( x i ) + 1 2 h i f K 2 ( x i ) ] + Ω ( f K ) \tilde{\mathcal{L}}^{(K)}=\sum_{i=1}^{n}\left[g_{i} f_{K}\left(\mathbf{x}_{i}\right)+\frac{1}{2} h_{i} f_{K}^{2}\left(\mathbf{x}_{i}\right)\right]+\Omega\left(f_{K}\right) L~(K)=i=1∑n[gifK(xi)+21hifK2(xi)]+Ω(fK)
(4) 如何定义一棵树:
为了说明如何定义一棵树的问题,我们需要定义几个概念:
q ( x 1 ) = 1 , q ( x 2 ) = 3 , q ( x 3 ) = 1 , q ( x 4 ) = 2 , q ( x 5 ) = 3 q(x_1) = 1,q(x_2) = 3,q(x_3) = 1,q(x_4) = 2,q(x_5) = 3 q(x1)=1,q(x2)=3,q(x3)=1,q(x4)=2,q(x5)=3
I 1 = { 1 , 3 } , I 2 = { 4 } , I 3 = { 2 , 5 } I_1 = \{1,3\},I_2 = \{4\},I_3 = \{2,5\} I1={1,3},I2={4},I3={2,5}, w = ( 15 , 12 , 20 ) w = (15,12,20) w=(15,12,20)
因此,目标函数用以上符号替代后:
L ~ ( K ) = ∑ i = 1 n [ g i f K ( x i ) + 1 2 h i f K 2 ( x i ) ] + γ T + 1 2 λ ∑ j = 1 T w j 2 = ∑ j = 1 T [ ( ∑ i ∈ I j g i ) w j + 1 2 ( ∑ i ∈ I j h i + λ ) w j 2 ] + γ T \begin{aligned} \tilde{\mathcal{L}}^{(K)} &=\sum_{i=1}^{n}\left[g_{i} f_{K}\left(\mathrm{x}_{i}\right)+\frac{1}{2} h_{i} f_{K}^{2}\left(\mathrm{x}_{i}\right)\right]+\gamma T+\frac{1}{2} \lambda \sum_{j=1}^{T} w_{j}^{2} \\ &=\sum_{j=1}^{T}\left[\left(\sum_{i \in I_{j}} g_{i}\right) w_{j}+\frac{1}{2}\left(\sum_{i \in I_{j}} h_{i}+\lambda\right) w_{j}^{2}\right]+\gamma T \end{aligned} L~(K)=i=1∑n[gifK(xi)+21hifK2(xi)]+γT+21λj=1∑Twj2=j=1∑T⎣⎡⎝⎛i∈Ij∑gi⎠⎞wj+21⎝⎛i∈Ij∑hi+λ⎠⎞wj2⎦⎤+γT
由于我们的目标就是最小化目标函数,现在的目标函数化简为一个关于w的二次函数: L ~ ( K ) = ∑ j = 1 T [ ( ∑ i ∈ I j g i ) w j + 1 2 ( ∑ i ∈ I j h i + λ ) w j 2 ] + γ T \tilde{\mathcal{L}}^{(K)}=\sum_{j=1}^{T}\left[\left(\sum_{i \in I_{j}} g_{i}\right) w_{j}+\frac{1}{2}\left(\sum_{i \in I_{j}} h_{i}+\lambda\right) w_{j}^{2}\right]+\gamma T L~(K)=j=1∑T⎣⎡⎝⎛i∈Ij∑gi⎠⎞wj+21⎝⎛i∈Ij∑hi+λ⎠⎞wj2⎦⎤+γT根据二次函数求极值的公式: y = a x 2 b x c y=ax^2 bx c y=ax2bxc求极值,对称轴在 x = − b 2 a x=-\frac{b}{2 a} x=−2ab,极值为 y = 4 a c − b 2 4 a y=\frac{4 a c-b^{2}}{4 a} y=4a4ac−b2,因此:
w j ∗ = − ∑ i ∈ I j g i ∑ i ∈ I j h i + λ w_{j}^{*}=-\frac{\sum_{i \in I_{j}} g_{i}}{\sum_{i \in I_{j}} h_{i}+\lambda} wj∗=−∑i∈Ijhi+λ∑i∈Ijgi
以及
L ~ ( K ) ( q ) = − 1 2 ∑ j = 1 T ( ∑ i ∈ I j g i ) 2 ∑ i ∈ I j h i + λ + γ T \tilde{\mathcal{L}}^{(K)}(q)=-\frac{1}{2} \sum_{j=1}^{T} \frac{\left(\sum_{i \in I_{j}} g_{i}\right)^{2}}{\sum_{i \in I_{j}} h_{i}+\lambda}+\gamma T L~(K)(q)=−21j=1∑T∑i∈Ijhi+λ(∑i∈Ijgi)2+γT
(5) 如何寻找树的形状:
不难发现,刚刚的讨论都是基于树的形状已经确定了计算 w w w和 L L L,但是实际上我们需要像学习决策树一样找到树的形状。因此,我们借助决策树学习的方式,使用目标函数的变化来作为分裂节点的标准。我们使用一个例子来说明:
例子中有8个样本,分裂方式如下,因此:
L ~ ( o l d ) = − 1 2 [ ( g 7 + g 8 ) 2 H 7 + H 8 + λ + ( g 1 + . . . + g 6 ) 2 H 1 + . . . + H 6 + λ ] + 2 γ L ~ ( n e w ) = − 1 2 [ ( g 7 + g 8 ) 2 H 7 + H 8 + λ + ( g 1 + . . . + g 3 ) 2 H 1 + . . . + H 3 + λ + ( g 4 + . . . + g 6 ) 2 H 4 + . . . + H 6 + λ ] + 3 γ L ~ ( o l d ) − L ~ ( n e w ) = 1 2 [ ( g 1 + . . . + g 3 ) 2 H 1 + . . . + H 3 + λ + ( g 4 + . . . + g 6 ) 2 H 4 + . . . + H 6 + λ − ( g 1 + . . . + g 6 ) 2 h 1 + . . . + h 6 + λ ] − γ \tilde{\mathcal{L}}^{(old)} = -\frac{1}{2}[\frac{(g_7 + g_8)^2}{H_7+H_8 + \lambda} + \frac{(g_1 +...+ g_6)^2}{H_1+...+H_6 + \lambda}] + 2\gamma \\ \tilde{\mathcal{L}}^{(new)} = -\frac{1}{2}[\frac{(g_7 + g_8)^2}{H_7+H_8 + \lambda} + \frac{(g_1 +...+ g_3)^2}{H_1+...+H_3 + \lambda} + \frac{(g_4 +...+ g_6)^2}{H_4+...+H_6 + \lambda}] + 3\gamma\\ \tilde{\mathcal{L}}^{(old)} - \tilde{\mathcal{L}}^{(new)} = \frac{1}{2}[ \frac{(g_1 +...+ g_3)^2}{H_1+...+H_3 + \lambda} + \frac{(g_4 +...+ g_6)^2}{H_4+...+H_6 + \lambda} - \frac{(g_1+...+g_6)^2}{h_1+...+h_6+\lambda}] - \gamma L~(old)=−21[H7+H8+λ(g7+g8)2+H1+...+H6+λ(g1+...+g6)2]+2γL~(new)=−21[H7+H8+λ(g7+g8)2+H1+...+H3+λ(g1+...+g3)2+H4+...+H6+λ(g4+...+g6)2]+3γL~(old)−L~(new)=21[H1+...+H3+λ(g1+...+g3)2+H4+...+H6+λ(g4+...+g6)2−h1+...+h6+λ(g1+...+g6)2]−γ
因此,从上面的例子看出:分割节点的标准为 m a x { L ~ ( o l d ) − L ~ ( n e w ) } max\{\tilde{\mathcal{L}}^{(old)} - \tilde{\mathcal{L}}^{(new)} \} max{L~(old)−L~(new)},即:
L split = 1 2 [ ( ∑ i ∈ I L g i ) 2 ∑ i ∈ I L h i + λ + ( ∑ i ∈ I R g i ) 2 ∑ i ∈ I R h i + λ − ( ∑ i ∈ I g i ) 2 ∑ i ∈ I h i + λ ] − γ \mathcal{L}_{\text {split }}=\frac{1}{2}\left[\frac{\left(\sum_{i \in I_{L}} g_{i}\right)^{2}}{\sum_{i \in I_{L}} h_{i}+\lambda}+\frac{\left(\sum_{i \in I_{R}} g_{i}\right)^{2}}{\sum_{i \in I_{R}} h_{i}+\lambda}-\frac{\left(\sum_{i \in I} g_{i}\right)^{2}}{\sum_{i \in I} h_{i}+\lambda}\right]-\gamma Lsplit =21[∑i∈ILhi+λ(∑i∈ILgi)2+∑i∈IRhi+λ(∑i∈IRgi)2−∑i∈Ihi+λ(∑i∈Igi)2]−γ
基于直方图的近似算法,可以更高效地选 择最优特征及切分点。主要思想是:
基于直方图的近似算法的计算过程如下所示:
下面用一个例子说明基于直方图的近似算法:
假设有一个年龄特征,其特征的取值为18、19、21、31、36、37、55、57,我们需要使用近似算法找到年龄这个特征的最佳分裂点:
近似算法实现了两种候选切分点的构建策略:全局策略和本地策略。
# XGBoost原生工具库的上手:
import xgboost as xgb # 引入工具库
# read in data
dtrain = xgb.DMatrix('demo/data/agaricus.txt.train') # XGBoost的专属数据格式,但是也可以用dataframe或者ndarray
dtest = xgb.DMatrix('demo/data/agaricus.txt.test') # # XGBoost的专属数据格式,但是也可以用dataframe或者ndarray
# specify parameters via map
param = {'max_depth':2, 'eta':1, 'objective':'binary:logistic' } # 设置XGB的参数,使用字典形式传入
num_round = 2 # 使用线程数
bst = xgb.train(param, dtrain, num_round) # 训练
# make prediction
preds = bst.predict(dtest) # 预测
XGBoost的参数设置(括号内的名称为sklearn接口对应的参数名字):
推荐博客:
推荐官方文档
XGBoost的参数分为三种:
通用参数:(两种类型的booster,因为tree的性能比线性回归好得多,因此我们很少用线性回归。)
任务参数(这个参数用来控制理想的优化目标和每一步结果的度量方法。)
命令行参数(这里不说了,因为很少用命令行控制台版本)
参数调优的一般步骤:
具体的api请查看:https://xgboost.readthedocs.io/en/latest/python/python_api.html
推荐github:https://github.com/dmlc/xgboost/tree/master/demo/guide-pytho
请查看datawhale《集成学习Boosting》
ightGBM也是像XGBoost一样,是一类集成算法,他跟XGBoost总体来说是一样的,算法本质上与Xgboost没有出入,只是在XGBoost的基础上进行了优化:
LightGBM的优点:
1)更快的训练效率
2)低内存使用
3)更高的准确率
4)支持并行化学习
LightGBM参数说明: 推荐文档1、推荐文档2
LightGBM与网格搜索结合调参,参考代码:
1.核心参数:(括号内名称是别名)
2.用于控制模型学习过程的参数:
3.度量参数:
4.GPU 参数:
import lightgbm as lgb
from sklearn import metrics
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
canceData=load_breast_cancer()
X=canceData.data
y=canceData.target
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=0,test_size=0.2)
### 数据转换
print('数据转换')
lgb_train = lgb.Dataset(X_train, y_train, free_raw_data=False)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train,free_raw_data=False)
### 设置初始参数--不含交叉验证参数
print('设置参数')
params = {
'boosting_type': 'gbdt',
'objective': 'binary',
'metric': 'auc',
'nthread':4,
'learning_rate':0.1
}
### 交叉验证(调参)
print('交叉验证')
max_auc = float('0')
best_params = {}
# 准确率
print("调参1:提高准确率")
for num_leaves in range(5,100,5):
for max_depth in range(3,8,1):
params['num_leaves'] = num_leaves
params['max_depth'] = max_depth
cv_results = lgb.cv(
params,
lgb_train,
seed=1,
nfold=5,
metrics=['auc'],
early_stopping_rounds=10,
verbose_eval=True
)
mean_auc = pd.Series(cv_results['auc-mean']).max()
boost_rounds = pd.Series(cv_results['auc-mean']).idxmax()
if mean_auc >= max_auc:
max_auc = mean_auc
best_params['num_leaves'] = num_leaves
best_params['max_depth'] = max_depth
if 'num_leaves' and 'max_depth' in best_params.keys():
params['num_leaves'] = best_params['num_leaves']
params['max_depth'] = best_params['max_depth']
# 过拟合
print("调参2:降低过拟合")
for max_bin in range(5,256,10):
for min_data_in_leaf in range(1,102,10):
params['max_bin'] = max_bin
params['min_data_in_leaf'] = min_data_in_leaf
cv_results = lgb.cv(
params,
lgb_train,
seed=1,
nfold=5,
metrics=['auc'],
early_stopping_rounds=10,
verbose_eval=True
)
mean_auc = pd.Series(cv_results['auc-mean']).max()
boost_rounds = pd.Series(cv_results['auc-mean']).idxmax()
if mean_auc >= max_auc:
max_auc = mean_auc
best_params['max_bin']= max_bin
best_params['min_data_in_leaf'] = min_data_in_leaf
if 'max_bin' and 'min_data_in_leaf' in best_params.keys():
params['min_data_in_leaf'] = best_params['min_data_in_leaf']
params['max_bin'] = best_params['max_bin']
print("调参3:降低过拟合")
for feature_fraction in [0.6,0.7,0.8,0.9,1.0]:
for bagging_fraction in [0.6,0.7,0.8,0.9,1.0]:
for bagging_freq in range(0,50,5):
params['feature_fraction'] = feature_fraction
params['bagging_fraction'] = bagging_fraction
params['bagging_freq'] = bagging_freq
cv_results = lgb.cv(
params,
lgb_train,
seed=1,
nfold=5,
metrics=['auc'],
early_stopping_rounds=10,
verbose_eval=True
)
mean_auc = pd.Series(cv_results['auc-mean']).max()
boost_rounds = pd.Series(cv_results['auc-mean']).idxmax()
if mean_auc >= max_auc:
max_auc=mean_auc
best_params['feature_fraction'] = feature_fraction
best_params['bagging_fraction'] = bagging_fraction
best_params['bagging_freq'] = bagging_freq
if 'feature_fraction' and 'bagging_fraction' and 'bagging_freq' in best_params.keys():
params['feature_fraction'] = best_params['feature_fraction']
params['bagging_fraction'] = best_params['bagging_fraction']
params['bagging_freq'] = best_params['bagging_freq']
print("调参4:降低过拟合")
for lambda_l1 in [1e-5,1e-3,1e-1,0.0,0.1,0.3,0.5,0.7,0.9,1.0]:
for lambda_l2 in [1e-5,1e-3,1e-1,0.0,0.1,0.4,0.6,0.7,0.9,1.0]:
params['lambda_l1'] = lambda_l1
params['lambda_l2'] = lambda_l2
cv_results = lgb.cv(
params,
lgb_train,
seed=1,
nfold=5,
metrics=['auc'],
early_stopping_rounds=10,
verbose_eval=True
)
mean_auc = pd.Series(cv_results['auc-mean']).max()
boost_rounds = pd.Series(cv_results['auc-mean']).idxmax()
if mean_auc >= max_auc:
max_auc=mean_auc
best_params['lambda_l1'] = lambda_l1
best_params['lambda_l2'] = lambda_l2
if 'lambda_l1' and 'lambda_l2' in best_params.keys():
params['lambda_l1'] = best_params['lambda_l1']
params['lambda_l2'] = best_params['lambda_l2']
print("调参5:降低过拟合2")
for min_split_gain in [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]:
params['min_split_gain'] = min_split_gain
cv_results = lgb.cv(
params,
lgb_train,
seed=1,
nfold=5,
metrics=['auc'],
early_stopping_rounds=10,
verbose_eval=True
)
mean_auc = pd.Series(cv_results['auc-mean']).max()
boost_rounds = pd.Series(cv_results['auc-mean']).idxmax()
if mean_auc >= max_auc:
max_auc=mean_auc
best_params['min_split_gain'] = min_split_gain
if 'min_split_gain' in best_params.keys():
params['min_split_gain'] = best_params['min_split_gain']
print(best_params)
{'bagging_fraction': 0.7,
'bagging_freq': 30,
'feature_fraction': 0.8,
'lambda_l1': 0.1,
'lambda_l2': 0.0,
'max_bin': 255,
'max_depth': 4,
'min_data_in_leaf': 81,
'min_split_gain': 0.1,
'num_leaves': 10}