Hands-on: Bayesian Optimization of a GBDT (LightGBM) Classifier, Compared with Random Search


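    • I. Data Preprocessing
      • 1.1 Loading, Cleaning, and Splitting the Data
      • 1.2 Label Distribution
    • II. Building a Baseline Model
      • 2.1 The LightGBM Model
      • 2.2 Performance with Default Parameters
    • III. Defining the Parameter Space
      • 3.* Sampling from the Parameter Space
    • IV. Random Search
      • 4.1 Cross-Validating LightGBM
      • 4.2 Objective Function
      • 4.3 Running the Random Search
      • 4.4 Random Search Results
    • V. Bayesian Optimization
      • 5.1 Objective Function
      • 5.2 Domain Space
        • 5.2.1 Learning Rate Distribution
        • 5.2.2 Number of Leaves Distribution
        • 5.2.3 boosting_type
        • 5.2.4 The Complete Search Space
          • 5.2.4.* A Sample from the Search Space
      • 5.3 Setting Up Bayesian Optimization
      • 5.4 Bayesian Optimization Results
        • 5.4.1 Saved Results
        • 5.4.2 Performance on the Test Set
    • VI. Random Search vs. Bayesian Optimization
      • 6.1 Visualizing the Search Process
      • 6.2 Learning Rate Comparison
      • 6.3 Boosting Type Comparison
      • 6.4 Numeric Hyperparameter Comparison
    • VII. How Hyperparameters Evolve During Bayesian Optimization
      • 7.1 Boosting Type over the Search
      • 7.2 Learning Rate, Number of Leaves, and Others over the Search
      • 7.3 reg_alpha and reg_lambda over the Search
      • 7.4 Loss over Iterations: Random Search vs. Bayesian Optimization
      • 7.5 Saving the Results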


I. Data Preprocessing

1.1 Loading, Cleaning, and Splitting the Data

import pandas as pd
import numpy as np

data = pd.read_csv('caravan-insurance-challenge.csv')


train = data[data['ORIGIN'] == 'train']
test = data[data['ORIGIN'] == 'test']

train_labels = np.array(train['CARAVAN'].astype(np.int32)).reshape((-1,))
test_labels = np.array(test['CARAVAN'].astype(np.int32)).reshape((-1,))

train = train.drop(['ORIGIN', 'CARAVAN'], axis = 1)
test = test.drop(['ORIGIN', 'CARAVAN'], axis = 1)

# Convert to NumPy arrays for LightGBM
features = np.array(train)
test_features = np.array(test)
labels = train_labels[:]

print('Train shape:', train.shape)
print('Test shape:', test.shape)


1.2 Label Distribution

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

plt.hist(labels, edgecolor = 'k')
plt.xlabel('Label'); plt.ylabel('Count'); plt.title('Count of Labels')

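When one label dominates the histogram, the `class_weight = 'balanced'` option that appears later in the parameter grid becomes relevant. It reweights each class by n_samples / (n_classes * class_count). The sketch below reproduces that formula in plain Python; `balanced_class_weights` and `demo_labels` are hypothetical names and values used only for illustration, not the actual label counts of this dataset.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical labels with a 94:6 imbalance, for illustration only.
demo_labels = [0] * 94 + [1] * 6
weights = balanced_class_weights(demo_labels)
print(weights)
```

The rarer class receives the larger weight, so its errors count for more during training.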

II. Building a Baseline Model

2.1 The LightGBM Model

import lightgbm as lgb
model = lgb.LGBMClassifier()

LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0, importance_type='split', learning_rate=0.1, max_depth=-1, min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0, n_estimators=100, n_jobs=-1, num_leaves=31, objective=None, random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True, subsample=1.0, subsample_for_bin=200000, subsample_freq=0)

2.2 Performance with Default Parameters


from sklearn.metrics import roc_auc_score
from timeit import default_timer as timer

start = timer()
model.fit(features, labels)
train_time = timer() - start

predictions = model.predict_proba(test_features)[:, 1]
auc = roc_auc_score(test_labels, predictions)

print('The baseline score on the test set is {:.4f}.'.format(auc))
print('The baseline training time is {:.4f} seconds.'.format(train_time))

The baseline score on the test set is 0.7092.
The baseline training time is 0.3402 seconds.

III. Defining the Parameter Space

RandomizedSearchCV has no early stopping, so we write the random search ourselves.


import random

param_grid = {'class_weight': [None, 'balanced'],
              'boosting_type': ['gbdt', 'goss', 'dart'],
              'num_leaves': list(range(30, 150)),
              # 800 values evenly spaced in log space between 0.005 and 0.2
              'learning_rate': list(np.logspace(np.log(0.005), np.log(0.2), base=np.exp(1), num=800)),
              'subsample_for_bin': list(range(20000, 300000, 20000)),
              'min_child_samples': list(range(20, 500, 5)),
              'reg_alpha': list(np.linspace(0, 1)),
              'reg_lambda': list(np.linspace(0, 1)),
              'colsample_bytree': list(np.linspace(0.6, 1, 10))}
# subsample is kept separate because it is only sampled when boosting_type is not 'goss'
subsample_dist = list(np.linspace(0.5, 1, 100))

# Distribution of the learning rate
plt.hist(param_grid['learning_rate'], color = 'r', edgecolor = 'k')
plt.xlabel('Learning Rate'); plt.ylabel('Count'); plt.title('Learning Rate Distribution', size =18)


# Distribution of the number of leaves
plt.hist(param_grid['num_leaves'], color = 'm', edgecolor = 'k')
plt.xlabel('Number of Leaves'); plt.ylabel('Count'); plt.title('Number of Leaves Distribution')


3.* Sampling from the Parameter Space

{key: random.sample(value, 2) for key, value in param_grid.items()}


params = {key: random.sample(value, 1)[0] for key, value in param_grid.items()}
params['subsample'] = random.sample(subsample_dist, 1)[0] if params['boosting_type'] != 'goss' else 1.0

{'class_weight': 'balanced', 'boosting_type': 'gbdt',
'num_leaves': 149, 'learning_rate': 0.024474734290096542,
'subsample_for_bin': 200000, 'min_child_samples': 110,
'reg_alpha': 0.8163265306122448, 'reg_lambda': 0.26530612244897955,
'colsample_bytree': 0.6888888888888889, 'subsample': 0.8282828282828283}
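Because the draw above uses the global `random` state, the sampled configuration changes on every run. One way to make the sampling repeatable is to pass a seeded generator into a small helper. The sketch below assumes a trimmed-down grid; `demo_grid`, `demo_subsample`, and `sample_params` are hypothetical names introduced here, and the goss rule simply mirrors the one in the code above.

```python
import random

# Hypothetical, trimmed-down grid used only for this illustration.
demo_grid = {
    'boosting_type': ['gbdt', 'goss', 'dart'],
    'num_leaves': list(range(30, 150)),
    'class_weight': [None, 'balanced'],
}
demo_subsample = [0.5 + 0.5 * i / 99 for i in range(100)]

def sample_params(grid, subsample_values, rng):
    """Draw one configuration; goss always gets subsample = 1.0."""
    params = {key: rng.choice(values) for key, values in grid.items()}
    if params['boosting_type'] == 'goss':
        params['subsample'] = 1.0
    else:
        params['subsample'] = rng.choice(subsample_values)
    return params

# Two generators with the same seed produce the same configuration.
first = sample_params(demo_grid, demo_subsample, random.Random(50))
again = sample_params(demo_grid, demo_subsample, random.Random(50))
print(first == again)
```

Passing the generator explicitly keeps the helper free of hidden global state, which makes a search easier to rerun and compare.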

IV. Random Search

4.1 Cross-Validating LightGBM

# Create a lgb dataset
train_set = lgb.Dataset(features, label = labels)

r = lgb.cv(params, train_set, num_boost_round=10000, nfold=10, metrics='auc',
          early_stopping_rounds = 80, verbose_eval = False, seed = 42)
# early_stopping_rounds = 80: stop if the score fails to improve for 80 consecutive boosting rounds

r_best = np.max(r['auc-mean']) # Highest score
r_best_std = r['auc-stdv'][np.argmax(r['auc-mean'])]
# Standard deviation of best score

print('The maximum ROC AUC on the validation set was {:.5f} with std of {:.5f}.'.format(r_best, r_best_std))
print('The ideal number of iterations was {}.'.format(np.argmax(r['auc-mean']) + 1))

The maximum ROC AUC on the validation set was 0.75553 with std of 0.03082.
The ideal number of iterations was 73.
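The early-stopping rule described in the comment above (stop once a fixed number of consecutive rounds brings no improvement) can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not LightGBM's implementation; `best_round_with_early_stopping` and `demo_scores` are hypothetical.

```python
def best_round_with_early_stopping(scores, patience):
    """Return (best_round, best_score, rounds_run) for a metric to maximise.

    Stops once `patience` consecutive rounds fail to beat the best score.
    Rounds are 1-indexed.
    """
    best_score = float('-inf')
    best_round = 0
    rounds_run = 0
    for round_number, score in enumerate(scores, start=1):
        rounds_run = round_number
        if score > best_score:
            best_score, best_round = score, round_number
        elif round_number - best_round >= patience:
            break
    return best_round, best_score, rounds_run

# Hypothetical validation AUC per boosting round.
demo_scores = [0.70, 0.72, 0.74, 0.73, 0.735, 0.739, 0.75]
print(best_round_with_early_stopping(demo_scores, patience=3))
```

With a patience of 3 the loop stops at round 6 and never sees the higher score at round 7, which is why a generous patience such as 80 is a reasonable safeguard against stopping too soon.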

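# Search budget and number of CV folds, defined before they are first used
Max_evals = 200
N_folds = 3

# DataFrame that will hold one row of results per search iteration
random_results = pd.DataFrame(columns = ['loss', 'params', 'iteration', 'estimators',
                                         'time'], index = list(range(Max_evals)))

4.2 Objective Function


def random_objective(params, iteration, n_folds = N_folds):
    """Cross-validate one hyperparameter set and return its loss (1 - best mean AUC)."""
    start = timer()
    cv_results = lgb.cv(params, train_set, num_boost_round = 10000, nfold = n_folds,
                        early_stopping_rounds = 80, metrics = 'auc', seed = 42)
    end = timer()
    best_score = np.max(cv_results['auc-mean'])
    # Lower is better for the search, so the loss is 1 - AUC
    loss = 1 - best_score
    # Number of boosting rounds that gave the best CV score
    n_estimators = int(np.argmax(cv_results['auc-mean']) + 1)
    return [loss, params, iteration, n_estimators, end - start]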
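To make the last few lines of the objective concrete, the sketch below applies the same arithmetic to a short list standing in for `cv_results['auc-mean']`. The helper `summarize_cv` and the values in `demo_auc_means` are hypothetical and only illustrate how the loss and the estimator count are derived.

```python
def summarize_cv(auc_means):
    """Turn per-round mean AUC values into (loss, n_estimators)."""
    best_score = max(auc_means)
    # index() gives the first round reaching the best score; +1 turns it into a count
    n_estimators = auc_means.index(best_score) + 1
    loss = 1 - best_score
    return loss, n_estimators

# Hypothetical stand-in for cv_results['auc-mean'].
demo_auc_means = [0.71, 0.74, 0.76, 0.75]
loss, n_estimators = summarize_cv(demo_auc_means)
print(round(loss, 2), n_estimators)
```

Here the best mean AUC is reached at the third round, so the model would later be refit with three estimators and the search records a loss of about 0.24.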

4.3 Running the Random Search


for i in range(Max_evals):
    # Randomly sample one value per hyperparameter
    params = {key: random.sample(value, 1)[0] for key, value in param_grid.items()}

    # goss always runs with subsample = 1.0; the other boosting types sample it
    if params['boosting_type'] == 'goss':
        params['subsample'] = 1.0
    else:
        params['subsample'] = random.sample(subsample_dist, 1)[0]

    results_list = random_objective(params, i)
    random_results.loc[i, :] = results_list

# Sort so that the best (lowest-loss) result comes first
random_results.sort_values('loss', ascending = True, inplace = True)
random_results.reset_index(inplace = True, drop = True)


4.4 Random Search Results

random_results.loc[0, 'params']

{'class_weight': None, 'boosting_type': 'dart', 'num_leaves': 112,
'learning_rate': 0.020631460653340816, 'subsample_for_bin': 160000,
'min_child_samples': 220, 'reg_alpha': 0.9795918367346939,
'reg_lambda': 0.08163265306, 'colsample_bytree': 0.6, 'subsample': 0.7929292929292929}

best_random_params = random_results.loc[0, 'params'].copy()
best_random_estimators = int(random_results.loc[0, 'estimators'])
best_random_model = lgb.LGBMClassifier(n_estimators=best_random_estimators, n_jobs=-1,
                                      objective='binary', **best_random_params, random_state=42)
best_random_model.fit(features, labels)
predictions = best_random_model.predict_proba(test_features)[:, 1]

print('The best model from random search scores {:.4f} on the test data.'.format(roc_auc_score(test_labels, predictions)))
print('This was achieved using {} search iterations.'.format(random_results.loc[0, 'iteration']))

The best model from random search scores 0.7179 on the test data.
This was achieved using 38 search iterations.

V. Bayesian Optimization

5.1 Objective Function

import csv
from hyperopt import STATUS_OK
from timeit import default_timer as timer

def objective(params, n_folds = N_folds):
    """Objective for hyperopt: cross-validate params and return 1 - best mean AUC."""
    global ITERATION
    ITERATION += 1
    # Unpack the nested boosting_type choice into flat parameters
    subsample = params['boosting_type'].get('subsample', 1.0)
    params['boosting_type'] = params['boosting_type']['boosting_type']
    params['subsample'] = subsample
    # hp.quniform returns floats, but these parameters must be integers
    for parameter_name in ['num_leaves', 'subsample_for_bin', 'min_child_samples']:
        params[parameter_name] = int(params[parameter_name])
    start = timer()
    cv_results = lgb.cv(params, train_set, num_boost_round = 10000, nfold = n_folds,
                        early_stopping_rounds = 80, metrics = 'auc', seed = 42)
    run_time = timer() - start
    best_score = np.max(cv_results['auc-mean'])
    loss = 1 - best_score
    n_estimators = int(np.argmax(cv_results['auc-mean']) + 1)
    # Append this trial to the CSV file; the with-block closes the file every time
    with open(out_file, 'a', newline = '') as of_connection:
        writer = csv.writer(of_connection)
        writer.writerow([loss, params, ITERATION, n_estimators, run_time])
    return {'loss': loss, 'params': params, 'iteration': ITERATION,
            'estimators': n_estimators, 'train_time': run_time, 'status': STATUS_OK}

5.2 Domain Space

5.2.1 Learning Rate Distribution

from hyperopt import hp
from hyperopt.pyll.stochastic import sample

learning_rate = {'learning_rate': hp.loguniform('learning_rate', np.log(0.005), np.log(0.2))}

learning_rate_dist = []
# Draw 10000 samples from the learning rate prior
for _ in range(10000):
    learning_rate_dist.append(sample(learning_rate)['learning_rate'])

plt.figure(figsize = (8, 6))
sns.kdeplot(learning_rate_dist, color = 'r', linewidth = 2, shade = True)
plt.title('Learning Rate Distribution', size = 18)
plt.xlabel('Learning Rate', size = 16)
plt.ylabel('Density', size = 16)

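`hp.loguniform` takes its bounds in log space and returns exp(uniform(low, high)), so the logarithm of the learning rate is uniformly distributed. A plain-Python sketch of the same idea follows; `sample_loguniform` is a hypothetical helper that takes the bounds on the original scale, and the check on the geometric midpoint is only an illustration.

```python
import math
import random

def sample_loguniform(low, high, rng):
    """Draw x such that log(x) is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
draws = [sample_loguniform(0.005, 0.2, rng) for _ in range(10000)]

# The geometric mean of the bounds splits the draws roughly in half.
midpoint = math.sqrt(0.005 * 0.2)
share_below = sum(x < midpoint for x in draws) / len(draws)
print(f"geometric midpoint={midpoint:.4f}, share below={share_below:.3f}")
```

About half of the draws fall below roughly 0.032, far below the arithmetic midpoint of the range, which is the point of a log-uniform prior: small learning rates are explored as thoroughly as large ones.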

5.2.2 Number of Leaves Distribution


num_leaves = {'num_leaves': hp.quniform('num_leaves', 30, 150, 1)}
num_leaves_dist = []
# Draw 10000 samples from the number-of-leaves prior
for _ in range(10000):
    num_leaves_dist.append(sample(num_leaves)['num_leaves'])

plt.figure(figsize = (8,6))
sns.kdeplot(num_leaves_dist, linewidth = 2, shade = True)
plt.title('Number of Leaves Distribution', size = 18); plt.xlabel('Number of Leaves', size = 16); plt.ylabel('Density', size = 16)

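`hp.quniform(label, low, high, q)` is documented as returning round(uniform(low, high) / q) * q, and it yields floats rather than integers. The sketch below mimics that rule with the standard library; `sample_quniform` is a hypothetical stand-in, not hyperopt's code.

```python
import random

def sample_quniform(low, high, q, rng):
    """Draw round(uniform(low, high) / q) * q and return it as a float."""
    return float(round(rng.uniform(low, high) / q) * q)

rng = random.Random(0)
draws = [sample_quniform(30, 150, 1, rng) for _ in range(10000)]
print(min(draws), max(draws), type(draws[0]).__name__)
```

Every draw is a whole number stored as a float, which is why integer-valued parameters have to be cast with `int()` before they are handed to LightGBM.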

5.2.3 boosting_type

# hp labels must be unique within a search space, so the two subsample nodes get distinct names
boosting_type = {'boosting_type': hp.choice('boosting_type',
                                            [{'boosting_type': 'gbdt', 'subsample': hp.uniform('gbdt_subsample', 0.5, 1)},
                                             {'boosting_type': 'dart', 'subsample': hp.uniform('dart_subsample', 0.5, 1)},
                                             {'boosting_type': 'goss', 'subsample': 1.0}])}
params = sample(boosting_type)

{'boosting_type': {'boosting_type': 'gbdt', 'subsample': 0.659771523544347}}

subsample = params['boosting_type'].get('subsample', 1.0)

params['boosting_type'] = params['boosting_type']['boosting_type']
params['subsample'] = subsample

{'boosting_type': 'gbdt', 'subsample': 0.659771523544347}

5.2.4 The Complete Search Space

space = {'class_weight': hp.choice('class_weight', [None, 'balanced']),
         'boosting_type': hp.choice('boosting_type', [{'boosting_type': 'gbdt', 'subsample': hp.uniform('gbdt_subsample', 0.5, 1)},
                                                      {'boosting_type': 'dart', 'subsample': hp.uniform('dart_subsample', 0.5, 1)},
                                                      {'boosting_type': 'goss', 'subsample': 1.0}]),
         'num_leaves': hp.quniform('num_leaves', 30, 150, 1),
         'learning_rate': hp.loguniform('learning_rate', np.log(0.01), np.log(0.2)),
         'subsample_for_bin': hp.quniform('subsample_for_bin', 20000, 300000, 20000),
         'min_child_samples': hp.quniform('min_child_samples', 20, 500, 5),
         'reg_alpha': hp.uniform('reg_alpha', 0.0, 1.0),
         'reg_lambda': hp.uniform('reg_lambda', 0.0, 1.0),
         'colsample_bytree': hp.uniform('colsample_by_tree', 0.6, 1.0)}

5.2.4.* A Sample from the Search Space

x = sample(space)
subsample = x['boosting_type'].get('subsample', 1.0)
x['boosting_type'] = x['boosting_type']['boosting_type']
x['subsample'] = subsample

{'boosting_type': 'goss',
'class_weight': 'balanced',
'colsample_bytree': 0.6765996025430209,
'learning_rate': 0.13232409656402305,
'min_child_samples': 330.0,
'num_leaves': 103.0,
'reg_alpha': 0.5849415659238283,
'reg_lambda': 0.4787001151843524,
'subsample_for_bin': 100000.0,
'subsample': 1.0}
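The sample above still holds floats such as 330.0 and 103.0 for parameters that must be integers, and the raw draw nests `subsample` inside the `boosting_type` choice. The two clean-up steps can be bundled into one helper. The sketch below is one possible way to do it; `flatten_params` and the `raw` dictionary are hypothetical, shaped like the samples shown in this section.

```python
def flatten_params(sampled):
    """Unpack the nested boosting_type choice and cast integer-valued parameters."""
    params = dict(sampled)  # shallow copy so the caller's dict is left untouched
    choice = params['boosting_type']
    params['subsample'] = choice.get('subsample', 1.0)
    params['boosting_type'] = choice['boosting_type']
    for name in ('num_leaves', 'subsample_for_bin', 'min_child_samples'):
        params[name] = int(params[name])
    return params

# Hypothetical raw sample in the shape produced by the nested search space.
raw = {
    'boosting_type': {'boosting_type': 'goss', 'subsample': 1.0},
    'num_leaves': 103.0,
    'subsample_for_bin': 100000.0,
    'min_child_samples': 330.0,
    'learning_rate': 0.13,
}
print(flatten_params(raw))
```

Working on a copy avoids mutating the dictionary that the sampler handed over, which can matter when the same sample is inspected again later.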

5.3 Setting Up Bayesian Optimization

from hyperopt import tpe
tpe_algorithm = tpe.suggest

from hyperopt import Trials
bayes_trials = Trials()

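# Save the results to a CSV file as the search runs

out_file = 'gbm_trials.csv'
of_connection = open(out_file, 'w', newline = '')
writer = csv.writer(of_connection)

# Write the header row, then close the file; the objective function reopens it in append mode
writer.writerow(['loss', 'params', 'iteration', 'estimators', 'train_time'])
of_connection.close()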

5.4 Bayesian Optimization Results

from hyperopt import fmin

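# Global variable: iteration counter incremented inside objective()
ITERATION = 0

# Run optimization
# Note: newer hyperopt releases expect a NumPy Generator for rstate, e.g. np.random.default_rng(42)
best = fmin(fn = objective, space = space, algo = tpe.suggest, 
            max_evals = Max_evals, trials = bayes_trials, rstate = np.random.RandomState(42))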

# Sort the trials with lowest loss (highest AUC) first
bayes_trials_results = sorted(bayes_trials.results, key = lambda x: x['loss'])

[{'loss': 0.23670902556787576,
'params': {'boosting_type': 'dart',
'class_weight': None,
'colsample_bytree': 0.6777142263201398,
'learning_rate': 0.10896162558676845,
'min_child_samples': 200,
'num_leaves': 50,
'reg_alpha': 0.75201502515923,
'reg_lambda': 0.2500317899561674,
'subsample_for_bin': 220000,
'subsample': 0.8299430626318801},
'iteration': 109,
'estimators': 39,
'train_time': 135.7437369420004,
'status': 'ok'}]

5.4.1 Saved Results

results = pd.read_csv('gbm_trials.csv')
results.sort_values('loss', ascending = True, inplace = True)
results.reset_index(inplace = True, drop = True)


import ast
ast.literal_eval(results.loc[0, 'params'])
# For safety, prefer ast.literal_eval() over eval() when converting a string back into a Python object

{'boosting_type': 'dart',
'class_weight': None,
'colsample_bytree': 0.6777142263201398,
'learning_rate': 0.10896162558676845,
'min_child_samples': 200,
'num_leaves': 50,
'reg_alpha': 0.75201502515923,
'reg_lambda': 0.2500317899561674,
'subsample_for_bin': 220000,
'subsample': 0.8299430626318801}
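The reason `ast.literal_eval` is needed at all is that a dictionary written through `csv.writer` is stored as its string representation. The following self-contained sketch shows the round trip with an in-memory buffer standing in for `gbm_trials.csv`; the shortened `params` dictionary is illustrative only.

```python
import ast
import csv
import io

params = {'boosting_type': 'dart', 'class_weight': None, 'num_leaves': 50}

# Write one row the way the objective function does; io.StringIO stands in for the file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['loss', 'params'])
writer.writerow([0.2367, params])

# Reading back yields strings, so the params column has to be parsed.
buffer.seek(0)
row = next(csv.DictReader(buffer))
restored = ast.literal_eval(row['params'])
print(type(row['params']).__name__, restored == params)
```

`literal_eval` only accepts Python literals (dicts, lists, numbers, strings, `None`, and so on), so it restores the dictionary without executing arbitrary code.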

5.4.2 Performance on the Test Set

best_bayes_estimators = int(results.loc[0, 'estimators'])
best_bayes_params = ast.literal_eval(results.loc[0, 'params']).copy()

best_bayes_model = lgb.LGBMClassifier(n_estimators=best_bayes_estimators, n_jobs=-1,
                                     objective='binary', **best_bayes_params, random_state=42)
best_bayes_model.fit(features, labels)

LGBMClassifier(boosting_type='dart', class_weight=None,
colsample_bytree=0.6777142263201398, importance_type='split',
learning_rate=0.10896162558676845, max_depth=-1,
min_child_samples=200, min_child_weight=0.001, min_split_gain=0.0,
n_estimators=39, n_jobs=-1, num_leaves=50, objective='binary',
random_state=42, reg_alpha=0.75201502515923,
reg_lambda=0.2500317899561674, silent=True,
subsample=0.8299430626318801, subsample_for_bin=220000,
subsample_freq=0)

preds = best_bayes_model.predict_proba(test_features)[:, 1]
print('The best model from Bayes optimization scores {:.4f} AUC ROC on the test set.'.format(roc_auc_score(test_labels, preds)))
print('This was achieved after {} search iterations.'.format(results.loc[0, 'iteration']))

The best model from Bayes optimization scores 0.7275 AUC ROC on the test set.
This was achieved after 109 search iterations.

VI. Random Search vs. Bayesian Optimization

best_random_params['method'] = 'random search'
best_bayes_params['method'] = 'Bayesian optimization'
# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
best_params = pd.concat([pd.DataFrame(best_bayes_params, index = [0]),
                         pd.DataFrame(best_random_params, index = [0])], ignore_index = True)


6.1 Visualizing the Search Process

random_params = pd.DataFrame(columns = list(random_results.loc[0, 'params'].keys()),
                            index = list(range(len(random_results))))
for i, params in enumerate(random_results['params']):
    random_params.loc[i, :] = list(params.values())
random_params['loss'] = random_results['loss']
random_params['iteration'] = random_results['iteration']


bayes_params = pd.DataFrame(columns = list(ast.literal_eval(results.loc[0,'params']).keys()),
                           index = list(range(len(results))))
for i, params in enumerate(results['params']):
    bayes_params.loc[i, :] = list(ast.literal_eval(params).values())
bayes_params['loss'] = results['loss']
bayes_params['iteration'] = results['iteration']


6.2 Learning Rate Comparison

plt.figure(figsize = (20, 8))
plt.rcParams['font.size'] = 18

sns.kdeplot(learning_rate_dist, label = 'Sampling Distribution', linewidth = 2)
sns.kdeplot(random_params['learning_rate'], label = 'Random Search', linewidth = 2)
sns.kdeplot(bayes_params['learning_rate'], label = 'Bayes Optimization', linewidth=2)
plt.xlabel('Learning Rate')
plt.title('Learning Rate Distribution')


6.3 Boosting Type Comparison

fig, axs = plt.subplots(1, 2, sharey = True, sharex = True)

random_params['boosting_type'].value_counts().plot.bar(ax=axs[0], figsize=(14,6),
                                                      color='orange', title='Random Search Boosting Type')
bayes_params['boosting_type'].value_counts().plot.bar(ax=axs[1], figsize= (14,6),
                                                     color='green', title='Bayes Optimization Boosting Type')


print('Random Search boosting type percentages:')
print(100 * random_params['boosting_type'].value_counts() / len(random_params))

print('Bayes Optimization boosting type percentages:')
print(100 * bayes_params['boosting_type'].value_counts() / len(bayes_params))

Random Search boosting type percentages:
dart 36.5
gbdt 33.0
goss 30.5
Name: boosting_type, dtype: float64

Bayes Optimization boosting type percentages:
dart 54.5
gbdt 29.0
goss 16.5
Name: boosting_type, dtype: float64

6.4 Numeric Hyperparameter Comparison

for i, hyper in enumerate(random_params.columns):
    if hyper not in ['class_weight','boosting_type','iteration','subsample','metric','verbose']:
        plt.figure(figsize = (14, 6))
        if hyper != 'loss':
            sns.kdeplot([sample(space[hyper]) for _ in range(1000)], label = 'Sampling Distribution')
        sns.kdeplot(random_params[hyper], label = 'Random Search')
        sns.kdeplot(bayes_params[hyper], label = 'Bayes Optimization')
        plt.legend(loc = 1)
        plt.title('{} Distribution'.format(hyper))


VII. How Hyperparameters Evolve During Bayesian Optimization

7.1 Boosting Type over the Search

bayes_params['boosting_int'] = bayes_params['boosting_type'].replace({'gbdt':1,'goss':2,'dart':3})
plt.plot(bayes_params['iteration'], bayes_params['boosting_int'], 'ro')
plt.yticks([1, 2, 3], ['gbdt', 'goss', 'dart'])
plt.title('Boosting Type over Search')


7.2 Learning Rate, Number of Leaves, and Others over the Search

plt.figure(figsize = (14, 14))
colors = ['red', 'blue', 'orange', 'green']

for i, hyper in enumerate(['colsample_bytree', 'learning_rate', 'min_child_samples', 'num_leaves']):
    plt.subplot(2, 2, i+1)
    # Keyword arguments for x and y work across seaborn versions
    sns.regplot(x = 'iteration', y = hyper, data = bayes_params, color = colors[i])
    # plt.xlabel('Iteration')
    # plt.ylabel('{}'.format(hyper))
    plt.title('{} over Search'.format(hyper))


7.3 reg_alpha and reg_lambda over the Search

fig, axes = plt.subplots(1, 3, figsize = (18, 6))
for i, hyper in enumerate(['reg_alpha', 'reg_lambda', 'subsample_for_bin']):
    sns.regplot(x = 'iteration', y = hyper, data = bayes_params, ax = axes[i])
    axes[i].set(title = '{} over Search'.format(hyper))


7.4 Loss over Iterations: Random Search vs. Bayesian Optimization

# Stack the two result sets; pd.concat replaces DataFrame.append, which pandas 2.0 removed
scores = pd.concat([pd.DataFrame({'ROC AUC': 1 - random_params['loss'],
                                  'iteration': random_params['iteration'],
                                  'search': 'random'}),
                    pd.DataFrame({'ROC AUC': 1 - bayes_params['loss'],
                                  'iteration': bayes_params['iteration'],
                                  'search': 'Bayes'})])
scores['ROC AUC'] = scores['ROC AUC'].astype(np.float32)
scores['iteration'] = scores['iteration'].astype(np.int32)


plt.figure(figsize = (18, 6))

plt.subplot(1, 2, 1)
plt.hist(1 - random_results['loss'].astype(np.float32), label = 'Random Search', edgecolor = 'k')
plt.xlabel('Validation Roc Auc')
plt.title('Random Search Validation Scores')
plt.xlim(0.73, 0.765)

plt.subplot(1, 2, 2)
plt.hist(1 - bayes_params['loss'], label = 'Bayes Optimization', edgecolor = 'k')
plt.xlabel('Validation Roc Auc')
plt.title('Bayes Optimization Validation Scores')
plt.xlim(0.73, 0.765)


sns.lmplot(x = 'iteration', y = 'ROC AUC', hue = 'search', data = scores, height = 8)
plt.ylabel('ROC AUC')
plt.title('ROC AUC versus Iteration')

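A regression line over per-iteration scores shows the trend, but the quantity a practitioner usually cares about is the best score found so far. A running maximum makes that explicit. The sketch below uses hypothetical score lists (`demo_random`, `demo_bayes`) given in iteration order; with the real data frames, the rows would first have to be sorted by `iteration`, since they are sorted by loss above.

```python
from itertools import accumulate

def running_best(scores):
    """Best score seen so far after each iteration (higher is better)."""
    return list(accumulate(scores, max))

# Hypothetical validation AUC values in iteration order.
demo_random = [0.740, 0.752, 0.748, 0.751, 0.755]
demo_bayes = [0.738, 0.749, 0.757, 0.756, 0.760]

print(running_best(demo_random))
print(running_best(demo_bayes))
```

Plotting the two running-best curves against the iteration number gives a direct view of how quickly each search method reaches a given score.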

7.5 Saving the Results

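import json

# Dump the sorted trial results (assumed to be the intended content of trials.json)
with open('trials.json', 'w') as f:
    json.dump(bayes_trials_results, f)

# Save the per-iteration hyperparameters from both searches
bayes_params.to_csv('bayes_params.csv', index = False)
random_params.to_csv('random_params.csv', index = False)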
