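This article is a record of my participation in a Tianchi competition. The series covers four parts: exploratory data analysis (EDA), feature engineering, modeling and parameter tuning, and model ensembling. This article is the second part.
Competition link: https://tianchi.aliyun.com/competition/entrance/231784/information
Tutorial link: https://tianchi.aliyun.com/notebook-ai/detail?spm=5176.12281978.0.0.6802593aM0zxin&postId=95460
Besides the tuning approach provided in the tutorial, I also studied an XGBoost (XGB) tuning method found online: https://blog.csdn.net/csiao_Bing/article/details/84978725
Walkthrough: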
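## Model tuning--n_estimators
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Candidate values for the parameter tuned in this round
cv_params = {'n_estimators': [400, 500, 600, 700, 800]}
# All other parameters stay fixed; the grid search overrides n_estimators with each candidate
other_params = {'learning_rate': 0.1, 'n_estimators': 500, 'max_depth': 5, 'min_child_weight': 1, 'seed': 0, 'subsample': 0.8, 'colsample_bytree': 0.8, 'gamma': 0, 'reg_alpha': 0, 'reg_lambda': 1}
model = xgb.XGBRegressor(**other_params)
# 5-fold cross-validated grid search scored by R^2 (x_train and y_train are assumed to be defined already)
optimized_GBM = GridSearchCV(estimator=model, param_grid=cv_params, scoring='r2', cv=5, verbose=1, n_jobs=4)
optimized_GBM.fit(x_train, y_train)
evaluate_result = optimized_GBM.cv_results_
print('Results of each round: {0}'.format(evaluate_result))
print('Best parameter values: {0}'.format(optimized_GBM.best_params_))
print('Best model score: {0}'.format(optimized_GBM.best_score_))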
Fitting 5 folds for each of 5 candidates, totalling 25 fits
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done 25 out of 25 | elapsed: 11.2min finished
[23:15:04] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Results of each round: {'mean_fit_time': array([ 65.56447134, 83.75439568, 101.25140986, 114.86471357, 124.55886097]),
 'std_fit_time': array([1.01607169, 1.23950185, 0.45145929, 1.25883752, 9.10093902]),
 'mean_score_time': array([0.55379496, 0.6398819 , 0.75199609, 0.91474347, 0.97660236]),
 'std_score_time': array([0.01408137, 0.04041043, 0.04504313, 0.06618081, 0.05422502]),
 'param_n_estimators': masked_array(data=[400, 500, 600, 700, 800], mask=[False, False, False, False, False], fill_value='?', dtype=object),
 'params': [{'n_estimators': 400}, {'n_estimators': 500}, {'n_estimators': 600}, {'n_estimators': 700}, {'n_estimators': 800}],
 'split0_test_score': array([0.96922127, 0.96971105, 0.97017114, 0.97041995, 0.97062748]),
 'split1_test_score': array([0.96982751, 0.97024591, 0.97065482, 0.97078297, 0.97096162]),
 'split2_test_score': array([0.96761475, 0.96807066, 0.96837973, 0.96864978, 0.96889595]),
 'split3_test_score': array([0.9643302 , 0.96491349, 0.96537955, 0.9657377 , 0.96591757]),
 'split4_test_score': array([0.96746722, 0.96815598, 0.96852235, 0.96888821, 0.96906867]),
 'mean_test_score': array([0.96769219, 0.96821942, 0.96862152, 0.96889572, 0.96909426]),
 'std_test_score': array([0.00191088, 0.00185941, 0.0018501 , 0.0017843 , 0.00178713]),
 'rank_test_score': array([5, 4, 3, 2, 1])}
Best parameter values: {'n_estimators': 800}
Best model score: 0.9690942590235421
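The raw `cv_results_` dictionary is hard to read at a glance. As a rough illustration, the sketch below copies `mean_test_score` and `mean_fit_time` from the output above into plain lists and computes what each additional 100 trees buys. The helper name `marginal_gains` is made up for this example and is not part of scikit-learn or XGBoost.

```python
# Values copied by hand from the cv_results_ output above.
n_estimators = [400, 500, 600, 700, 800]
mean_test_score = [0.96769219, 0.96821942, 0.96862152, 0.96889572, 0.96909426]
mean_fit_time = [65.56447134, 83.75439568, 101.25140986, 114.86471357, 124.55886097]


def marginal_gains(values):
    """Return the difference between each value and the one before it."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


score_gains = marginal_gains(mean_test_score)
time_costs = marginal_gains(mean_fit_time)

for n, gain, cost in zip(n_estimators[1:], score_gains, time_costs):
    print(f"n_estimators={n}: R^2 +{gain:.6f}, mean fit time +{cost:.1f}s")

# Check whether the best candidate sits on the boundary of the grid,
# which would hint that the search range may be worth extending.
best_index = mean_test_score.index(max(mean_test_score))
best_at_edge = best_index in (0, len(n_estimators) - 1)
print("best n_estimators:", n_estimators[best_index], "| at grid edge:", best_at_edge)
```

With these numbers the score gain shrinks at every step while the best value still lies on the upper edge of the grid, so whether to try values above 800 is a trade-off against training time.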
## Model tuning--max_depth, min_child_weight
# Tune the two tree-structure parameters jointly, with n_estimators fixed at the value found above
cv_params = {'max_depth': [3, 4, 5, 6, 7, 8, 9, 10], 'min_child_weight': [1, 2, 3, 4, 5, 6]}
other_params = {'learning_rate': 0.1, 'n_estimators': 800, 'max_depth': 5, 'min_child_weight': 1, 'seed': 0,
                'subsample': 0.8, 'colsample_bytree': 0.8, 'gamma': 0, 'reg_alpha': 0, 'reg_lambda': 1}
model = xgb.XGBRegressor(**other_params)
optimized_GBM = GridSearchCV(estimator=model, param_grid=cv_params, scoring='r2', cv=5, verbose=1, n_jobs=-1)
optimized_GBM.fit(x_train, y_train)
evaluate_result = optimized_GBM.cv_results_
print('Results of each round: {0}'.format(evaluate_result))
print('Best parameter values: {0}'.format(optimized_GBM.best_params_))
print('Best model score: {0}'.format(optimized_GBM.best_score_))
Best parameter values: {'max_depth': 6, 'min_child_weight': 2}
Best model score: 0.9694125922909762
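A grid search fits one model per parameter combination per fold, so a two-parameter grid grows quickly. As a rough illustration, the sketch below expands the grid from this step with `itertools.product` and counts the fits. `expand_grid` is a hypothetical helper written for this example; scikit-learn performs the equivalent expansion internally.

```python
from itertools import product


def expand_grid(param_grid):
    """List every combination of the candidate values in param_grid."""
    keys = sorted(param_grid)
    value_lists = [param_grid[k] for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# Same grid as in the max_depth / min_child_weight round above.
cv_params = {'max_depth': [3, 4, 5, 6, 7, 8, 9, 10],
             'min_child_weight': [1, 2, 3, 4, 5, 6]}
n_folds = 5  # matches cv=5 in the GridSearchCV call

candidates = expand_grid(cv_params)
total_fits = len(candidates) * n_folds
print(len(candidates), "candidates,", total_fits, "fits")  # 48 candidates, 240 fits
```

Compared with the 25 fits of the first round, this round needs 240, which is one reason to tune parameters in small groups rather than all at once.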
## Model tuning--gamma
# Tune gamma with max_depth and min_child_weight fixed at the values found above
cv_params = {'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}
other_params = {'learning_rate': 0.1, 'n_estimators': 800, 'max_depth': 6, 'min_child_weight': 2, 'seed': 0,
                'subsample': 0.8, 'colsample_bytree': 0.8, 'gamma': 0, 'reg_alpha': 0, 'reg_lambda': 1}
model = xgb.XGBRegressor(**other_params)
optimized_GBM = GridSearchCV(estimator=model, param_grid=cv_params, scoring='r2', cv=5, verbose=1, n_jobs=-1)
optimized_GBM.fit(x_train, y_train)
evaluate_result = optimized_GBM.cv_results_
print('Results of each round: {0}'.format(evaluate_result))
print('Best parameter values: {0}'.format(optimized_GBM.best_params_))
print('Best model score: {0}'.format(optimized_GBM.best_score_))
Best parameter values: {'gamma': 0.1}
Best model score: 0.9692708576389358
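The gamma round reports a lower best score than the previous round. That can happen because the gamma grid above does not contain the earlier setting of 0, so the search never compares against it. A minimal sketch of keeping a change only when it beats the incumbent score follows. It uses the three best scores reported in this post and assumes they are comparable (same data, same folds, same metric). The function name `accept_if_better` is hypothetical.

```python
def accept_if_better(incumbent_params, incumbent_score, stage_params, stage_score):
    """Merge stage_params into the incumbent only if the CV score improved."""
    if stage_score > incumbent_score:
        return {**incumbent_params, **stage_params}, stage_score
    return dict(incumbent_params), incumbent_score


# Best parameters and scores as reported by the three rounds in this post.
stages = [
    ({'n_estimators': 800}, 0.9690942590235421),
    ({'max_depth': 6, 'min_child_weight': 2}, 0.9694125922909762),
    ({'gamma': 0.1}, 0.9692708576389358),
]

# Start from the gamma value used before tuning; no score is known yet.
params = {'gamma': 0}
score = float('-inf')
for stage_params, stage_score in stages:
    params, score = accept_if_better(params, score, stage_params, stage_score)

print(params)
print(score)
```

Under this rule the gamma round is rejected and gamma stays at 0. An alternative is to include the incumbent value (here 0) in the grid so that the search itself makes the comparison.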