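Used Car Transaction Price Prediction - [Modeling and Parameter Tuning]

This post is a record of my participation in a Tianchi competition. The series covers four parts: [Exploratory Data Analysis (EDA)], [Feature Engineering], [Modeling and Parameter Tuning], and [Model Ensembling]; this post is the second part.

Competition link: https://tianchi.aliyun.com/competition/entrance/231784/information

Tutorial link: https://tianchi.aliyun.com/notebook-ai/detail?spm=5176.12281978.0.0.6802593aM0zxin&postId=95460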

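I. Study Notes

  1. Linear regression model:
    • Feature requirements of linear regression;
    • Handling long-tailed distributions;
    • Understanding the linear regression model;
  2. Model performance validation:
    • Evaluation functions and objective functions;
    • Cross-validation;
    • Leave-one-out validation;
    • Validation for time-series problems;
    • Plotting learning curves;
    • Plotting validation curves;
  3. Embedded feature selection:
    • Lasso regression;
    • Ridge regression;
    • Decision trees;
  4. Model comparison:
    • Common linear models;
    • Common nonlinear models;
  5. Model tuning:
    • Greedy tuning;
    • Grid search;
    • Bayesian optimization;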
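
As a small illustration of the "handling long-tailed distributions" item above, the sketch below applies a log1p transform to a right-skewed target and inverts it with expm1. The prices are hypothetical values chosen only to show a long right tail; they are not taken from the competition data, and the `skewness` helper is written here for illustration.

```python
import math


def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5


# Hypothetical prices with a long right tail (illustrative only)
prices = [500, 800, 1200, 1500, 2000, 3000, 4500, 9000, 25000, 90000]

# log1p compresses the tail, so a linear model fits the target more easily
log_prices = [math.log1p(p) for p in prices]

# Predictions made on the log scale are mapped back with expm1
restored = [math.expm1(v) for v in log_prices]

print("skewness before:", round(skewness(prices), 2))
print("skewness after: ", round(skewness(log_prices), 2))
```

With scikit-learn, the same idea can be expressed by wrapping a regressor in `sklearn.compose.TransformedTargetRegressor` with `func=np.log1p` and `inverse_func=np.expm1`, so the inverse transform is applied automatically at prediction time.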

Besides the tuning approaches provided in this tutorial, I also studied an XGBoost (XGB) tuning guide found online: https://blog.csdn.net/csiao_Bing/article/details/84978725

Practice:

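## Model tuning: n_estimators

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# x_train and y_train are assumed to be defined earlier in the notebook (not shown in this post)

# Candidate values for the number of boosting rounds
cv_params = {'n_estimators': [400, 500, 600, 700, 800]}
# The remaining hyperparameters stay fixed during this search
other_params = {'learning_rate': 0.1, 'n_estimators': 500, 'max_depth': 5, 'min_child_weight': 1, 'seed': 0,
                'subsample': 0.8, 'colsample_bytree': 0.8, 'gamma': 0, 'reg_alpha': 0, 'reg_lambda': 1}

model = xgb.XGBRegressor(**other_params)
# 5-fold cross-validated grid search, scored by R^2
optimized_GBM = GridSearchCV(estimator=model, param_grid=cv_params, scoring='r2', cv=5, verbose=1, n_jobs=4)
optimized_GBM.fit(x_train, y_train)
evaluate_result = optimized_GBM.cv_results_
print('Results for each iteration: {0}'.format(evaluate_result))
print('Best parameter values: {0}'.format(optimized_GBM.best_params_))
print('Best model score: {0}'.format(optimized_GBM.best_score_))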

Output:

Fitting 5 folds for each of 5 candidates, totalling 25 fits
[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  25 out of  25 | elapsed: 11.2min finished
[23:15:04] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Results for each iteration: {
 'mean_fit_time': array([ 65.56447134,  83.75439568, 101.25140986, 114.86471357, 124.55886097]),
 'std_fit_time': array([1.01607169, 1.23950185, 0.45145929, 1.25883752, 9.10093902]),
 'mean_score_time': array([0.55379496, 0.6398819 , 0.75199609, 0.91474347, 0.97660236]),
 'std_score_time': array([0.01408137, 0.04041043, 0.04504313, 0.06618081, 0.05422502]),
 'param_n_estimators': masked_array(data=[400, 500, 600, 700, 800], mask=[False, False, False, False, False], fill_value='?', dtype=object),
 'params': [{'n_estimators': 400}, {'n_estimators': 500}, {'n_estimators': 600}, {'n_estimators': 700}, {'n_estimators': 800}],
 'split0_test_score': array([0.96922127, 0.96971105, 0.97017114, 0.97041995, 0.97062748]),
 'split1_test_score': array([0.96982751, 0.97024591, 0.97065482, 0.97078297, 0.97096162]),
 'split2_test_score': array([0.96761475, 0.96807066, 0.96837973, 0.96864978, 0.96889595]),
 'split3_test_score': array([0.9643302 , 0.96491349, 0.96537955, 0.9657377 , 0.96591757]),
 'split4_test_score': array([0.96746722, 0.96815598, 0.96852235, 0.96888821, 0.96906867]),
 'mean_test_score': array([0.96769219, 0.96821942, 0.96862152, 0.96889572, 0.96909426]),
 'std_test_score': array([0.00191088, 0.00185941, 0.0018501 , 0.0017843 , 0.00178713]),
 'rank_test_score': array([5, 4, 3, 2, 1])}
Best parameter values: {'n_estimators': 800}
Best model score: 0.9690942590235421
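
The raw `cv_results_` dump is hard to read, so here is a minimal sketch that tabulates the key columns. The numbers are copied from the output above and rounded; the variable names are my own. It also computes how much mean R^2 each extra 100 trees buys.

```python
# Values copied from the printed cv_results_ (scores rounded to 5 decimals, times to 0.1 s)
n_estimators = [400, 500, 600, 700, 800]
mean_test_score = [0.96769, 0.96822, 0.96862, 0.96890, 0.96909]
mean_fit_time = [65.6, 83.8, 101.3, 114.9, 124.6]  # seconds per fit

# Improvement in mean R^2 for each additional 100 trees
gains = [round(b - a, 5) for a, b in zip(mean_test_score, mean_test_score[1:])]

for n, score, secs in zip(n_estimators, mean_test_score, mean_fit_time):
    print(f"n_estimators={n}  mean R^2={score:.5f}  mean fit time={secs:.1f}s")
print("gain per +100 trees:", gains)

# Index of the best candidate; it is the last one, i.e. the edge of the grid
best_index = max(range(len(mean_test_score)), key=mean_test_score.__getitem__)
print("best n_estimators:", n_estimators[best_index])
```

The best value sits at the upper edge of the grid and the gains shrink with every step. The true optimum may therefore lie beyond 800, although the remaining improvement is probably small compared with the extra fit time.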

## Model tuning: max_depth and min_child_weight

# Search tree depth and minimum child weight jointly
cv_params = {'max_depth': [3, 4, 5, 6, 7, 8, 9, 10], 'min_child_weight': [1, 2, 3, 4, 5, 6]}
# n_estimators is fixed at 800, the best value from the previous search
other_params = {'learning_rate': 0.1, 'n_estimators': 800, 'max_depth': 5, 'min_child_weight': 1, 'seed': 0,
                'subsample': 0.8, 'colsample_bytree': 0.8, 'gamma': 0, 'reg_alpha': 0, 'reg_lambda': 1}

model = xgb.XGBRegressor(**other_params)
optimized_GBM = GridSearchCV(estimator=model, param_grid=cv_params, scoring='r2', cv=5, verbose=1, n_jobs=-1)
optimized_GBM.fit(x_train, y_train)
evaluate_result = optimized_GBM.cv_results_
print('Results for each iteration: {0}'.format(evaluate_result))
print('Best parameter values: {0}'.format(optimized_GBM.best_params_))
print('Best model score: {0}'.format(optimized_GBM.best_score_))

Output:

Best parameter values: {'max_depth': 6, 'min_child_weight': 2}
Best model score: 0.9694125922909762
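
The first search logged "Fitting 5 folds for each of 5 candidates, totalling 25 fits". The sketch below repeats that arithmetic for the two-parameter grid and checks whether the reported optimum lies on the edge of the grid. It uses only the standard library; `candidates` and `on_edge` are illustrative names, not part of scikit-learn.

```python
from itertools import product

cv_params = {'max_depth': [3, 4, 5, 6, 7, 8, 9, 10],
             'min_child_weight': [1, 2, 3, 4, 5, 6]}
cv_folds = 5

# Enumerate every combination, the same way a grid search expands param_grid
names = list(cv_params)
candidates = [dict(zip(names, combo)) for combo in product(*cv_params.values())]
total_fits = len(candidates) * cv_folds
print(f"Fitting {cv_folds} folds for each of {len(candidates)} candidates, "
      f"totalling {total_fits} fits")

# Best values reported above; check whether either one is a boundary value
best = {'max_depth': 6, 'min_child_weight': 2}
on_edge = {k: v in (cv_params[k][0], cv_params[k][-1]) for k, v in best.items()}
print("on grid edge:", on_edge)
```

Both best values fall inside the grid, so unlike the n_estimators search there is no obvious reason to widen the ranges. The total counts only the cross-validation fits; with the default `refit=True`, `GridSearchCV` trains one more model on the full training set afterwards.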

## Model tuning: gamma

# Candidate values for gamma, the minimum loss reduction required to split a leaf node
cv_params = {'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}
# max_depth and min_child_weight are fixed at the best values from the previous search
other_params = {'learning_rate': 0.1, 'n_estimators': 800, 'max_depth': 6, 'min_child_weight': 2, 'seed': 0,
                'subsample': 0.8, 'colsample_bytree': 0.8, 'gamma': 0, 'reg_alpha': 0, 'reg_lambda': 1}

model = xgb.XGBRegressor(**other_params)
optimized_GBM = GridSearchCV(estimator=model, param_grid=cv_params, scoring='r2', cv=5, verbose=1, n_jobs=-1)
optimized_GBM.fit(x_train, y_train)
evaluate_result = optimized_GBM.cv_results_
print('Results for each iteration: {0}'.format(evaluate_result))
print('Best parameter values: {0}'.format(optimized_GBM.best_params_))
print('Best model score: {0}'.format(optimized_GBM.best_score_))

Output:

Best parameter values: {'gamma': 0.1}
Best model score: 0.9692708576389358
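
The three searches above follow the greedy pattern from the study notes: tune one group of parameters, fix the best values, then move on. The gamma grid does not include the current value 0. Its best score (0.96927 at gamma=0.1) is slightly below the previous stage's 0.96941, which was obtained with gamma=0, so switching to gamma=0.1 does not appear to help. The sketch below is one possible way to automate the procedure while only accepting a change that beats the incumbent. `greedy_search` and `toy_score` are hypothetical names; `toy_score` is a made-up stand-in for a cross-validated R^2, not a real model evaluation.

```python
from itertools import product


def greedy_search(stages, base_params, score_fn):
    """Tune one parameter group at a time; keep a change only if it improves the score."""
    best_params = dict(base_params)
    best_score = score_fn(best_params)  # score of the starting configuration
    for grid in stages:
        names = list(grid)
        for combo in product(*grid.values()):
            trial = {**best_params, **dict(zip(names, combo))}
            score = score_fn(trial)
            if score > best_score:  # the incumbent wins ties and regressions
                best_params, best_score = trial, score
    return best_params, best_score


def toy_score(p):
    # Hypothetical score surface that peaks at max_depth=6, min_child_weight=2, gamma=0
    return (1.0
            - 0.001 * (p['max_depth'] - 6) ** 2
            - 0.001 * (p['min_child_weight'] - 2) ** 2
            - 0.01 * p['gamma'])


stages = [
    {'max_depth': [3, 4, 5, 6, 7, 8, 9, 10], 'min_child_weight': [1, 2, 3, 4, 5, 6]},
    {'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]},
]
base_params = {'max_depth': 5, 'min_child_weight': 1, 'gamma': 0}

best_params, best_score = greedy_search(stages, base_params, toy_score)
print(best_params, best_score)
```

In a real run, `score_fn` would wrap a cross-validation call such as `sklearn.model_selection.cross_val_score` on an `XGBRegressor` built from the trial parameters. Because each stage is compared against the incumbent score, a grid that omits the current value, like the gamma grid here, cannot make the configuration worse.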
