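Many people have published Kernels for the house sale price regression project. Among them, Serigne's "Stacked Regressions to predict House Prices" is widely read, and readers can browse it directly on the Kaggle website. This article summarizes that Kernel and lists its main workflow steps below so that readers can get a clear picture of the overall approach.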
Stacked Regressions to predict House Prices
Data Processing
Outliers
Note
Target Variable
Log-transformation of the target variable
Features engineering
Missing Data
Data Correlation
Imputing missing values
More features engineering
Transforming some numerical variables that are really categorical
Label Encoding some categorical variables that may contain information in their ordering set
Adding one more important feature
Skewed features
Getting dummy categorical features
Getting the new train and test sets
Modelling
Import libraries
Define a cross validation strategy
Base models
LASSO Regression
Elastic Net Regression
Kernel Ridge Regression
Gradient Boosting Regression
XGBoost
LightGBM
Base models scores
Stacking models
Simplest Stacking approach: Averaging base models
Averaged base models class
Averaged base models score
Less simple Stacking: Adding a Meta-model
Stacking averaged Models Class
Stacking Averaged models Score
Ensembling StackedRegressor, XGBoost and LightGBM
Final Training and Prediction
Stacked Regressor
XGBoost
Ensemble prediction
Submission
Comments
Leader Board Ranking
RMSLE score on train data: 0.07658856703780222
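
Three of the steps listed above (log-transforming the target, scoring with RMSLE, and averaging base models) can be illustrated with a minimal sketch. This is not the Kernel's own code: it uses only the Python standard library, the function names (`log_transform`, `rmse`, `average_predictions`) are hypothetical, and the prices and model predictions are made-up values chosen only to show the mechanics.

```python
import math


def log_transform(prices):
    """Apply log(1 + x) to each target value to reduce right skew."""
    return [math.log1p(p) for p in prices]


def inverse_log_transform(log_prices):
    """Undo log_transform with exp(x) - 1 to get back to price units."""
    return [math.expm1(v) for v in log_prices]


def rmse(y_true, y_pred):
    """Root mean squared error.

    When both inputs are log-transformed targets, this value is the RMSLE.
    """
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def average_predictions(predictions, weights=None):
    """Blend several models' predictions element-wise.

    `predictions` is a list with one list of predictions per model.
    Equal weights are used when `weights` is not given.
    """
    if weights is None:
        weights = [1.0 / len(predictions)] * len(predictions)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return [sum(w * p for w, p in zip(weights, row)) for row in zip(*predictions)]


# Made-up sale prices (assumption, for illustration only).
actual_prices = [208500.0, 181500.0, 223500.0]
y_log = log_transform(actual_prices)

# Two hypothetical base models: one overshoots, one undershoots in log space.
model_a_log = [v + 0.05 for v in y_log]
model_b_log = [v - 0.03 for v in y_log]

# Simplest stacking approach: average the base models' predictions.
blend_log = average_predictions([model_a_log, model_b_log])

score_a = rmse(y_log, model_a_log)
score_b = rmse(y_log, model_b_log)
score_blend = rmse(y_log, blend_log)

# Convert the blended predictions back to price units for a submission.
blend_prices = inverse_log_transform(blend_log)

print(f"model A RMSLE: {score_a:.4f}")
print(f"model B RMSLE: {score_b:.4f}")
print(f"blend RMSLE:   {score_blend:.4f}")
```

In this toy setup the two models err in opposite directions, so their average scores better than either one alone. With real models the gain depends on how correlated their errors are, which is why a cross-validated score is normally used to compare the blend against its base models.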