XGBoost theory derivation:
Reference document 1
Reference video 1
Reference video 2
Official documentation:
XGB
LGBM
Choosing the loss function in the objective:
For both XGBoost and LightGBM, the loss function is set via the objective parameter.
XGBoost (see the official docs for details):
Classification default: objective='binary:logistic'
Regression default: objective='reg:squarederror'
LightGBM (see the official docs for details):
Classification default: 'binary' or 'multiclass' for LGBMClassifier; 'lambdarank' for LGBMRanker
Regression default: 'regression' (i.e. MSE / L2 loss) for LGBMRegressor
XGBoost feature importance: feature_importances_
feature_importances_ supports five importance_type options; 'gain' is the XGBoost default (note: LightGBM's default is 'split', which counts splits like XGBoost's 'weight'):
'weight': the number of times a feature is used to split the data across all trees.
'gain': the average gain across all splits the feature is used in.
'cover': the average coverage across all splits the feature is used in.
'total_gain': the total gain across all splits the feature is used in.
'total_cover': the total coverage across all splits the feature is used in.
The code below shows that XGBoost's feature importance defaults to 'gain' (model_1 and model_6 give identical results) and is already normalized to sum to 1 (note: LightGBM does not normalize; you must normalize manually):
import numpy as np
import xgboost as xgb

# The original code passed verbose=-1, which is a LightGBM-style argument;
# XGBClassifier uses verbosity instead, so it is dropped here. Each model is
# fit on the same training data before reading feature_importances_.
model_1 = xgb.XGBClassifier(n_estimators=1000, learning_rate=0.05)
model_2 = xgb.XGBClassifier(n_estimators=1000, learning_rate=0.05, importance_type='weight')
model_3 = xgb.XGBClassifier(n_estimators=1000, learning_rate=0.05, importance_type='total_gain')
model_4 = xgb.XGBClassifier(n_estimators=1000, learning_rate=0.05, importance_type='cover')
model_5 = xgb.XGBClassifier(n_estimators=1000, learning_rate=0.05, importance_type='total_cover')
model_6 = xgb.XGBClassifier(n_estimators=1000, learning_rate=0.05, importance_type='gain')
model_1.feature_importances_
array([0.11783394, 0.09764165, 0.14409864, 0.08691421, 0.15636262,
0. , 0.11602671, 0. , 0.15029922, 0.13082299],
dtype=float32)
model_2.feature_importances_
array([0.166699 , 0.1880458 , 0.06656317, 0.1682515 , 0.01106152,
0. , 0.0795653 , 0. , 0.14806908, 0.17174461],
dtype=float32)
model_3.feature_importances_
array([0.16660118, 0.1557304 , 0.08135206, 0.12402933, 0.01466974,
0. , 0.07829903, 0. , 0.18875381, 0.19056444],
dtype=float32)
model_4.feature_importances_
array([0.10467763, 0.08871602, 0.16086729, 0.08691181, 0.24720725,
0. , 0.10529327, 0. , 0.11173051, 0.09459624],
dtype=float32)
model_5.feature_importances_
array([0.1688149 , 0.16139482, 0.10359184, 0.1414691 , 0.02645452,
0. , 0.08104911, 0. , 0.16005163, 0.15717407],
dtype=float32)
model_6.feature_importances_
array([0.11783394, 0.09764165, 0.14409864, 0.08691421, 0.15636262,
0. , 0.11602671, 0. , 0.15029922, 0.13082299],
dtype=float32)
np.sum(model_6.feature_importances_)
1.0