Interpretable Models - Decision Tree

This book uses the CART tree as its example.

1. How a CART tree splits nodes

1. For regression, the split point is chosen to minimize the variance of y. The variance tells us how much the y values in a node are spread around their mean value.

2. For classification, the split is chosen to minimize the Gini index of y. The Gini index tells us how "impure" a node is: if all classes have the same frequency, the node is maximally impure; if only one class is present, it is maximally pure.

For continuous numeric features, candidate split points (thresholds) are searched; for categorical features, subsets of the categories of a single feature are tried. A sketch of both split criteria follows below.
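As a minimal sketch of the two criteria (the helper names and toy data below are illustrative, not from the book):

```python
import numpy as np

def variance_reduction(y, left_mask):
    """Regression criterion: how much a candidate split lowers node variance."""
    y_left, y_right = y[left_mask], y[~left_mask]
    n = len(y)
    child_var = (len(y_left) * y_left.var() + len(y_right) * y_right.var()) / n
    return y.var() - child_var

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class frequencies."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_reduction(y, left_mask):
    """Classification criterion: how much a candidate split lowers impurity."""
    y_left, y_right = y[left_mask], y[~left_mask]
    n = len(y)
    child_gini = (len(y_left) * gini(y_left) + len(y_right) * gini(y_right)) / n
    return gini(y) - child_gini

# Toy example: evaluate the candidate split "x <= 3" on one numeric feature.
x = np.array([1, 2, 3, 4, 5, 6])
y_reg = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
y_clf = np.array([0, 0, 0, 1, 1, 1])
mask = x <= 3
print(variance_reduction(y_reg, mask))  # large: the split separates the two groups
print(gini_reduction(y_clf, mask))      # 0.5: from maximal impurity to pure children
```

CART greedily evaluates every candidate split like this and keeps the one with the largest reduction.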

2. Interpretation

2.1 Feature importance

In a decision tree, the importance of a feature is computed by going through all the splits that use that feature and measuring how much each split reduced the variance or the Gini index compared to the parent node. The sum of all importances is scaled to 100, so each feature's importance can be read as a percentage of the overall model importance.
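As a sketch with scikit-learn (the dataset and hyperparameters are arbitrary choices for illustration): scikit-learn normalizes feature_importances_ to sum to 1, so multiplying by 100 matches the convention above.

```python
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

data = load_diabetes()
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(data.data, data.target)

# feature_importances_ is already normalized to sum to 1; multiplying by
# 100 expresses each feature as a share of the total model importance.
for name, imp in zip(data.feature_names, tree.feature_importances_):
    print(f"{name}: {imp * 100:.1f}%")
```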

2.2 Tree decomposition
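Individual predictions can be explained by decomposing the decision path: the prediction starts at the root node as the mean of y, and each split along the path to the leaf adds or subtracts some amount, which is attributed to the split feature. A minimal sketch of this decomposition for a fitted scikit-learn regression tree (the helper decompose_prediction is illustrative, not a library function):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def decompose_prediction(tree, x):
    """Walk the decision path of a fitted DecisionTreeRegressor and
    attribute the change in node mean at each split to the split feature."""
    t = tree.tree_
    node = 0
    bias = t.value[0][0][0]      # mean of y at the root
    contributions = {}           # feature index -> summed contribution
    while t.children_left[node] != -1:  # -1 marks a leaf in sklearn trees
        feature = t.feature[node]
        if x[feature] <= t.threshold[node]:
            child = t.children_left[node]
        else:
            child = t.children_right[node]
        delta = t.value[child][0][0] - t.value[node][0][0]
        contributions[feature] = contributions.get(feature, 0.0) + delta
        node = child
    return bias, contributions

# Example usage (illustrative data):
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 1.0, 3.0, 3.0])
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
bias, contribs = decompose_prediction(tree, X[0])
print(bias, contribs, tree.predict([X[0]])[0])
```

By construction the contributions telescope down the path, so the bias plus the sum of all contributions equals the tree's prediction.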

2.3 Advantages and disadvantages

Advantages: trees can capture interactions between features, they are easy to visualize, and they are good for interpretation.
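For the visualization point, a minimal sketch using scikit-learn's plot_tree (the dataset and depth are arbitrary):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

data = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# Render the tree: each box shows the split rule, impurity, and class counts.
plot_tree(clf, feature_names=data.feature_names,
          class_names=list(data.target_names), filled=True)
plt.show()
```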

There is no need to transform features. In linear models, it is sometimes necessary to take the logarithm of a feature. A decision tree works equally well with any monotonic transformation of a feature.
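A quick illustration of this invariance (a sketch with synthetic data): fitting the same tree on x and on log(x) yields identical predictions, because a monotonic transformation preserves the ordering of the feature values and therefore the set of possible partitions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.uniform(1, 100, size=200)                      # strictly positive feature
y = np.where(x > 30, 5.0, 1.0) + rng.normal(0, 0.1, size=200)

raw = DecisionTreeRegressor(max_depth=2, random_state=0).fit(x.reshape(-1, 1), y)
logged = DecisionTreeRegressor(max_depth=2, random_state=0).fit(
    np.log(x).reshape(-1, 1), y)

# Same partitioning of the data, hence the same predictions everywhere.
print(np.allclose(raw.predict(x.reshape(-1, 1)),
                  logged.predict(np.log(x).reshape(-1, 1))))  # True
```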

Disadvantages:

Trees fail to deal with linear relationships; they lack smoothness (slight changes in the input feature can have a big impact on the predicted outcome, which is usually not desirable, because predictions jump at the split thresholds); and they are unstable: small changes in the training data can result in a completely different tree.
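To make the smoothness point concrete, a small sketch (synthetic data; the learned threshold happens to land near 5): two inputs that differ only slightly but fall on opposite sides of a split get very different predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

x = np.linspace(0, 10, 200).reshape(-1, 1)
y = (x.ravel() > 5).astype(float) * 10     # step function at x = 5

tree = DecisionTreeRegressor(max_depth=1, random_state=0).fit(x, y)
# Two nearly identical inputs straddling the learned threshold:
print(tree.predict([[4.99], [5.01]]))      # e.g. [ 0. 10.]
```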
