How to add AdaBoost support to SHAP

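Project scenario:

I am running two different models (gradient boosting and AdaBoost) and want to use SHAP to analyze the relationship between the features and the label.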

Problem description

I managed to use SHAP with the gradient boosting model, but with the AdaBoost model it fails with the following error:

Exception: Model type not yet supported by TreeExplainer: <class 'sklearn.ensemble._weight_boosting.AdaBoostClassifier'>

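Cause analysis:

The shap developers do not plan to add support for AdaBoost models, but a workaround is given in
https://github.com/slundberg/shap/issues/335
It is enough to add one more elif branch, written the same way as the existing branches for random forest and gradient boosting.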


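Solution:

Detailed steps:
Find the _tree.py file under anaconda3\Lib\site-packages\shap\explainers, or under the corresponding path of your own installation.
In that file, find the part that looks similar to the code below and add the AdaBoost branch next to it: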

### Added AdaBoostClassifier based on the outdated StackOverflow response and Github issue here
### https://stackoverflow.com/questions/60433389/how-to-calculate-shap-values-for-adaboost-model/61108156#61108156
### https://github.com/slundberg/shap/issues/335
elif safe_isinstance(model, ["sklearn.ensemble.AdaBoostClassifier", "sklearn.ensemble._weight_boosting.AdaBoostClassifier"]):
    assert hasattr(model, "estimators_"), "Model has no `estimators_`! Have you called `model.fit`?"
    self.internal_dtype = model.estimators_[0].tree_.value.dtype.type
    self.input_dtype = np.float32
    scaling = 1.0 / len(model.estimators_) # output is average of trees
    self.trees = [Tree(e.tree_, normalize=True, scaling=scaling) for e in model.estimators_]
    self.objective = objective_name_map.get(model.base_estimator_.criterion, None) #This line is done to get the decision criteria, for example gini.
    self.tree_output = "probability" #This is the last line added
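Before editing, it can help to confirm which copy of `_tree.py` the current interpreter would actually load, since several environments may each have their own shap. The following is a minimal sketch using only the standard library; the function name `find_shap_tree_source` is made up for this example, and the `explainers/_tree.py` layout is taken from the path given above.

```python
import importlib.util
from pathlib import Path


def find_shap_tree_source():
    """Return the path of shap/explainers/_tree.py, or None if it is not found.

    Hypothetical helper for illustration; it only inspects the package
    location and does not import shap itself.
    """
    spec = importlib.util.find_spec("shap")
    if spec is None or not spec.submodule_search_locations:
        return None  # shap is not installed in this environment
    for package_dir in spec.submodule_search_locations:
        candidate = Path(package_dir) / "explainers" / "_tree.py"
        if candidate.is_file():
            return candidate
    return None  # shap is installed, but the file layout differs


path = find_shap_tree_source()
print(path if path else "shap/explainers/_tree.py not found in this environment")
```

If nothing is found even though shap is installed, the installed release may keep the tree explainer in a differently named file, so the package directory is worth checking by hand.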

In my copy, this part of the file looks like this:

        elif safe_isinstance(model, ["sklearn.ensemble.RandomForestClassifier", "sklearn.ensemble.forest.RandomForestClassifier"]):
            assert hasattr(model, "estimators_"), "Model has no `estimators_`! Have you called `model.fit`?"
            self.internal_dtype = model.estimators_[0].tree_.value.dtype.type
            self.input_dtype = np.float32
            scaling = 1.0 / len(model.estimators_) # output is average of trees
            self.trees = [SingleTree(e.tree_, normalize=True, scaling=scaling, data=data, data_missing=data_missing) for e in model.estimators_]
            self.objective = objective_name_map.get(model.criterion, None)
            self.tree_output = "probability"
        elif safe_isinstance(model, ["sklearn.ensemble.AdaBoostClassifier", "sklearn.ensemble._weight_boosting.AdaBoostClassifier"]):
            assert hasattr(model, "estimators_"), "Model has no `estimators_`! Have you called `model.fit`?"
            self.internal_dtype = model.estimators_[0].tree_.value.dtype.type
            self.input_dtype = np.float32
            scaling = 1.0 / len(model.estimators_) # output is average of trees
            self.trees = [SingleTree(e.tree_, normalize=True, scaling=scaling, data=data, data_missing=data_missing) for e in model.estimators_]
            self.objective = objective_name_map.get(model.base_estimator_.criterion, None) #This line is done to get the decision criteria, for example gini.
            self.tree_output = "probability" #This is the last line added
            
        elif safe_isinstance(model, ["sklearn.ensemble.ExtraTreesClassifier", "sklearn.ensemble.forest.ExtraTreesClassifier"]): # TODO: add unit test for this case
            assert hasattr(model, "estimators_"), "Model has no `estimators_`! Have you called `model.fit`?"
            self.internal_dtype = model.estimators_[0].tree_.value.dtype.type
            self.input_dtype = np.float32
            scaling = 1.0 / len(model.estimators_) # output is average of trees
            self.trees = [SingleTree(e.tree_, normalize=True, scaling=scaling, data=data, data_missing=data_missing) for e in model.estimators_]
            self.objective = objective_name_map.get(model.criterion, None)
            self.tree_output = "probability"
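Each branch above is selected by the class-path strings passed to `safe_isinstance`, so those strings need to name a module path under which the class can really be found (the error message shows `sklearn.ensemble._weight_boosting` for AdaBoost). The sketch below is a simplified stand-in written for illustration, not shap's actual source; it assumes the check resolves each string against already-imported modules, and `matches_class_path` is a hypothetical name. A standard-library class is used so that it runs without scikit-learn.

```python
import sys
from collections import OrderedDict


def matches_class_path(obj, class_paths):
    """Return True if obj is an instance of any class named by a dotted path.

    Simplified illustration: paths whose module is not imported, or whose
    class name does not exist in that module, are skipped silently.
    """
    for path in class_paths:
        module_name, _, class_name = path.rpartition(".")
        module = sys.modules.get(module_name)
        if module is None:
            continue  # a misspelled module path never matches
        cls = getattr(module, class_name, None)
        if isinstance(cls, type) and isinstance(obj, cls):
            return True
    return False


obj = OrderedDict()
good = matches_class_path(obj, ["collections.OrderedDict"])
bad = matches_class_path(obj, ["collectionz.OrderedDict"])  # typo in module name
print(good, bad)
```

Under this assumption a misspelled path is ignored rather than reported, which is why a branch can still work when only one of its listed paths is correct.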

This is based on the Stack Overflow answer below, although my code is not exactly the same:
https://stackoverflow.com/questions/60433389/how-to-calculate-shap-values-for-adaboost-model
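One place where the snippet may need further adapting is `model.base_estimator_`. To my understanding, newer scikit-learn releases renamed this attribute to `estimator_` and later removed the old name, so the lookup can fail depending on the installed version. The sketch below shows one way to read the criterion under either name; `base_criterion` is a hypothetical helper, and the two stand-in objects only mimic the attribute layout of a fitted model.

```python
from types import SimpleNamespace


def base_criterion(model):
    """Return the split criterion of the base estimator, or None if unavailable.

    Tries the newer attribute name first, then falls back to the older one.
    """
    base = getattr(model, "estimator_", None)
    if base is None:
        base = getattr(model, "base_estimator_", None)
    return getattr(base, "criterion", None)


# Stand-ins that imitate the two attribute layouts (not real models).
old_style = SimpleNamespace(base_estimator_=SimpleNamespace(criterion="gini"))
new_style = SimpleNamespace(estimator_=SimpleNamespace(criterion="entropy"))

print(base_criterion(old_style), base_criterion(new_style))
```

Inside the added branch, the result of such a lookup could be passed to `objective_name_map.get(...)` in place of `model.base_estimator_.criterion`.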


After this change, the SHAP summary plot for the AdaBoost model is produced successfully.
[Figure 1: SHAP summary plot for the AdaBoost model]
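One caveat when reading the plot: the added branch uses `scaling = 1.0 / len(model.estimators_)`, which treats the ensemble output as a plain mean over trees, the same as for random forests. AdaBoost, however, keeps a per-estimator weight in `estimator_weights_`, so the explained output may differ from what the model actually predicts. The toy sketch below uses made-up numbers (both lists are hypothetical, not taken from a fitted model) to show how the two ways of averaging can diverge.

```python
def uniform_average(per_tree_probs):
    """Plain mean, mirroring scaling = 1.0 / len(model.estimators_)."""
    scaling = 1.0 / len(per_tree_probs)
    return sum(p * scaling for p in per_tree_probs)


def weighted_average(per_tree_probs, weights):
    """Mean in which each tree contributes in proportion to its weight."""
    total = sum(weights)
    return sum(p * w for p, w in zip(per_tree_probs, weights)) / total


# Hypothetical per-tree probabilities for one sample and one class.
probs = [0.9, 0.6, 0.3]
# Hypothetical stand-in for estimator_weights_ (first tree counts double).
weights = [2.0, 1.0, 1.0]

uniform = uniform_average(probs)
weighted = weighted_average(probs, weights)
print(round(uniform, 3), round(weighted, 3))
```

When all weights are equal the two results coincide; the further the weights are from uniform, the more the SHAP values from this workaround should be read as an approximation.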
