For the complete code examples, see:
https://github.com/Alex2Yang97/Learning_tutorials/tree/main/InterpretML
InterpretML is an open-source Python package that provides machine learning interpretability algorithms to practitioners and researchers. InterpretML offers two types of interpretability: (1) **glassbox** models, which are machine learning models designed to be interpretable (e.g., linear models, rule lists, generalized additive models); and (2) **blackbox** explanation techniques, which explain existing systems (e.g., partial dependence, LIME, SHAP). The package lets practitioners easily compare interpretability algorithms under a unified API, with a built-in, extensible visualization platform. InterpretML also includes the first implementation of the **Explainable Boosting Machine (EBM)**, a powerful interpretable glassbox model that can be as accurate as many blackbox models.
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    header=None)
df.columns = [
"Age", "WorkClass", "fnlwgt", "Education", "EducationNum",
"MaritalStatus", "Occupation", "Relationship", "Race", "Gender",
"CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry", "Income"
]
train_cols = df.columns[0:-1]
label = df.columns[-1]
X = df[train_cols]
y = df[label]
seed = 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)
from interpret.glassbox import ExplainableBoostingClassifier
ebm = ExplainableBoostingClassifier(random_state=seed)
ebm.fit(X_train, y_train)
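At its core, an EBM is a generalized additive model trained with boosting. As a loose analogy only (not EBM's actual cyclic, one-feature-at-a-time boosting algorithm), depth-1 gradient boosting in plain scikit-learn also produces an additive model, because every stump splits on exactly one feature; the toy data below is an assumption for illustration.

```python
# Loose analogy for EBM's additivity (NOT the real EBM algorithm):
# depth-1 gradient boosting. Each stump uses one feature, so the
# ensemble's raw score decomposes into per-feature contributions,
# which is the structural idea behind EBM's readable shape functions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=4, random_state=0)
stumps = GradientBoostingClassifier(max_depth=1, n_estimators=100,
                                    random_state=0).fit(X, y)

# Group stump outputs by the feature each stump split on.
contrib = np.zeros((X.shape[0], X.shape[1]))
for est in stumps.estimators_[:, 0]:
    f = est.tree_.feature[0]  # the single feature this stump uses
    if f >= 0:                # skip degenerate single-leaf trees
        contrib[:, f] += stumps.learning_rate * est.predict(X)

# decision_function equals a constant plus the per-feature sums,
# so contrib[:, j] plays the role of a "shape function" for feature j.
print(contrib.shape)
```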
Global explanation
from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
from interpret import show
ebm_global = ebm.explain_global()
show(ebm_global)
Local explanation
ebm_local = ebm.explain_local(X_test[:5], y_test[:5])
show(ebm_local)
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
# We have to transform categorical variables to use sklearn models
X_enc = pd.get_dummies(X, prefix_sep='.')
feature_names = list(X_enc.columns)
y = df[label].apply(lambda x: 0 if x == " <=50K" else 1) # Turning response into 0 and 1
X_train, X_test, y_train, y_test = train_test_split(X_enc, y, test_size=0.20, random_state=seed)
# A blackbox system can include preprocessing, not just a classifier!
pca = PCA()
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
blackbox_model = Pipeline([('pca', pca), ('rf', rf)])
blackbox_model.fit(X_train, y_train)
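A self-contained sketch (on assumed toy data, not the adult dataset) of why wrapping preprocessing and model in a `Pipeline` matters here: an explainer that only calls `predict_proba` automatically sees the whole system, PCA step included.

```python
# Toy Pipeline standing in for the blackbox above: predict_proba runs
# PCA first, then the forest, so any explainer that consumes
# predict_proba explains preprocessing + model as one system.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
bb = Pipeline([("pca", PCA(n_components=5)),
               ("rf", RandomForestClassifier(n_estimators=50, random_state=0))])
bb.fit(X, y)

proba = bb.predict_proba(X[:3])  # shape (3, 2): one probability per class
print(proba.sum(axis=1))         # each row sums to 1
```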
from interpret.blackbox import LimeTabular
from interpret import show
lime = LimeTabular(predict_fn=blackbox_model.predict_proba, data=X_train, random_state=seed)
lime_local = lime.explain_local(X_test[:5], y_test[:5])
show(lime_local)
>>> X_test.iloc[0]["CapitalGain"]
0
LIME produces an explanation for each individual sample; in the visualization, each feature name is followed by that sample's corresponding feature value.
For a detailed walkthrough of LIME, see: https://blog.csdn.net/qq_41103204/article/details/125801073
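To make the idea behind LIME concrete, here is a minimal sketch of its core mechanism (an illustration on assumed toy data, not the lime library's actual implementation): perturb the sample, weight the perturbations by proximity, and fit a weighted linear surrogate whose coefficients serve as the local explanation.

```python
# LIME's core idea in miniature: explain one prediction of a blackbox
# model with a locally weighted linear surrogate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
blackbox = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

x0 = X[0]                                      # the sample to explain
Z = x0 + rng.normal(scale=0.5, size=(500, 4))  # local perturbations
target = blackbox.predict_proba(Z)[:, 1]       # blackbox output to mimic
dist = np.linalg.norm(Z - x0, axis=1)
weights = np.exp(-dist**2)                     # nearer perturbations count more

local = Ridge(alpha=1.0).fit(Z, target, sample_weight=weights)
print(local.coef_)  # per-feature local importance around x0
```

The coefficients describe the blackbox only in the neighborhood of `x0`, which is why LIME explanations differ from sample to sample.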