Interpretability Packages for Feature Engineering and Modeling (note)

Permutation Importance

In general, ensemble algorithms are a good choice for examining feature importance.

To compute the permutation importance of a single feature:

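  • Train the model first.

  • Measure the effect of feature A on the model's results: shuffle the values of feature A across rows (leaving all other features unchanged), then compare the model's error with the error before shuffling. If the error barely changes, the feature is unimportant; if the error grows substantially, the feature is important.

  • Tool: eli5 https://eli5.readthedocs.io/en/latest/tutorials/xgboost-titanic.html#explaining-weights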
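The steps above can be sketched by hand in a few lines of NumPy. This is a minimal illustration, not how eli5 implements it: the function name `permutation_importance_sketch`, the toy data, and the least-squares "model" are all assumptions made for the example, and mean squared error stands in for whatever metric the real model would use.

```python
import numpy as np

def permutation_importance_sketch(predict, X, y, n_repeats=5, seed=0):
    """Return the mean increase in MSE when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)  # error with intact features
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle only column j; all other columns stay untouched
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            permuted_error = np.mean((predict(X_perm) - y) ** 2)
            increases.append(permuted_error - baseline)
        importances[j] = np.mean(increases)
    return importances

# Toy data (hypothetical): y depends strongly on column 0, not on column 1
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=500)
X_train, X_val = X[:400], X[400:]
y_train, y_val = y[:400], y[400:]

# Stand-in for a trained model: an ordinary least-squares fit
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def predict(data):
    return data @ coef

# Importance is measured on held-out data, as in the note
importances = permutation_importance_sketch(predict, X_val, y_val)
print(importances)
```

Shuffling column 0 should raise the error sharply, while shuffling column 1 should leave it almost unchanged, which is exactly the "important" versus "unimportant" distinction described in the list.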

Build the model

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

data = pd.read_csv('FIFA 2018 Statistics.csv')
y = (data['Man of the Match'] == "Yes")  # convert the label to boolean
# Keep only the integer-valued columns as features
feature_names = [i for i in data.columns if data[i].dtype in [np.int64]]
X = data[feature_names]
X.head()  # preview the features (displays output in a notebook)
# Hold out a validation set for computing the importances
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
my_model = RandomForestClassifier(random_state=0).fit(train_X, train_y)

Display feature importance

import eli5  # pip install eli5
from eli5.sklearn import PermutationImportance

# Compute permutation importance on the validation set
perm = PermutationImportance(my_model, random_state=1).fit(val_X, val_y)
eli5.show_weights(perm, feature_names=val_X.columns.tolist())

[Figure 1]
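As an alternative to eli5, scikit-learn (0.22 and later) provides its own `sklearn.inspection.permutation_importance` function. The sketch below is an assumption-laden illustration rather than part of the original note: it uses synthetic data from `make_classification` in place of the FIFA CSV, and the feature labels `feature_0` to `feature_4` are made up for the printout.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the FIFA data (hypothetical): 5 features, 2 informative
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=2, n_redundant=0, random_state=0
)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
model = RandomForestClassifier(random_state=0).fit(train_X, train_y)

# Shuffle each feature 10 times on the validation set and record the score drop
result = permutation_importance(model, val_X, val_y, n_repeats=10, random_state=1)

# Print features from most to least important, with the spread across repeats
for idx in result.importances_mean.argsort()[::-1]:
    mean = result.importances_mean[idx]
    std = result.importances_std[idx]
    print(f"feature_{idx}: {mean:.3f} +/- {std:.3f}")
```

The returned object holds `importances_mean` and `importances_std` per feature, which carry the same kind of information as the weight and its spread in the eli5 table.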
