In machine learning, the `feature_importances_` attribute of classification and regression estimators measures how much each feature contributes to the model's predictions. The attribute is typically available on tree-based models. By inspecting `feature_importances_`, you can identify which features matter most to the model's predictions, and use that information for feature selection or feature engineering to improve the model's performance and interpretability.
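As a quick illustration (the dataset, model, and threshold below are assumptions for the example, not from the original text), `feature_importances_` can drive feature selection through scikit-learn's `SelectFromModel`:

```python
# Minimal sketch: feature selection driven by feature_importances_.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Fit a forest and keep only features whose importance exceeds the mean.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="mean",
).fit(X, y)

print("importances:", selector.estimator_.feature_importances_)
print("kept features:", selector.get_support())
X_selected = selector.transform(X)
```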
class sklearn.tree.DecisionTreeClassifier(*, criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0, monotonic_cst=None)
class sklearn.tree.DecisionTreeRegressor(*, criterion='squared_error', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, ccp_alpha=0.0, monotonic_cst=None)
class sklearn.tree.ExtraTreeClassifier(*, criterion='gini', splitter='random', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt', random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0, monotonic_cst=None)
property feature_importances_
Returns the normalized feature importances, also known as Gini importances: each feature's importance is assessed from the total reduction in Gini impurity that the feature produces at the nodes of the decision tree.
Specifically, the Gini importance is computed as follows:

1. At every node that splits on a given feature, compute the impurity decrease: the node's Gini impurity minus the weighted average of its children's impurities, weighted by the fraction of samples reaching the node.
2. Sum these decreases over all nodes in the tree that split on that feature to obtain the feature's raw importance.
3. Normalize the raw importances across features so that they sum to 1.

Ultimately, the higher a feature's Gini importance, the more that feature contributes to the classification result. Gini importance is widely used in feature selection and feature engineering to identify and retain the features with the greatest impact on model performance.
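A minimal sketch (the iris dataset is chosen purely for illustration) showing that a fitted DecisionTreeClassifier exposes normalized Gini importances:

```python
# Minimal sketch: Gini importances of a fitted decision tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
tree = DecisionTreeClassifier(criterion="gini", random_state=0)
tree.fit(data.data, data.target)

# Normalized importances: non-negative and summing to 1.
for name, imp in zip(data.feature_names, tree.feature_importances_):
    print(f"{name}: {imp:.3f}")
print("sum:", tree.feature_importances_.sum())  # ~1.0
```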
class sklearn.ensemble.RandomForestClassifier(n_estimators=100, *, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt', max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None, ccp_alpha=0.0, max_samples=None, monotonic_cst=None)
class sklearn.ensemble.RandomForestRegressor(n_estimators=100, *, criterion='squared_error', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=1.0, max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, ccp_alpha=0.0, max_samples=None, monotonic_cst=None)
class sklearn.ensemble.GradientBoostingClassifier(*, loss='log_loss', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, validation_fraction=0.1, n_iter_no_change=None, tol=0.0001, ccp_alpha=0.0)
class sklearn.ensemble.GradientBoostingRegressor(*, loss='squared_error', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, init=None, random_state=None, max_features=None, alpha=0.9, verbose=0, max_leaf_nodes=None, warm_start=False, validation_fraction=0.1, n_iter_no_change=None, tol=0.0001, ccp_alpha=0.0)
property feature_importances_
Returns the feature importances, which sum to 1 unless every tree consists of a single root node (in which case the array is all zeros).
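A hedged sketch of this (data and settings are illustrative): for scikit-learn's random forest, the ensemble importance is in effect the average of the per-tree Gini importances.

```python
# Minimal sketch: forest-level importances as the mean of per-tree importances.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

print("forest importances:", forest.feature_importances_)

# Equivalent manual computation: average over the individual trees.
per_tree = np.array([t.feature_importances_ for t in forest.estimators_])
print("mean over trees:  ", per_tree.mean(axis=0))
```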
class sklearn.ensemble.AdaBoostClassifier(estimator=None, *, n_estimators=50, learning_rate=1.0, algorithm='deprecated', random_state=None)
property feature_importances_
Returns:
feature_importances_: ndarray of shape (n_features,), the feature importances.
class sklearn.ensemble.AdaBoostRegressor(estimator=None, *, n_estimators=50, learning_rate=1.0, loss='linear', random_state=None)
property feature_importances_
Returns:
feature_importances_: ndarray of shape (n_features,), the feature importances.
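A minimal usage sketch (the diabetes dataset is illustrative, not from the original):

```python
# Minimal sketch: feature importances from an AdaBoost regressor.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import AdaBoostRegressor

X, y = load_diabetes(return_X_y=True)
ada = AdaBoostRegressor(n_estimators=50, random_state=0).fit(X, y)
print(ada.feature_importances_)  # shape (n_features,), sums to 1
```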
class xgboost.XGBClassifier(*, importance_type='total_gain', objective='binary:logistic', **kwargs)
Parameters:
Attributes: property feature_importances_
class xgboost.XGBRegressor(*, importance_type='total_gain', objective='reg:squarederror', **kwargs)
The parameters and attributes are the same as above.
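A minimal sketch, assuming the xgboost package is installed (dataset and n_estimators are illustrative):

```python
# Minimal sketch: feature importances from an XGBoost classifier.
from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)

# importance_type selects the metric behind feature_importances_:
# 'weight', 'gain', 'cover', 'total_gain', or 'total_cover'.
model = XGBClassifier(importance_type="total_gain", n_estimators=50)
model.fit(X, y)
print(model.feature_importances_)  # shape (n_features,)
```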
class lightgbm.LGBMClassifier(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=100, subsample_for_bin=200000, objective=None, class_weight=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=0, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=None, importance_type='split', **kwargs)
Parameters:
Attributes: property feature_importances_
class lightgbm.LGBMRegressor(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=100, subsample_for_bin=200000, objective=None, class_weight=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=0, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=None, importance_type='split', **kwargs)
The parameters and attributes are the same as above.
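A minimal sketch, assuming the lightgbm package is installed (dataset and settings are illustrative):

```python
# Minimal sketch: feature importances from a LightGBM classifier.
from sklearn.datasets import load_breast_cancer
from lightgbm import LGBMClassifier

X, y = load_breast_cancer(return_X_y=True)

# importance_type='split' counts how often a feature is used to split;
# importance_type='gain' sums the total gain of splits using the feature.
model = LGBMClassifier(importance_type="split", n_estimators=50)
model.fit(X, y)
print(model.feature_importances_)  # integer split counts here
```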
- Evaluates feature importance for a model that has already been trained, i.e., one on which fit(X, y) has been called.
- The evaluation metric is specified by the scoring parameter. The values of a feature column are shuffled, with the number of repetitions set by n_repeats; the difference in the scoring metric before and after shuffling measures that feature's importance.
sklearn.inspection.permutation_importance(estimator, X, y, *, scoring=None, n_repeats=5, n_jobs=None, random_state=None, sample_weight=None, max_samples=1.0)
Parameters:
- estimator: a model that has already been fitted.
- X: the input data on which feature importance is computed.
- y: the target values.
- scoring: the evaluation metric; defaults to None, which uses the estimator's default scorer.
- n_repeats: the number of times each feature is shuffled; defaults to 5.

Returns:
- importances_mean: the mean importance of each feature over the n_repeats shuffles.
- importances_std: the standard deviation of each feature's importance over the n_repeats shuffles.
- importances: the raw importances, a 2-D array holding each feature's importance for every shuffle.
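A minimal usage sketch (dataset and scoring choice are illustrative):

```python
# Minimal sketch: permutation importance of a fitted model.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature 10 times and measure the drop in accuracy.
result = permutation_importance(model, X, y, scoring="accuracy",
                                n_repeats=10, random_state=0)
for i in range(X.shape[1]):
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
print(result.importances.shape)  # (n_features, n_repeats)
```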
`feature_importances_` is an attribute built into some tree-based models that measures each feature's contribution to the model's predictions; it is typically derived from the information gain produced at the split points where the feature is used. `permutation_importance` is a model-agnostic method that evaluates a feature's importance by shuffling its values and observing the change in model performance: if shuffling a feature causes a marked drop in performance, that feature matters to the model.

| | Model dependence | Computation | Interpretability |
|---|---|---|---|
| `feature_importances_` | Only available on certain models (e.g., tree-based ones) | Derived from the model's internal mechanics (e.g., information gain) | Results depend on the specific model implementation and can be harder to interpret |
| `permutation_importance` | Model-agnostic; works with any fitted model | Shuffles feature values and measures the change in model performance | More intuitive, since it directly reflects each feature's impact on performance |
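To see the contrast in practice, here is a side-by-side sketch on the same fitted model (dataset, model, and split are illustrative assumptions):

```python
# Minimal sketch: comparing the two importance measures on one model.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Impurity-based importances: a by-product of training itself.
print("feature_importances_:", model.feature_importances_[:5])

# Permutation importances: computed here on held-out data, so they reflect
# the features' effect on generalization rather than on the training fit.
perm = permutation_importance(model, X_te, y_te, n_repeats=5, random_state=0)
print("permutation (mean): ", perm.importances_mean[:5])
```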