Random Forest Hyperparameters, Taken Apart by Hand

A random forest has two defining sources of randomness:

1. Random sampling of training data points when building trees
2. Random subsets of features considered when splitting nodes

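from sklearn.ensemble import RandomForestRegressor

# Main parameters with their defaults in current scikit-learn (1.3 and later);
# the comments give the older defaults that the descriptions below refer to.
model = RandomForestRegressor(n_estimators=100,           # 10 before 0.22
                              criterion="squared_error",  # "mse" before 1.0
                              max_depth=None,
                              min_samples_split=2,
                              min_samples_leaf=1,
                              min_weight_fraction_leaf=0.0,
                              max_features=1.0,           # "auto" (all features) before 1.1
                              max_leaf_nodes=None,
                              min_impurity_decrease=0.0,
                              # min_impurity_split=None was removed in 1.0
                              bootstrap=True,
                              oob_score=False,
                              n_jobs=None,                # 1 in older releases
                              random_state=None,
                              verbose=0,
                              warm_start=False)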

The decision trees used inside a random forest are CART (Classification and Regression Trees).

n_estimators
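n_estimators=10: the number of decision trees in the forest (10 was the default in older releases; it has been 100 since scikit-learn 0.22).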
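The link between n_estimators and the forest's output can be checked directly. The sketch below assumes a recent scikit-learn and uses synthetic data from make_regression purely for illustration; it fits a small forest, confirms that estimators_ holds one tree per estimator, and checks that the regressor's prediction is the plain average of the individual trees' predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, for illustration only
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

model = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

# One fitted DecisionTreeRegressor per estimator
print(len(model.estimators_))

# The forest's regression output is the mean of its trees' predictions
per_tree = np.stack([tree.predict(X) for tree in model.estimators_])
forest_pred = per_tree.mean(axis=0)
print(np.allclose(forest_pred, model.predict(X)))
```

More trees make this average more stable, at the cost of proportionally more training and prediction time.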

criterion
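criterion="mse", string, optional: the function used to measure the quality of a split; "mse" means mean squared error and "mae" means mean absolute error. (Since scikit-learn 1.0 these are spelled "squared_error" and "absolute_error"; the old names were removed in 1.2.)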
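To make the two criteria concrete, here is a minimal sketch of the impurity each one assigns to a node and the impurity decrease of one candidate split. The helper names mse_impurity, mae_impurity and split_gain are hypothetical, and the code follows the textbook definitions rather than scikit-learn's internal Cython implementation: MSE measures squared deviation from the node mean, MAE measures absolute deviation from the node median.

```python
import numpy as np

def mse_impurity(y):
    # Mean squared deviation from the node's mean prediction
    return float(np.mean((y - y.mean()) ** 2))

def mae_impurity(y):
    # Mean absolute deviation from the node's median prediction
    return float(np.mean(np.abs(y - np.median(y))))

def split_gain(y, left_mask, impurity):
    """Impurity decrease when node values y are split into y[left_mask] and y[~left_mask]."""
    y_left, y_right = y[left_mask], y[~left_mask]
    w_left, w_right = len(y_left) / len(y), len(y_right) / len(y)
    return impurity(y) - (w_left * impurity(y_left) + w_right * impurity(y_right))

# Toy node: one feature x, targets y with an outlier (30.0)
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 30.0])
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
left = x <= 0.5  # candidate split threshold

print("MSE gain:", split_gain(y, left, mse_impurity))
print("MAE gain:", split_gain(y, left, mae_impurity))
```

Squaring lets the outlier 30.0 dominate the MSE impurities, whereas the median-based MAE impurities are much less affected; this robustness to outliers is the usual argument for "mae", which in scikit-learn comes at the price of noticeably slower training.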

max_depth=None,
min_samples_split=2,
min_samples_leaf=1,
min_weight_fraction_leaf=0.,

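max_features
max_features="auto", int, float, string or None, optional: the number of features considered when looking for the best split at each node.
int = max_features (an absolute count)
float = int(max_features * n_features) (max_features is a fraction)
"auto" = n_features
"sqrt" = int(sqrt(n_features))
"log2" = int(log2(n_features))
None = n_features
For RandomForestRegressor, "auto" and None both mean every split looks at all features, so the random feature subsets (the second source of randomness listed above) are effectively switched off by default and the trees differ mainly through bootstrap sampling. Since scikit-learn 1.1 the regressor's default is max_features=1.0, which is equivalent, and "auto" was removed in 1.3.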
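The rules above can be written as a small function. resolve_max_features is a hypothetical helper, not part of scikit-learn; it follows the regressor semantics described here (for RandomForestClassifier, "auto" used to mean sqrt instead) and, like recent scikit-learn releases, never returns fewer than one feature.

```python
import math

def resolve_max_features(max_features, n_features):
    """Hypothetical helper: number of features examined at each split
    of a RandomForestRegressor, following the rules listed above."""
    if max_features is None or max_features == "auto":
        return n_features                               # all features (regressor semantics)
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, float):
        return max(1, int(max_features * n_features))   # fraction of the features
    if isinstance(max_features, int):
        return max_features                             # absolute count
    raise ValueError(f"unsupported max_features: {max_features!r}")

for setting in [None, "auto", "sqrt", "log2", 0.5, 7]:
    print(repr(setting), "->", resolve_max_features(setting, 20))
```

The float branch is checked before the int branch so that 1.0 is read as "100% of the features" rather than as a count.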

max_leaf_nodes=None,
min_impurity_decrease=0.,
min_impurity_split=None, (deprecated since 0.19 in favour of min_impurity_decrease and removed in 1.0)

bootstrap
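bootstrap=True, boolean, optional: when True, the training set for each tree is drawn from the original data by sampling with replacement, and by default the sample has the same size as the original data set; when False, every tree is fit on the entire data set.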
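Bootstrap sampling itself takes only a few lines of NumPy. This sketch is an illustration, not scikit-learn's internal sampling code: it draws one bootstrap sample of row indices, measures how many distinct rows it contains, and collects the rows that were never drawn (the out-of-bag rows used by oob_score).

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible
n_samples = 10_000

# bootstrap=True: draw n_samples row indices *with replacement*
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Rows drawn at least once are in the bag; the rest are out-of-bag (OOB)
in_bag = np.zeros(n_samples, dtype=bool)
in_bag[bootstrap_idx] = True
oob_idx = np.flatnonzero(~in_bag)

print("sample size:", bootstrap_idx.size)          # same size as the data set
print("distinct rows in bag:", in_bag.mean())      # expected value close to 1 - 1/e
print("OOB fraction:", oob_idx.size / n_samples)   # expected value close to 1/e

# bootstrap=False: every tree simply sees all rows
full_idx = np.arange(n_samples)
```

Each row is missed by one draw with probability 1 - 1/n, so it is missed by all n draws with probability (1 - 1/n)^n, which tends to 1/e ≈ 0.368 as n grows.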

oob_score
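oob_score=False, bool, optional: when True, the fitted model exposes model.oob_score_, the score on out-of-bag (OOB) data (R² for RandomForestRegressor). With bootstrap=True each tree is trained on a sample drawn with replacement, so when we fit the k-th tree its sample almost always leaves out part of the full data set (about 1/e ≈ 36.8% of the rows on average for large data sets); those left-out rows are that tree's out-of-bag data, and each row is scored using only the trees that never saw it. With bootstrap=False there is no out-of-bag data: fitting with oob_score=True raises a ValueError, so model.oob_score_ is never available.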
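The following sketch assumes a recent scikit-learn and uses synthetic data from make_regression for illustration. It reads oob_score_ after fitting with oob_score=True, then shows that asking for OOB scoring together with bootstrap=False is rejected with a ValueError when fit is called.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, for illustration only
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# bootstrap=True (the default) plus oob_score=True: each row is scored
# only by the trees whose bootstrap sample did not contain it
model = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
model.fit(X, y)
print("OOB R^2:", model.oob_score_)

# Without bootstrap sampling there is no out-of-bag data, so fit refuses
oob_rejected = False
try:
    RandomForestRegressor(bootstrap=False, oob_score=True).fit(X, y)
except ValueError as err:
    oob_rejected = True
    print("ValueError:", err)
```

Because the OOB score comes for free from the training run, it is a cheap alternative to a separate validation split when tuning the parameters above.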

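n_jobs=1, (newer releases default to None, which means a single job unless a joblib parallel backend context says otherwise; -1 uses all processors)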
random_state=None,
verbose=0,
warm_start=False

