A random forest has two major sources of randomness:
1. Random sampling of training data points when building trees
2. Random subsets of features considered when splitting nodes
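A minimal NumPy sketch of these two sources of randomness (the array X and all sizes below are made up for illustration):
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100, 8)  # toy data: 100 samples, 8 features (hypothetical)

# 1. Bootstrap sampling of rows: draw 100 indices with replacement,
#    so some samples repeat and others are left out of this tree's data.
bootstrap_idx = rng.randint(0, X.shape[0], size=X.shape[0])
X_tree = X[bootstrap_idx]

# 2. Random feature subset at a split: e.g. sqrt(n_features) candidate columns.
n_candidates = int(np.sqrt(X.shape[1]))
split_features = rng.choice(X.shape[1], size=n_candidates, replace=False)
print(split_features)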
from sklearn.ensemble import RandomForestRegressor

# default parameters
model = RandomForestRegressor(n_estimators=10,
                              criterion="mse",
                              max_depth=None,
                              min_samples_split=2,
                              min_samples_leaf=1,
                              min_weight_fraction_leaf=0.,
                              max_features="auto",
                              max_leaf_nodes=None,
                              min_impurity_decrease=0.,
                              min_impurity_split=None,
                              bootstrap=True,
                              oob_score=False,
                              n_jobs=1,
                              random_state=None,
                              verbose=0,
                              warm_start=False)
The type of decision tree used inside a random forest is CART.
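For reference, a minimal fit/predict sketch with the class above (the toy data is made up for illustration):
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 5)                        # toy features (hypothetical)
y = 2.0 * X[:, 0] + 0.1 * rng.rand(200)     # toy target

model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X, y)
print(model.predict(X[:3]))                 # predictions for the first 3 samples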
n_estimators
n_estimators=10, the number of decision trees in the forest.
criterion
criterion="mse", string, optional. Either "mse" (mean squared error) or "mae" (mean absolute error).
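A small check on toy data (made up for illustration): the fitted forest holds exactly n_estimators trees, and each one is a CART regressor. Note that newer scikit-learn versions rename the criterion values "mse"/"mae" to "squared_error"/"absolute_error".
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X, y = rng.rand(100, 4), rng.rand(100)        # toy data (hypothetical)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
print(len(model.estimators_))                 # 50, one tree per estimator
print(type(model.estimators_[0]).__name__)    # DecisionTreeRegressor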
max_depth=None,
min_samples_split=2,
min_samples_leaf=1,
min_weight_fraction_leaf=0.,
max_features
max_features="auto", int, float, string or None, optional. The number of features considered at each node split (see the sketch after this list):
int = use that many features
float = int(max_features * n_features)
"auto" = n_features
"sqrt" = sqrt(n_features)
"log2" = log2(n_features)
None = n_features
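A sketch of how these settings resolve on toy data with 9 features (data made up for illustration); each fitted tree exposes the resolved count as max_features_ ("auto" is omitted here since it simply equals n_features, as listed above):
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X, y = rng.rand(100, 9), rng.rand(100)        # toy data: n_features = 9

for mf in [3, 0.5, "sqrt", "log2", None]:
    model = RandomForestRegressor(n_estimators=5, max_features=mf,
                                  random_state=0).fit(X, y)
    # each fitted tree records how many features it considered per split
    print(mf, model.estimators_[0].max_features_)
# roughly: 3, int(0.5*9)=4, sqrt(9)=3, int(log2(9))=3, 9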
max_leaf_nodes=None,
min_impurity_decrease=0.,
min_impurity_split=None,
bootstrap
bootstrap=True, boolean, optional. When True, the dataset used to train each tree is drawn by sampling with replacement, usually to the same size as the original dataset; when False, every tree is fit on the full dataset.
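A quick NumPy illustration (sizes made up): a bootstrap sample of the same size as the original dataset repeats some rows, so a fraction of rows is left out of each tree's training data; those become the out-of-bag samples discussed next.
import numpy as np

rng = np.random.RandomState(0)
n_samples = 1000                                   # toy size (hypothetical)
bootstrap_idx = rng.randint(0, n_samples, size=n_samples)

n_used = len(np.unique(bootstrap_idx))
print(n_used, n_samples - n_used)                  # e.g. ~632 used, ~368 left out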
oob_score
oob_score=False, bool, optional. When True, the score on the out-of-bag (OOB) data is available via model.oob_score_. With bootstrap=True, each tree's training set is drawn with replacement, so when the k-th tree is fit some part of the full dataset is inevitably left out; those left-out samples are the out-of-bag data. When bootstrap=False, out-of-bag estimation is unavailable and accessing model.oob_score_ raises an error.
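A minimal sketch of reading the out-of-bag score (toy data made up for illustration); for a regressor, oob_score_ is the R^2 computed from each sample's out-of-bag predictions:
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.rand(300, 5)                          # toy features (hypothetical)
y = X[:, 0] + 0.1 * rng.rand(300)             # toy target

model = RandomForestRegressor(n_estimators=100, bootstrap=True,
                              oob_score=True, random_state=0)
model.fit(X, y)
print(model.oob_score_)                       # R^2 on out-of-bag samples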
n_jobs=1,
random_state=None,
verbose=0,
warm_start=False
References:
[1] An Implementation and Explanation of the Random Forest in Python