Random Forest: Handling Imbalanced Data

Pass class_weight="balanced" to RandomForestClassifier to handle imbalanced classes.

# Handle imbalanced data
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

# Load the iris data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Remove the first 40 observations to make the classes highly imbalanced
features = features[40:, :]
target = target[40:]
# Binarize the target: class 0 stays 0, every other class becomes 1
target = np.where((target == 0), 0, 1)
# Create the classifier with class_weight="balanced"
randomforest = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight="balanced")
# Train the model (class_weight can also be set to explicit per-class weights)
model = randomforest.fit(features, target)
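
The comment on the training step notes that the weights can also be set by hand. A minimal sketch of that option, assuming the same iris subset as above: the dictionary values below are hypothetical and chosen only for illustration, not recommended settings.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Rebuild the imbalanced, binarized iris subset used above
iris = datasets.load_iris()
features = iris.data[40:, :]
target = np.where(iris.target[40:] == 0, 0, 1)

# Count the observations per class to see the imbalance
class_counts = np.bincount(target)
print(class_counts.tolist())  # [10, 100]

# Hypothetical explicit weights: the rare class 0 counts 10x as much as class 1
explicit_weights = {0: 10, 1: 1}
weighted_forest = RandomForestClassifier(random_state=0, class_weight=explicit_weights)
weighted_model = weighted_forest.fit(features, target)
```

Explicit weights are useful when the cost of misclassifying one class is known in advance; otherwise "balanced" derives the weights from the class frequencies.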
Discussion
A useful value for the class_weight argument is "balanced", wherein classes are automatically weighted inversely proportional to how frequently they appear in the data:
$$w_j = \frac{n}{k \, n_j}$$
where $w_j$ is the weight of class $j$, $n$ is the number of observations, $n_j$ is the number of observations in class $j$, and $k$ is the total number of classes.
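
To make the formula concrete, the weights for the example data can be computed by hand and compared with scikit-learn's compute_class_weight helper. This is a small sketch rather than a view into the classifier's internals; the variable names are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.utils.class_weight import compute_class_weight

# Same imbalanced, binarized target as in the example above
iris = datasets.load_iris()
target = np.where(iris.target[40:] == 0, 0, 1)

n = len(target)              # number of observations (110)
classes = np.unique(target)  # class labels [0, 1]
k = len(classes)             # number of classes (2)
n_j = np.bincount(target)    # observations per class [10, 100]

# w_j = n / (k * n_j)
manual_weights = n / (k * n_j)
print(dict(zip(classes.tolist(), manual_weights.tolist())))  # {0: 5.5, 1: 0.55}

# scikit-learn's helper applies the same heuristic
sklearn_weights = compute_class_weight(class_weight="balanced", classes=classes, y=target)
print(np.allclose(manual_weights, sklearn_weights))  # True
```

With 10 observations in class 0 and 100 in class 1, the rare class receives ten times the weight of the common one.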
