Class imbalance (target variable)

Checking the class proportions

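import matplotlib.pyplot as plt
from collections import Counter
# Check the class distribution of the generated samples: classes 0 and 1 appear in a 9:1 ratio, so the data is imbalanced
# (y is the target variable as a pandas Series)
y.value_counts().plot(kind='pie')
plt.show()
print(Counter(y))
# Counter({0: 900, 1: 100})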
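To turn raw counts like these into per-class proportions and a single imbalance ratio, a small helper such as the one below can be used. It is a stdlib-only sketch; the function name `class_summary` and the demo labels are hypothetical, chosen to mirror the 900/100 split shown above.

```python
from collections import Counter

def class_summary(labels):
    """Return per-class proportions and the majority/minority imbalance ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    proportions = {cls: n / total for cls, n in counts.items()}
    # Ratio of the largest class count to the smallest class count
    ratio = max(counts.values()) / min(counts.values())
    return proportions, ratio

# Hypothetical labels reproducing the 900/100 split
y_demo = [0] * 900 + [1] * 100
props, ratio = class_summary(y_demo)
print(props)   # {0: 0.9, 1: 0.1}
print(ratio)   # 9.0
```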

SMOTE oversampling

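# Oversampling (apply it to the training split only, never to the test set)
import matplotlib.pyplot as plt
from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=42)
# Older imbalanced-learn releases called this fit_sample; it has since been removed in favour of fit_resample
X_train, y_train = sm.fit_resample(X_train, y_train)
y_train.value_counts().plot(kind='pie')

plt.show()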
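SMOTE does not simply duplicate minority rows: it creates new points on the line segment between a minority sample and one of its k nearest minority-class neighbours (the `k_neighbors` parameter, default 5 in imbalanced-learn). The NumPy sketch below illustrates that interpolation step. It is a simplified illustration, not imbalanced-learn's implementation: `smote_sketch` is a hypothetical name, the neighbour search is brute force, and edge cases are not handled.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=42):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Simplified illustration (hypothetical helper): brute-force neighbour search.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    assert len(X_min) >= 2, "need at least two minority samples"
    # Pairwise Euclidean distances between minority samples only
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest minority neighbours
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(len(X_min))     # pick a random minority sample
        b = nn[a, rng.integers(k)]       # pick one of its k nearest neighbours
        lam = rng.random()               # interpolation factor in [0, 1)
        synthetic[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return synthetic

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_sketch(minority, n_new=8, k=2)
print(new_points.shape)  # (8, 2)
```

To balance the 900/100 example, such a function would be called with `n_new=800` on the 100 minority rows. Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region spanned by the minority class.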

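This failed because the features contain missing values (NaN), which SMOTE does not accept. Following https://stackoverflow.com/questions/57456475/using-smote-with-nan-values, the code was changed to impute the missing values first: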

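# Oversampling, with missing values imputed first
from collections import Counter

from sklearn.impute import SimpleImputer
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline
from imblearn.combine import SMOTEENN

# n_jobs is omitted: it is deprecated on SMOTE in recent imbalanced-learn releases
smote = SMOTE(k_neighbors=5, random_state=42)
# SimpleImputer (default strategy='mean') fills the NaNs, then SMOTEENN applies
# SMOTE oversampling followed by Edited Nearest Neighbours cleaning
smote_enn = make_pipeline(SimpleImputer(), SMOTEENN(smote=smote, random_state=42))
X_t, y_t = smote_enn.fit_resample(X_train, y_train)
# y_t.value_counts().plot(kind='pie')
print(Counter(y_t))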
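Why the imputer has to come first: SMOTE (and the ENN cleaning step) rank neighbours by Euclidean distance, a distance involving a NaN is itself NaN, and imbalanced-learn rejects input containing NaN. The NumPy sketch below mimics what `SimpleImputer()` does with its default `strategy='mean'`, replacing each NaN with its column mean; `mean_impute` is a hypothetical helper for illustration, not part of scikit-learn.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of its column (hypothetical helper, cf. SimpleImputer(strategy='mean'))."""
    X = np.asarray(X, dtype=float).copy()   # work on a copy, leave the input untouched
    col_means = np.nanmean(X, axis=0)       # column means, ignoring NaNs
    rows, cols = np.where(np.isnan(X))      # positions of the missing values
    X[rows, cols] = col_means[cols]
    return X

X_demo = np.array([[1.0, np.nan],
                   [3.0, 4.0],
                   [np.nan, 8.0]])
# A distance involving a NaN is NaN, so nearest neighbours cannot be ranked
print(np.linalg.norm(X_demo[0] - X_demo[1]))   # nan
X_filled = mean_impute(X_demo)
print(X_filled)  # rows become [1, 6], [3, 4], [2, 8]
```

Mean imputation is only one option; `SimpleImputer` also supports the `'median'`, `'most_frequent'` and `'constant'` strategies.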
