As is well known, imbalanced regression receives far less attention than imbalanced classification. Out of practical need, I have collected here some data-level methods for handling imbalanced regression.
Original paper:
Branco, P., Torgo, L., Ribeiro, R. (2017). SMOGN: A Pre-Processing Approach for Imbalanced Regression. Proceedings of Machine Learning Research, 74:36-50. http://proceedings.mlr.press/v74/branco17a/branco17a.pdf.
The official implementation of this method is in R. It has since also been made available as the Python package smogn, which can be installed with:
pip install smogn
Project page: https://github.com/nickkunz/smogn
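A minimal usage sketch, assuming a pandas DataFrame with a continuous target column named "target" (both the CSV path and the column name below are placeholders); smogn.smoter is the package's main entry point:

import pandas as pd
import smogn

# placeholder: a DataFrame with a continuous response column "target"
df = pd.read_csv("data.csv")

# SMOGN over-samples rare target values (interpolation plus Gaussian noise)
# and under-samples the frequent ones, returning a rebalanced DataFrame
df_balanced = smogn.smoter(
    data=df,      # training data
    y="target"    # name of the continuous response column
)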
Original paper:
Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: synthetic minority over-sampling technique[J]. Journal of artificial intelligence research, 2002, 16: 321-357. https://www.jair.org/index.php/jair/article/download/10302/24590
A large collection of implementations of SMOTE and its many variants can be found here: https://github.com/analyticalmindsltd/smote_variants
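For intuition, the core of SMOTE is a linear interpolation between a minority example and one of its k nearest minority-class neighbours; a minimal NumPy sketch of that single step (independent of any particular package) might look like:

import numpy as np

def smote_interpolate(x_i, x_neighbor, rng=None):
    # create one synthetic sample on the segment between a minority example x_i
    # and one of its k nearest minority-class neighbours x_neighbor
    rng = rng if rng is not None else np.random.default_rng()
    gap = rng.uniform(0.0, 1.0)            # random position along the segment
    return x_i + gap * (x_neighbor - x_i)  # synthetic feature vector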
Papers applying SMOTE to regression:
Paper: Data Augmentation for Imbalanced Regression, AISTATS 2023.
Code: https://github.com/sstocksieker/DAIR
Paper: REBAGG: REsampled BAGGing for Imbalanced Regression, LIDTA 2018.
Basic idea: combines resampling with bagging-style ensemble learning (a rough sketch of this idea follows this list).
Thesis: Re-sampling Approaches for Regression Tasks under Imbalanced Domains, 2014.
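On the REBAGG idea mentioned above: each base regressor is trained on a resampled version of the training set in which rare (extreme-target) examples are over-represented, and predictions are aggregated by averaging as in bagging. The following is an illustrative sketch only, not the authors' code; the relevance function is a toy stand-in (distance from the median target) and the scikit-learn tree is an arbitrary choice of base learner:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def resample_for_rarity(X, y, rng):
    # toy resampler: bootstrap sample whose weights favour examples whose
    # targets are far from the median (a stand-in for a proper relevance function)
    relevance = np.abs(y - np.median(y)) + 1e-8
    p = relevance / relevance.sum()
    idx = rng.choice(len(y), size=len(y), replace=True, p=p)
    return X[idx], y[idx]

def rebagg_style_fit_predict(X, y, X_test, n_models=10, seed=0):
    # train several base regressors on resampled training sets and average
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        Xr, yr = resample_for_rarity(X, y, rng)
        model = DecisionTreeRegressor().fit(Xr, yr)
        preds.append(model.predict(X_test))
    return np.mean(preds, axis=0)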
Original paper:
Branco P. ImbalancedLearningRegression: A Python Package to Tackle the Imbalanced Regression Problem, 2022. https://2022.ecmlpkdd.org/wp-content/uploads/2022/09/sub_1456.pdf
These methods are available in the Python package ImbalancedLearningRegression, which can be installed with:
pip install ImbalancedLearningRegression
Official project page: https://github.com/paobranco/ImbalancedLearningRegression
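A minimal usage sketch under the same assumptions as the smogn example above (placeholder CSV path and target column name); the function name iblr.smote follows my reading of the project README, which also lists other strategies such as random over/under-sampling, Gaussian noise, and ADASYN, so please check the repository for the current interface:

import pandas as pd
import ImbalancedLearningRegression as iblr

# placeholder: a DataFrame with a continuous response column "target"
df = pd.read_csv("data.csv")

# SMOTE-style over-sampling adapted to a continuous target
df_smote = iblr.smote(data=df, y="target")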
This list is not long, and there are surely more methods; I will add them later…
The methods mentioned above are essentially all applied to datasets of hand-crafted features; how to apply them within end-to-end deep learning pipelines deserves further study.
Work in this direction includes:
Dablain D., Krawczyk B., Chawla N. V. DeepSMOTE: Fusing Deep Learning and SMOTE for Imbalanced Data. IEEE Transactions on Neural Networks and Learning Systems, 2022. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9694621 (published in the top journal IEEE TNNLS, impressive work).
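The core idea of DeepSMOTE is to move the SMOTE interpolation from raw input space into the latent space of a trained encoder/decoder. The following is a highly simplified sketch of that idea, not the paper's implementation: encoder and decoder are assumed to be already trained, and random pairs are interpolated here instead of nearest neighbours, purely for brevity:

import torch

def deepsmote_style_oversample(encoder, decoder, x_minority, n_new):
    # illustrative only: encode minority samples, interpolate in latent space,
    # and decode the synthetic latent codes back to the input space
    with torch.no_grad():
        z = encoder(x_minority)                 # latent codes, shape (N, latent_dim)
        i = torch.randint(0, z.size(0), (n_new,))
        j = torch.randint(0, z.size(0), (n_new,))
        gap = torch.rand(n_new, 1)              # random interpolation weights in [0, 1)
        z_new = z[i] + gap * (z[j] - z[i])      # SMOTE-style step in latent space
        return decoder(z_new)                   # synthetic samples in input space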
More to be added later.