Recently at work, my leader asked me to look into Focal Loss as a way to tackle the naturally imbalanced samples in credit-risk scenarios, so I started digging through the material.
Origin:
Focal Loss comes from the object-detection paper "Focal Loss for Dense Object Detection" (Lin et al., https://arxiv.org/pdf/1708.02002.pdf), where one-stage detectors face an extreme imbalance between foreground and background samples. To deal with it, the authors proposed an impressively effective trick: use Focal Loss as the loss function!
With imbalanced samples, training with Focal Loss strengthens the fit on hard examples, which goes some way toward solving the imbalance problem!
The core idea of Focal Loss: rescale every sample's loss, but shrink easy samples far more than hard ones. Hard samples then account for a larger share of the total loss, so the model concentrates on them during training.
Let's look at how Focal Loss works in detail, comparing against the common binary cross-entropy (BCE) loss.
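Written out, for a label $t \in \{0, 1\}$ and predicted probability $p$, the per-sample binary cross-entropy is

$$\mathrm{BCE}(p, t) = -\big(t \log p + (1 - t) \log(1 - p)\big)$$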
So what is the intuition behind Focal Loss's improvement? In the binary loss above, every sample ends up carrying the same weight. Can we design a coefficient that lowers the weight of easy samples and raises the weight of hard ones? We absolutely can!
Focal Loss is defined as:

$$\mathrm{FL}(p_t) = -\,\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t)$$

where $p_t = p$ if $t = 1$ and $p_t = 1 - p$ if $t = 0$, and likewise $\alpha_t = \alpha$ for positives and $1 - \alpha$ for negatives. Note that $-\log(p_t)$ is exactly the binary cross-entropy above, so Focal Loss is just BCE multiplied by the class weight $\alpha_t$ and the modulating factor $(1 - p_t)^{\gamma}$.
Let's decompose this, since it looks a bit convoluted at first glance. For an easy sample, $p_t$ is close to 1, so the modulating factor $(1 - p_t)^{\gamma}$ is tiny and its loss is scaled down hard; for a hard sample, $p_t$ is small and its loss is left almost untouched. On top of that, $\alpha_t$ balances the positive and negative classes.
For now we take α = 0.25 and γ = 2, the values used in the paper; both hyperparameters should be tuned on your own data.
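A quick numeric check of the modulating factor at γ = 2 (a toy calculation of my own, not from the original post): an easy sample at $p_t = 0.9$ is scaled by $0.1^2 = 0.01$, while a hard one at $p_t = 0.1$ keeps $0.9^2 = 0.81$ of its loss, i.e. 81× more weight.

```python
import numpy as np

gamma = 2
p_t = np.array([0.9, 0.5, 0.1])      # easy -> hard samples
mod = (1 - p_t) ** gamma             # Focal Loss modulating factor (1 - p_t)^gamma
print(np.round(mod, 2))              # [0.01 0.25 0.81]
```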
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from LGB_Model_FL import *       # in-house helper script LGB_Model_FL.py
import lightgbm
from Model_Analysis import *     # in-house helper script Model_Analysis.py
from IV_Cal import *             # in-house helper script IV_Cal.py
# NOTE: scipy.misc.derivative was deprecated in SciPy 1.10 and removed in 1.12;
# on a recent SciPy, pin an older version or inline your own finite-difference helper
from scipy.misc import derivative
import warnings
warnings.filterwarnings('ignore')
# Focal Loss objective (custom loss for LightGBM)
def focal_loss_lgb_sk(y_true, y_pred, alpha, gamma):
    """
    Focal Loss objective for LightGBM's sklearn API
    Parameters:
    -----------
    y_true: numpy.ndarray
        array with the ground-truth labels (0/1)
    y_pred: numpy.ndarray
        array with the raw (pre-sigmoid) predictions
    alpha, gamma: float
        See original paper https://arxiv.org/pdf/1708.02002.pdf
    """
    a, g = alpha, gamma
    def fl(x, t):
        # x: raw score, t: label; p is the predicted probability
        p = 1/(1+np.exp(-x))
        return -( a*t + (1-a)*(1-t) ) * (( 1 - ( t*p + (1-t)*(1-p)) )**g) * ( t*np.log(p)+(1-t)*np.log(1-p) )
    partial_fl = lambda x: fl(x, y_true)
    # numerical first and second derivatives of the loss w.r.t. the raw score
    grad = derivative(partial_fl, y_pred, n=1, dx=1e-6)
    hess = derivative(partial_fl, y_pred, n=2, dx=1e-6)
    return grad, hess
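A quick smoke test (my own addition) confirming the function honors the `(grad, hess)` contract LightGBM expects from a custom objective, one value per sample:

```python
# Smoke test: one gradient and one hessian value per sample
grad, hess = focal_loss_lgb_sk(np.array([1., 0.]), np.array([0.5, -0.5]),
                               alpha=0.25, gamma=2)
print(grad.shape, hess.shape)        # (2,) (2,)
```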
# Evaluation metric version of Focal Loss
def focal_loss_lgb_eval_error_sk(y_true, y_pred, alpha, gamma):
    """
    Adaptation of the Focal Loss for LightGBM's sklearn API, to be used as an
    evaluation metric
    Parameters:
    -----------
    y_true: numpy.ndarray
        array with the ground-truth labels (0/1)
    y_pred: numpy.ndarray
        array with the raw (pre-sigmoid) predictions
    alpha, gamma: float
        See original paper https://arxiv.org/pdf/1708.02002.pdf
    """
    a, g = alpha, gamma
    p = 1/(1+np.exp(-y_pred))
    loss = -( a*y_true + (1-a)*(1-y_true) ) * (( 1 - ( y_true*p + (1-y_true)*(1-p)) )**g) * ( y_true*np.log(p)+(1-y_true)*np.log(1-p) )
    # (metric name, value, is_higher_better): lower focal loss is better
    return 'focal_loss', np.mean(loss), False
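As a sanity check (again my own addition): with γ = 0 the modulating factor disappears and α = 0.5 weights both classes equally, so Focal Loss collapses to exactly half the binary cross-entropy. This is easy to verify with the metric above:

```python
# gamma=0, alpha=0.5 => focal loss == 0.5 * binary cross-entropy
y_t = np.array([1., 0., 1.])
x_raw = np.array([0.3, -1.2, 2.0])               # raw margin scores
p = 1 / (1 + np.exp(-x_raw))
bce = -(y_t * np.log(p) + (1 - y_t) * np.log(1 - p))
_, fl_val, _ = focal_loss_lgb_eval_error_sk(y_t, x_raw, alpha=0.5, gamma=0)
print(np.isclose(fl_val, 0.5 * bce.mean()))      # True
```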
def sigmoid(x):
    # map raw margin scores back to probabilities; needed because a custom
    # objective makes LightGBM output raw scores instead of probabilities
    return 1/(1+np.exp(-x))
df = pd.read_csv('telecom_churn.csv')
# the churn label is boolean; map it to 0/1
df['churn'] = df['churn'].map(str)
churn_dic = {'True':1, 'False':0}
df['churn'] = df['churn'].map(churn_dic)
print(df.shape)
df.head()
(3333, 21)
| | state | account length | area code | phone number | international plan | voice mail plan | number vmail messages | total day minutes | total day calls | total day charge | ... | total eve calls | total eve charge | total night minutes | total night calls | total night charge | total intl minutes | total intl calls | total intl charge | customer service calls | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | 382-4657 | no | yes | 25 | 265.1 | 110 | 45.07 | ... | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.70 | 1 | 0 |
| 1 | OH | 107 | 415 | 371-7191 | no | yes | 26 | 161.6 | 123 | 27.47 | ... | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.70 | 1 | 0 |
| 2 | NJ | 137 | 415 | 358-1921 | no | no | 0 | 243.4 | 114 | 41.38 | ... | 110 | 10.30 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | 0 |
| 3 | OH | 84 | 408 | 375-9999 | yes | no | 0 | 299.4 | 71 | 50.90 | ... | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | 0 |
| 4 | OK | 75 | 415 | 330-6626 | yes | no | 0 | 166.7 | 113 | 28.34 | ... | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | 0 |

5 rows × 21 columns
# Split the data
X = df.iloc[:,8:19]
# X = df[['total day calls', 'total night charge', 'number vmail messages', 'total intl charge']]
y = df['churn'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3,
random_state = 23)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
(2333, 11) (1000, 11) (2333,) (1000,)
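Before training, it is worth confirming the imbalance that motivates Focal Loss in the first place; in this dataset churn (y = 1) is the minority class:

```python
# Inspect the class ratio of the training split
print(pd.Series(y_train).value_counts(normalize=True))
```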
# LGB + Focal Loss. alpha: keeps the loss of the easily-classified class from becoming too small; gamma: focuses training on hard samples, i.e. the y=1 samples
focal_loss = lambda x, y: focal_loss_lgb_sk(x, y, alpha=0.25, gamma=2)
lgb_param = {
    'learning_rate': 0.01,
    'max_depth': 3,
    'n_estimators': 300,
    'num_leaves': 8,
    'subsample': 0.7,
    'subsample_freq': 3,
    'colsample_bytree': 0.7,
    'scale_pos_weight': 1,
    'subsample_for_bin': 200000,
    'min_split_gain': 0,
    'min_child_weight': 1e-3,
    'min_child_samples': 20,
    'reg_alpha': 0,
    'reg_lambda': 10,
    'n_jobs': -1,
    'silent': True,              # removed in lightgbm >= 4.0
    'class_weight': None,
    'random_state': None,
    'boosting_type': 'gbdt',
    'objective': focal_loss
    # 'objective': 'binary',     # LightGBM's built-in binary objective, for comparison
}
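The training call below uses LGB_Train_Test from the in-house LGB_Model_FL.py script, which most readers won't have. As a rough stand-in (a sketch, assuming the helper simply wraps LightGBM's sklearn API), the equivalent plain-lightgbm call might look like this:

```python
from lightgbm import LGBMClassifier

# Hypothetical stand-in for the in-house LGB_Train_Test helper
eval_metric = lambda y_t, y_p: focal_loss_lgb_eval_error_sk(y_t, y_p,
                                                            alpha=0.25, gamma=2)
clf = LGBMClassifier(**lgb_param)
clf.fit(X_train, y_train,
        eval_set=[(X_test, y_test)],
        eval_metric=eval_metric)

# With a custom objective LightGBM outputs raw margin scores,
# so convert them to probabilities with the sigmoid defined earlier
y_prob = sigmoid(clf.predict(X_test, raw_score=True))
```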
model = LGB_Train_Test(lgb_param, X_train, y_train, X_test, y_test)
Model Accuracy on Train set: 86.4981%
Model Accuracy on Test set: 83.8000%
The KS value of Train set is: 0.4307689687420647
The KS value of Test set is: 0.410751642652995
The above only shows Focal Loss running end-to-end on a demo dataset. On our actual credit data, Focal Loss did give a lift over the plain binary loss; since that involves company data privacy, I can't share those plots here.
That's a wrap!