References: https://blog.csdn.net/sinat_26917383/article/details/72857454
https://www.cnblogs.com/bnuvincent/p/7357632.html
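The difference between the two:
class_weight: mainly addresses class imbalance. For example, in a binary anomaly-detection problem where anomalies make up only 1% of the data and normal samples 99%, each class's contribution to the loss should be weighted differently.
sample_weight: mainly addresses differences in sample quality. For example, if the first 1000 samples are highly reliable, their weights should be high; if the next 1000 samples may contain errors and are less trustworthy, their weights should be lowered.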
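To make the distinction concrete, here is a minimal pure-Python sketch of a weighted binary cross-entropy. It is an illustration, not Keras's actual implementation: the helper names (weighted_bce, class_to_sample_weights), the toy data, and the choice to average over the number of samples are all assumptions made for this example. It treats class_weight as a per-class lookup that yields one weight per sample, and multiplies it with a per-sample quality weight when both are wanted.

```python
import math

def weighted_bce(y_true, y_prob, weights):
    """Average binary cross-entropy with each sample's term scaled by its weight."""
    losses = [
        -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p, w in zip(y_true, y_prob, weights)
    ]
    return sum(losses) / len(losses)

def class_to_sample_weights(y_true, class_weight):
    """Look up each sample's weight from its class label (hypothetical helper)."""
    return [class_weight[y] for y in y_true]

# Toy data: three normal samples (0) and one anomaly (1).
y_true = [0, 0, 0, 1]
y_prob = [0.1, 0.2, 0.1, 0.3]   # predicted probability of class 1

cw = {0: 1.0, 1: 50.0}                               # class-level weights
per_class = class_to_sample_weights(y_true, cw)      # one weight per sample

quality = [1.0, 1.0, 0.2, 1.0]                       # third sample is less trusted
combined = [c * q for c, q in zip(per_class, quality)]

print(weighted_bce(y_true, y_prob, [1.0] * 4))       # unweighted loss
print(weighted_bce(y_true, y_prob, per_class))       # rare class emphasized
print(weighted_bce(y_true, y_prob, combined))        # class and quality weights together
```

With the class weights applied, the misclassified anomaly dominates the loss, which is the intended effect on imbalanced data.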
Using class_weight:
# Errors on class 1 count 50 times as much as errors on class 0.
cw = {0: 1, 1: 50}
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          callbacks=cbks,
          validation_data=(x_test, y_test),
          shuffle=True,
          class_weight=cw)
Using sample_weight:
Source: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/discussion/46673
from sklearn.utils import class_weight

# `train` is the training DataFrame from the competition data.
list_classes = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
y = train[list_classes].values
# One weight per row; with multi-column y, the per-column weights are multiplied.
sample_weights = class_weight.compute_sample_weight('balanced', y)
model.fit(X_t, y,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.1,
          sample_weight=sample_weights,
          callbacks=callbacks_list)
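How should the weights be chosen?
The weights set for training:
class_weight = {0: 1.0, 1: n_non_cancer_samples / n_cancer_samples * t}
Here t is an additional factor between 0 and 1 that you set as needed. It makes the total loss weight of the positive (cancer) class relative to the negative class t : 1; with t = 1 the two classes contribute to the loss at 1:1. Class 0 is listed explicitly with weight 1 because Keras expects class_weight to cover every class present in the labels.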
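The rule above can be sketched as a small helper that derives the dictionary from the labels. This is only an illustration: the function name binary_class_weight and the toy label list are hypothetical, and it assumes binary labels encoded as 0 (negative) and 1 (positive).

```python
from collections import Counter

def binary_class_weight(labels, t=1.0):
    """Build a class_weight dict for 0/1 labels (hypothetical helper).

    The positive class gets n_negative / n_positive * t, so its total
    loss weight relative to the negative class is t : 1.
    """
    counts = Counter(labels)
    n_neg, n_pos = counts[0], counts[1]
    if n_pos == 0:
        raise ValueError("no positive samples; cannot compute the ratio")
    return {0: 1.0, 1: n_neg / n_pos * t}

# Toy example: 99 negative samples and 1 positive sample.
labels = [0] * 99 + [1] * 1
cw = binary_class_weight(labels, t=1.0)
print(cw)
```

The resulting dictionary can then be passed as class_weight=cw to model.fit, as in the usage shown earlier.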
See https://blog.csdn.net/ZZU_chenhao/article/details/98212110 (computing the weights with sklearn.utils.class_weight)
https://blog.csdn.net/weixin_40755306/article/details/82290150?utm_source=blogxgwz2
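For reference, the 'balanced' heuristic documented for sklearn.utils.class_weight.compute_class_weight is n_samples / (n_classes * count_of_class). The sketch below reimplements that formula in pure Python so it can be checked by hand; the function name balanced_class_weight and the toy labels are assumptions for this example, and in practice the sklearn function itself would be the usual choice.

```python
from collections import Counter

def balanced_class_weight(labels):
    """Weights of the form n_samples / (n_classes * class_count) (hypothetical helper)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Toy example: 99 normal samples and 1 anomaly.
labels = [0] * 99 + [1] * 1
weights = balanced_class_weight(labels)
print(weights)
```

Unlike the n_negative / n_positive rule, this heuristic also scales the majority class down below 1, but the ratio between the two class weights is the same.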