Paper: Rethinking the Inception Architecture for Computer Vision
Today we discuss an optimization technique in deep learning: Label Smoothing Regularization (LSR). As the name suggests, what it modifies is the label (Train_y).
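In classification, and especially in multi-class classification, the class label is usually encoded as a one-hot vector.
Put simply, class labels are typically stored as integer indices such as 0, 1 or 3. These are arbitrary categorical identifiers, not ordered (continuous) values. A one-hot vector instead represents a label as an array of length n in which exactly one element is 1 and all others are 0. For example, class 1 out of 3 classes becomes [0, 1, 0].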
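Here is a minimal sketch of one-hot encoding in PyTorch, the library used in the code later in this post. `torch.nn.functional.one_hot` maps the integer labels 0, 1 and 3 to one-hot rows. The number of classes (4 here) is an arbitrary choice for illustration.

```python
import torch
import torch.nn.functional as F

# Integer class indices for a batch of 3 samples (4 classes: 0..3).
labels = torch.tensor([0, 1, 3])

# Each index becomes a length-4 row with a single 1 at that position.
one_hot = F.one_hot(labels, num_classes=4)
print(one_hot)
# tensor([[1, 0, 0, 0],
#         [0, 1, 0, 0],
#         [0, 0, 0, 1]])
```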
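p: predicted probability, the probability the model assigns to each of the K classes for an example x, $p(k \mid x) = \frac{\exp(z_k)}{\sum_{i=1}^{K} \exp(z_i)}$, where $z_i$ are the logits;
q: ground-truth probability, the true label distribution of the example. For one-hot labels it is a Dirac delta, $q(k) = \delta_{k,y}$, where y is the true class;
loss: cross-entropy, $\ell = -\sum_{k=1}^{K} \log(p(k))\, q(k)$.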
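The following small PyTorch sketch makes these definitions concrete using made-up logits. The batch size, K = 5 and the labels are all arbitrary. With a one-hot q, the sum $-\sum_k q(k)\log p(k)$ keeps only the true-class term. It should therefore match PyTorch's `F.cross_entropy`, which computes $-\log p(y)$ directly from the logits.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
K = 5                                     # number of classes (arbitrary)
logits = torch.randn(4, K)                # z: unnormalized scores, batch of 4
y = torch.tensor([0, 3, 1, 4])            # ground-truth class indices

p = F.softmax(logits, dim=1)              # p(k|x) = exp(z_k) / sum_i exp(z_i)
q = F.one_hot(y, num_classes=K).float()   # q(k) = delta_{k,y}

# l = -sum_k q(k) log p(k), averaged over the batch.
manual = -(q * torch.log(p)).sum(dim=1).mean()

# With one-hot q only the true-class term survives: l = -log p(y).
builtin = F.cross_entropy(logits, y)
print(manual.item(), builtin.item())
```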
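Training makes the predicted distribution fit the true distribution, and fitting a one-hot target causes two problems:
1) If the model learns to assign the full probability to the ground-truth class for every training example, generalization is not guaranteed, and the model tends to overfit.
2) Probabilities of exactly 1 and 0 can only be approached as the largest logit grows infinitely larger than all the others, so training keeps widening that gap. Combined with the bounded gradient $\partial\ell/\partial z_k = p(k) - q(k)$, this reduces the model's ability to adapt, and the model becomes over-confident in its predicted class.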
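The second problem can be seen numerically. The sketch below gives the true class a logit `gap` larger than every other class. It uses PyTorch in float64 so the tiny losses are not rounded to zero; K = 10 and the gap values are arbitrary. The loss $\log(1 + (K-1)e^{-\text{gap}})$ keeps shrinking as the gap grows but never reaches 0. Gradient descent therefore always has a reason to push the gap further, while each gradient component $p(k) - q(k)$ stays within [-1, 1].

```python
import torch
import torch.nn.functional as F

K = 10
y = torch.tensor([0])                      # true class is index 0
losses, grads = [], []
for gap in [2.0, 5.0, 10.0, 20.0]:
    # The true-class logit exceeds every other logit by `gap`.
    logits = torch.zeros(1, K, dtype=torch.float64)
    logits[0, 0] = gap
    logits.requires_grad_(True)
    loss = F.cross_entropy(logits, y)      # = log(1 + (K - 1) * exp(-gap))
    loss.backward()
    losses.append(loss.item())
    grads.append(logits.grad[0].tolist())  # dl/dz_k = p(k) - q(k)
    print(f"gap={gap:5.1f}  loss={loss.item():.3e}  dl/dz_y={logits.grad[0, 0].item():.3e}")
```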
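The paper reports that on ILSVRC 2012, with K = 1000 classes and ε = 0.1 (ε being the weight given to a uniform distribution u(k) = 1/K), label smoothing gave a consistent improvement of about 0.2% absolute in both top-1 and top-5 error.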
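The paper's smoothing rule replaces the one-hot target with a mixture of the one-hot distribution and a uniform prior $u(k) = 1/K$: $q'(k) = (1-\epsilon)\,\delta_{k,y} + \epsilon/K$. Below is a minimal PyTorch sketch of that rule. `smooth_targets` is a hypothetical helper name, and the true class index 7 is arbitrary.

```python
import torch
import torch.nn.functional as F

def smooth_targets(y: torch.Tensor, num_classes: int, eps: float) -> torch.Tensor:
    """Hypothetical helper: q'(k) = (1 - eps) * delta_{k,y} + eps / K."""
    one_hot = F.one_hot(y, num_classes=num_classes).float()
    return (1.0 - eps) * one_hot + eps / num_classes

# Settings reported in the paper: K = 1000 classes, eps = 0.1.
q_smooth = smooth_targets(torch.tensor([7]), num_classes=1000, eps=0.1)
print(q_smooth[0, 7].item())   # true class: 0.9 + 0.1 / 1000
print(q_smooth[0, 0].item())   # any other class: 0.1 / 1000
print(q_smooth.sum().item())   # each row still sums to 1
```

With these settings the true class keeps 0.9001 of the probability mass, and each of the other 999 classes gets 0.0001.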
```python
import torch
import torch.nn as nn


class NMTCritierion(nn.Module):
    """
    1. Add label smoothing
    """

    def __init__(self, label_smoothing=0.0):
        super(NMTCritierion, self).__init__()
        self.label_smoothing = label_smoothing
        self.LogSoftmax = nn.LogSoftmax(dim=-1)

        if label_smoothing > 0:
            # Smoothed targets are full distributions: compare with KL divergence.
            self.criterion = nn.KLDivLoss(reduction='sum')
        else:
            # Hard targets are class indices: plain negative log-likelihood.
            self.criterion = nn.NLLLoss(reduction='sum', ignore_index=100000)
        self.confidence = 1.0 - label_smoothing

    def _smooth_label(self, num_tokens):
        # When label smoothing is turned on,
        # KL-divergence between q_{smoothed ground truth prob.}(w)
        # and p_{prob. computed by model}(w) is minimized.
        # If label smoothing value is set to zero, the loss
        # is equivalent to NLLLoss or CrossEntropyLoss.
        # All non-true labels are uniformly set to low-confidence:
        # label_smoothing / (num_tokens - 1) each, so every row sums to 1
        # once the true label is set to 1 - label_smoothing.
        one_hot = torch.full((1, num_tokens), self.label_smoothing / (num_tokens - 1))
        return one_hot

    def _bottle(self, v):
        return v.view(-1, v.size(2))

    def forward(self, dec_outs, labels):
        scores = self.LogSoftmax(dec_outs)
        num_tokens = scores.size(-1)
        scores = scores.view(-1, num_tokens)  # flatten to [N, M] to match gtruth

        # conduct label_smoothing module
        gtruth = labels.view(-1)
        if self.confidence < 1:
            tdata = gtruth.detach()
            one_hot = self._smooth_label(num_tokens).to(scores.device)  # shape [1, M]
            tmp_ = one_hot.repeat(gtruth.size(0), 1)  # [N, M]
            # tdata.unsqueeze(1) has shape [N, 1]: write confidence at the true class
            tmp_.scatter_(1, tdata.unsqueeze(1), self.confidence)
            gtruth = tmp_.detach()
        loss = self.criterion(scores, gtruth)
        return loss


# net, img and lb are your own model, input batch and integer label tensor.
criterion = NMTCritierion(0.1)
outputs = net(img)
loss = criterion(outputs, lb)
```
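Recent PyTorch versions (1.10 and later) also expose label smoothing directly through the `label_smoothing` argument of `nn.CrossEntropyLoss`. The sketch below uses random logits and labels as stand-ins for `net(img)` and `lb`. It checks that the built-in loss matches the paper's $q'(k) = (1-\epsilon)\delta_{k,y} + \epsilon/K$ written out by hand.

This is not numerically identical to `NMTCritierion` above, for three reasons. That class gives each non-true class $\epsilon/(K-1)$ and the true class exactly $1-\epsilon$. It sums over the batch instead of averaging. And it minimizes KL divergence, which differs from cross-entropy only by the entropy of the target, a constant with respect to the model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
K, eps = 10, 0.1
logits = torch.randn(8, K)             # stand-in for net(img)
y = torch.randint(0, K, (8,))          # stand-in for lb

# Built-in: targets become (1 - eps) * one_hot + eps / K; loss averaged over the batch.
builtin = nn.CrossEntropyLoss(label_smoothing=eps)(logits, y)

# The same loss written out by hand.
q_smooth = (1 - eps) * F.one_hot(y, K).float() + eps / K
manual = -(q_smooth * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
print(builtin.item(), manual.item())
```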