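This post explains how compute_loss turns the model predictions (pred) and the matched ground truths (gt) into losses. It covers how positive and negative samples are separated, how binary cross-entropy (BCE) loss is converted into focal loss and used in the code, and how IoU is computed.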
import torch
import torch.nn as nn


def compute_loss(p, targets, model):  # predictions, targets, model
    device = p[0].device
    lcls = torch.zeros(1, device=device)  # Tensor(0)
    lbox = torch.zeros(1, device=device)  # Tensor(0)
    lobj = torch.zeros(1, device=device)  # Tensor(0)
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
    h = model.hyp  # hyperparameters
    red = 'mean'  # Loss reduction (sum or mean)

    # Define criteria
    BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device), reduction=red)
    BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device), reduction=red)

    # class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
    cp, cn = smooth_BCE(eps=0.1)

    # focal loss
    g = h['fl_gamma']  # focal loss gamma
    if g > 0:
        BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)

    # per output
    nt = 0  # targets
    for i, pi in enumerate(p):  # layer index, layer predictions
        b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
        tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj

        nb = b.shape[0]  # number of targets
        if nb:
            # predictions of the anchors matched to positive samples
            ps = pi[b, a, gj, gi]  # prediction subset corresponding to targets

            # GIoU
            pxy = ps[:, :2].sigmoid()
            pwh = ps[:, 2:4].exp().clamp(max=1E3) * anchors[i]
            pbox = torch.cat((pxy, pwh), 1)  # predicted box
            giou = bbox_iou(pbox.t(), tbox[i], x1y1x2y2=False, GIoU=True)  # giou(prediction, target)
            lbox += (1.0 - giou).mean()  # giou loss

            # Obj
            tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype)  # giou ratio

            # Class
            if model.nc > 1:  # cls loss (only if multiple classes)
                t = torch.full_like(ps[:, 5:], cn, device=device)  # targets
                t[range(nb), tcls[i]] = cp
                lcls += BCEcls(ps[:, 5:], t)  # BCE

            # Append targets to text file
            # with open('targets.txt', 'a') as file:
            #     [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)]

        lobj += BCEobj(pi[..., 4], tobj)  # obj loss

    # multiply each loss by its weight
    lbox *= h['giou']
    lobj *= h['obj']
    lcls *= h['cls']
    # loss = lbox + lobj + lcls
    return {"box_loss": lbox,
            "obj_loss": lobj,
            "class_loss": lcls}
tbox (returned by build_targets): the box information of the matched gt, $(t_x, t_y, w, h)$. Here $t_x, t_y$ are offsets and $w, h$ are the width and height. Shape: $(YoloLayer\_num, targets\_num, t_xt_ywh)$.
red = 'mean' # Loss reduction (sum or mean)
# Define criteria
BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device), reduction=red)
BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device), reduction=red)
# class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
cp, cn = smooth_BCE(eps=0.1)
# focal loss
g = h['fl_gamma'] # focal loss gamma
if g > 0:
    BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)
def smooth_BCE(eps=0.1):  # https://github.com/ultralytics/yolov3/issues/238#issuecomment-598028441
    # return positive, negative label smoothing BCE targets
    return 1.0 - 0.5 * eps, 0.5 * eps
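The sketch below shows what the two returned values are used for. It is pure Python with no PyTorch, and it shows how cp and cn become one class-target row, mirroring t = torch.full_like(ps[:, 5:], cn) followed by t[range(nb), tcls[i]] = cp in compute_loss. The helper class_targets is hypothetical and exists only for illustration.

```python
def smooth_BCE(eps=0.1):
    # positive / negative BCE targets after label smoothing (same formula as above)
    return 1.0 - 0.5 * eps, 0.5 * eps


def class_targets(cls_index, num_classes, eps=0.1):
    """Hypothetical helper: one smoothed target row for a target of class cls_index."""
    cp, cn = smooth_BCE(eps)
    row = [cn] * num_classes   # every class starts at the negative target cn
    row[cls_index] = cp        # the true class gets the positive target cp
    return row


print(smooth_BCE(0.1))
print(class_targets(2, 4))
```

With eps=0.1 the true class is trained toward 0.95 instead of 1, and every other class is trained toward 0.05 instead of 0.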
If the hyperparameter fl_gamma > 0, each BCE loss module defined above is passed into the FocalLoss module, which turns it into a focal loss.
import torch
import torch.nn as nn


class FocalLoss(nn.Module):
    # Wraps focal loss around existing loss_fcn(), i.e. criteria = FocalLoss(nn.BCEWithLogitsLoss(), gamma=1.5)
    def __init__(self, loss_fcn, gamma=1.5, alpha=0.25):
        super(FocalLoss, self).__init__()
        self.loss_fcn = loss_fcn  # must be nn.BCEWithLogitsLoss()
        self.gamma = gamma
        self.alpha = alpha
        self.reduction = loss_fcn.reduction
        self.loss_fcn.reduction = 'none'  # required to apply FL to each element

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)
        # p_t = torch.exp(-loss)
        # loss *= self.alpha * (1.000001 - p_t) ** self.gamma  # non-zero power for gradient stability

        # TF implementation https://github.com/tensorflow/addons/blob/v0.7.1/tensorflow_addons/losses/focal_loss.py
        pred_prob = torch.sigmoid(pred)  # prob from logits
        p_t = true * pred_prob + (1 - true) * (1 - pred_prob)
        alpha_factor = true * self.alpha + (1 - true) * (1 - self.alpha)
        modulating_factor = (1.0 - p_t) ** self.gamma
        loss *= alpha_factor * modulating_factor

        if self.reduction == 'mean':
            return loss.mean()
        elif self.reduction == 'sum':
            return loss.sum()
        else:  # 'none'
            return loss
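The element-wise part of forward can be checked on single numbers. The pure-Python sketch below reimplements the TF-style branch for one logit. bce_with_logits uses the standard numerically stable form of BCE on a logit, and focal_loss multiplies it by alpha_factor and modulating_factor the same way the class does. The function names are hypothetical; only the formulas come from the code above.

```python
import math


def bce_with_logits(x, y):
    # numerically stable -(y*ln(sigmoid(x)) + (1-y)*ln(1-sigmoid(x)))
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


def focal_loss(x, y, gamma=1.5, alpha=0.25):
    """Scalar version of the TF-style branch of FocalLoss.forward."""
    p = 1.0 / (1.0 + math.exp(-x))                 # pred_prob = sigmoid(pred)
    p_t = y * p + (1 - y) * (1 - p)                # probability given to the true label
    alpha_factor = y * alpha + (1 - y) * (1 - alpha)
    modulating_factor = (1.0 - p_t) ** gamma
    return bce_with_logits(x, y) * alpha_factor * modulating_factor


# an "easy" positive (logit 4.0) vs a misclassified positive (logit -1.0)
for logit in (4.0, -1.0):
    print(logit, bce_with_logits(logit, 1.0), focal_loss(logit, 1.0))
```

With gamma=0 and alpha=0.5 the modulating factor is 1 and the result is half the BCE. With gamma > 0, the confidently correct positive is scaled down much more than the misclassified one, which is the purpose of focal loss.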
# per output
nt = 0  # targets
for i, pi in enumerate(p):  # layer index, layer predictions
    b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
    tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj
    nb = b.shape[0]  # number of targets
b, a, gj, gi are the image_index, anchor_index, grid_y and grid_x of each matched target.
if nb:
    # predictions of the anchors matched to positive samples
    ps = pi[b, a, gj, gi]  # prediction subset corresponding to targets
    # GIoU
    pxy = ps[:, :2].sigmoid()
    pwh = ps[:, 2:4].exp().clamp(max=1E3) * anchors[i]
    pbox = torch.cat((pxy, pwh), 1)  # predicted box
    giou = bbox_iou(pbox.t(), tbox[i], x1y1x2y2=False, GIoU=True)  # giou(prediction, target)
    lbox += (1.0 - giou).mean()  # giou loss
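ps = pi[b, a, gj, gi] takes the entries of this layer's output whose first four indices are [image_index, anchor_index, gj, gi]. In other words, it takes the [x, y, w, h, obj, cls...] vector predicted for each matched target, so ps has shape (targets_num, 5 + classes_num).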
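When PyTorch advanced indexing receives four index tensors, it pairs them up element-wise. The sketch below imitates pi[b, a, gj, gi] with nested Python lists so the selection can be traced by hand. The toy layer stores a label string in each cell in place of the (5 + classes_num) prediction vector, and the index values are made up.

```python
# toy layer laid out as (batch, anchor, grid_y, grid_x); each cell holds a label string
bs, na, ny, nx = 2, 3, 4, 4
pi = [[[[f"img{n}-a{k}-y{y}-x{x}" for x in range(nx)] for y in range(ny)]
       for k in range(na)] for n in range(bs)]

# made-up indices of two matched targets: image, anchor, grid_y, grid_x
b, a, gj, gi = [0, 1], [2, 0], [1, 3], [3, 0]

# pi[b, a, gj, gi] zips the four index lists and picks one cell per target
ps = [pi[bb][aa][yy][xx] for bb, aa, yy, xx in zip(b, a, gj, gi)]
print(ps)
```

Each selected cell stands for one row of ps, which is why ps lines up one-to-one with the matched targets.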
pwh maps the raw width/height predictions through the anchors to widths and heights on the feature-map scale. Its shape is (targets_num, 2).
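Note: the mapping for pwh calls clamp(max=1E3), where 1E3 means $1\times10^{3}$. The clamp is applied to the exponential, so exp() of the raw width/height outputs is capped at 1000 before being multiplied by the anchor. It is unclear whether this is necessary, because the decoding in models.py does not use it (see the line below). A plausible reason is to keep exp() from overflowing to inf and producing NaN losses during training.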
io[..., 2:4] = torch.exp(io[..., 2:4]) * self.anchor_wh
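In the paper, the prediction outputs are mapped as follows. $\sigma$ denotes the sigmoid function, $(c_x, c_y)$ is the top-left corner of the grid cell and $(p_w, p_h)$ is the anchor (prior) size:
$$b_x=\sigma(t_x)+c_x,\quad b_y=\sigma(t_y)+c_y,\quad b_w=p_we^{t_w},\quad b_h=p_he^{t_h}$$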
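The YOLOv3 mapping is $b_x=\sigma(t_x)+c_x$, $b_y=\sigma(t_y)+c_y$, $b_w=p_we^{t_w}$, $b_h=p_he^{t_h}$, with grid-cell corner $(c_x, c_y)$ and anchor size $(p_w, p_h)$. The sketch below compares this mapping with the variant that compute_loss actually builds for pbox. decode follows the paper. decode_for_loss drops $c_x, c_y$, so xy stays an offset inside the cell, and it applies the same clamp(max=1E3) to the exponential before multiplying by the anchor. Both functions and the parameter name max_exp are hypothetical, and all values are assumed to be on the feature-map scale.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode(tx, ty, tw, th, cx, cy, pw, ph):
    """Paper mapping: box center in grid units plus size scaled from the anchor."""
    return sigmoid(tx) + cx, sigmoid(ty) + cy, pw * math.exp(tw), ph * math.exp(th)


def decode_for_loss(tx, ty, tw, th, pw, ph, max_exp=1e3):
    """Like pbox in compute_loss: in-cell xy offset, clamped exp() times the anchor."""
    return (sigmoid(tx), sigmoid(ty),
            min(math.exp(tw), max_exp) * pw,
            min(math.exp(th), max_exp) * ph)


print(decode(0.0, 0.0, 0.0, 0.0, 5, 7, 2.0, 3.0))
print(decode_for_loss(0.0, 0.0, 20.0, 0.0, 2.0, 3.0))  # exp(20) is capped at 1000
```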
pbox: xy in the prediction output is the offset of the box center within its grid cell (the $c_x, c_y$ term is not added). All values are on the feature-map scale.
Shape: $(targets\_num, xywh)$
tbox: the box information of the matched gt, $(t_x, t_y, w, h)$. Here $t_x, t_y$ are offsets and $w, h$ are the width and height. All values are on the feature-map scale.
Shape: $(YoloLayer\_num, targets\_num, t_xt_ywh)$
pbox and tbox are therefore both on the feature-map scale.
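build_targets, which is not shown here, is what puts tbox on the feature-map scale. The sketch below assumes that it multiplies normalized gt boxes by the grid size and takes the integer part of the center as the owning cell. Under that assumption, the offsets work out as follows. make_tbox is a hypothetical helper, not a function from the repository.

```python
import math


def make_tbox(gt_xywh_norm, nx, ny):
    """Hypothetical: normalized gt (x, y, w, h) -> ((gi, gj), (tx, ty, w, h)) in grid units."""
    x, y, w, h = gt_xywh_norm
    gx, gy = x * nx, y * ny                  # center on the feature-map scale
    gw, gh = w * nx, h * ny                  # size on the feature-map scale
    gi, gj = math.floor(gx), math.floor(gy)  # grid cell that contains the center
    return (gi, gj), (gx - gi, gy - gj, gw, gh)


# 16x16 feature map: the center falls in the middle of cell (8, 4)
print(make_tbox((0.53125, 0.28125, 0.125, 0.25), 16, 16))
```

Under this assumption, tx and ty lie in [0, 1), the same unit interval that pxy = sigmoid(...) covers, so pbox and tbox can be compared directly.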
for i, pi in enumerate(p):  # layer index, layer predictions
    b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
    tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj
    nb = b.shape[0]  # number of targets
    if nb:
        # predictions of the anchors matched to positive samples
        ps = pi[b, a, gj, gi]  # prediction subset corresponding to targets
        # GIoU
        pxy = ps[:, :2].sigmoid()
        pwh = ps[:, 2:4].exp().clamp(max=1E3) * anchors[i]
        pbox = torch.cat((pxy, pwh), 1)  # predicted box
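pi has shape (batch_size, anchor_num, grid_y, grid_x, xywh + obj_confidence + classes_num), which matches the pi[b, a, gj, gi] indexing order. Its xy is the offset relative to the current grid cell. After the sigmoid, pxy is that offset on the feature-map scale, inside the matching grid cell. (Note: for each gt, ps picks the xywh predicted at that gt's image, anchor and grid cell, so ps corresponds one-to-one with tbox.)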
In build_targets, tbox is on the feature-map scale. For details, see the post "YOLO-V3-SPP utils.py build_targets function: detailed walkthrough (ultralytic version)".
giou = bbox_iou(pbox.t(), tbox[i], x1y1x2y2=False, GIoU=True)
# giou(prediction, target)
pbox: the xywh from the prediction output, restored to the feature-map scale. xy is also an offset.
pbox shape: $(targets\_num, xywh)$
tbox shape: $(YoloLayer\_num, targets\_num, t_xt_ywh)$
import math
import torch


def bbox_iou(box1, box2, x1y1x2y2=True, GIoU=False, DIoU=False, CIoU=False):
    # Returns the IoU of box1 to box2. box1 is 4, box2 is nx4
    box2 = box2.t()

    # Get the coordinates of bounding boxes
    if x1y1x2y2:  # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    else:  # transform from xywh to xyxy
        b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
        b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
        b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
        b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2

    # Intersection area
    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)

    # Union Area
    w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1
    w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1
    union = (w1 * h1 + 1e-16) + w2 * h2 - inter

    iou = inter / union  # iou
    if GIoU or DIoU or CIoU:
        cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1)  # convex (smallest enclosing box) width
        ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1)  # convex height
        if GIoU:  # Generalized IoU https://arxiv.org/pdf/1902.09630.pdf
            c_area = cw * ch + 1e-16  # convex area
            return iou - (c_area - union) / c_area  # GIoU
        if DIoU or CIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            # convex diagonal squared
            c2 = cw ** 2 + ch ** 2 + 1e-16
            # centerpoint distance squared
            rho2 = ((b2_x1 + b2_x2) - (b1_x1 + b1_x2)) ** 2 / 4 + ((b2_y1 + b2_y2) - (b1_y1 + b1_y2)) ** 2 / 4
            if DIoU:
                return iou - rho2 / c2  # DIoU
            elif CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)
                with torch.no_grad():
                    alpha = v / (1 - iou + v)
                return iou - (rho2 / c2 + v * alpha)  # CIoU

    return iou
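To check the GIoU branch by hand, here is a scalar pure-Python version for two (x, y, w, h) boxes. It follows the same steps as bbox_iou(..., x1y1x2y2=False, GIoU=True), including the 1e-16 terms. It handles a single box pair and is not a replacement for the vectorized function. iou_giou is a hypothetical name.

```python
def xywh_to_xyxy(box):
    x, y, w, h = box
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2


def iou_giou(box1, box2):
    """IoU and GIoU of two (x, y, w, h) boxes, computed like bbox_iou(..., GIoU=True)."""
    b1x1, b1y1, b1x2, b1y2 = xywh_to_xyxy(box1)
    b2x1, b2y1, b2x2, b2y2 = xywh_to_xyxy(box2)
    inter = max(min(b1x2, b2x2) - max(b1x1, b2x1), 0.0) * \
        max(min(b1y2, b2y2) - max(b1y1, b2y1), 0.0)
    union = (b1x2 - b1x1) * (b1y2 - b1y1) + 1e-16 + (b2x2 - b2x1) * (b2y2 - b2y1) - inter
    iou = inter / union
    cw = max(b1x2, b2x2) - min(b1x1, b2x1)   # smallest enclosing box width
    ch = max(b1y2, b2y2) - min(b1y1, b2y1)   # smallest enclosing box height
    c_area = cw * ch + 1e-16
    return iou, iou - (c_area - union) / c_area


print(iou_giou((0.5, 0.5, 1.0, 1.0), (1.0, 0.5, 1.0, 1.0)))  # half overlap
print(iou_giou((0.5, 0.5, 1.0, 1.0), (3.5, 0.5, 1.0, 1.0)))  # far apart
```

The far-apart pair has IoU 0 but GIoU -0.5. This means 1 - GIoU still reflects how far apart the boxes are, while an IoU-based loss would be flat.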
giou = bbox_iou(pbox.t(), tbox[i], x1y1x2y2=False, GIoU=True) # giou(prediction, target)
lbox += (1.0 - giou).mean() # giou loss
Recall the GIoU loss: $L_{GIoU}=1-GIoU$. For one output layer, lbox is the mean over its matched targets, and lbox += adds these per-layer values over the YOLO layers:
$$lbox=\frac{\sum_{j=1}^{targets\_num}(1-giou_j)}{targets\_num}$$
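A minimal sketch of this formula, using made-up GIoU values. layer_box_loss is a hypothetical name. Layers without matched targets are skipped, as the if nb: check does in compute_loss.

```python
def layer_box_loss(gious):
    """(1 - GIoU) averaged over one layer's matched targets, like (1.0 - giou).mean()."""
    return sum(1.0 - g for g in gious) / len(gious)


# made-up GIoU values for three output layers; the second layer has no targets
gious_per_layer = [[1.0, 0.5, -0.5], [], [0.8]]
lbox = sum(layer_box_loss(g) for g in gious_per_layer if g)  # `if g` mirrors `if nb:`
print(lbox)
```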
# Obj
tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype) # giou ratio
lobj += BCEobj(pi[..., 4], tobj) # obj loss
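Here giou goes through clamp(0), which sets its lower bound to 0. GIoU lies in $(-1, 1]$. In the box loss, $L_{GIoU}=1-GIoU$ stays positive even when GIoU is negative because pred and gt do not overlap, so such pairs still produce a useful gradient and can be trained. For the confidence target, however, the GIoU of a non-overlapping pred/gt pair is simply clamped to 0, which means a target confidence of 0.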
model.gr = 1.0 # giou loss ratio (obj_loss = 1.0 or giou)
tobj = torch.zeros_like(pi[..., 0], device=device) # target obj
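tobj[b, a, gj, gi] selects the entries of tobj that correspond to the targets and fills them with the target confidence. All other entries keep the 0 from torch.zeros_like and act as negative samples.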
tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype)
lobj += BCEobj(pi[..., 4], tobj) # obj loss
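A scalar sketch can trace the value written into tobj for one positive sample. obj_target is a hypothetical name. gr plays the role of model.gr, and max(giou, 0.0) stands in for giou.detach().clamp(0).

```python
def obj_target(giou, gr=1.0):
    """Objectness target at a matched position: (1 - gr) + gr * clamp(giou, min=0)."""
    return (1.0 - gr) + gr * max(giou, 0.0)


print(obj_target(0.7))           # gr = 1.0: the target is the GIoU itself
print(obj_target(-0.3))          # non-overlapping pred/gt: clamped to 0
print(obj_target(-0.3, gr=0.0))  # gr = 0.0: every positive gets target 1
```

With the default model.gr = 1.0, the objectness target of a positive is exactly its clamped GIoU, so the confidence branch learns to predict how well its box fits.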
$$L_{conf}(o,c)=-\frac{\sum_i\left(o_i\ln(\hat{c}_i)+(1-o_i)\ln(1-\hat{c}_i)\right)}{N}$$
$$\hat{c}_i=\mathrm{Sigmoid}(c_i)$$
where $o_i\in[0,1]$ is the IoU between the predicted box and the ground-truth box (here the clamped GIoU for positives, and 0 for negatives),
$c$ is the raw prediction, and $\hat{c}_i$ is the predicted confidence (the predicted IoU) obtained by passing $c$ through the $\mathrm{Sigmoid}$ function,
and $N$ is the number of positive and negative samples.
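Note: $o_i$ and $\hat{c}_i$ both refer to an IoU. The only difference is that $o_i$ is the IoU between pred and gt, used as the label that guides training, while $\hat{c}_i$ is the IoU predicted by the network.
BCE loss is designed for binary classification. At every prediction position, confidence is a binary problem (object vs. background), with predicted probability $\hat{c}_i$ and soft label $o_i$, so BCE is used as the loss. Because reduction='mean', the BCE values of all positions in the layer, positive and negative, are summed and divided by their count.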
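The following pure-Python check compares the $L_{conf}$ formula with the standard numerically stable logit form of BCE. Both are averaged over all positions, as reduction='mean' does. The logits and labels are made up: one positive with $o_i = 0.8$ and three negatives with $o_i = 0$. pos_weight is treated as its neutral value 1 and is not modeled, and the function names are hypothetical.

```python
import math


def bce_with_logits(c, o):
    # stable form of -(o*ln(sigmoid(c)) + (1-o)*ln(1-sigmoid(c)))
    return max(c, 0.0) - c * o + math.log1p(math.exp(-abs(c)))


def conf_loss(logits, labels):
    """L_conf with reduction='mean': average over every position, positive or negative."""
    return sum(bce_with_logits(c, o) for c, o in zip(logits, labels)) / len(logits)


def conf_loss_formula(logits, labels):
    """The same quantity written literally as the L_conf formula."""
    total = 0.0
    for c, o in zip(logits, labels):
        c_hat = 1.0 / (1.0 + math.exp(-c))   # c_hat_i = Sigmoid(c_i)
        total += o * math.log(c_hat) + (1 - o) * math.log(1 - c_hat)
    return -total / len(logits)


logits = [2.0, -3.0, -4.0, 0.5]   # raw confidence outputs c_i
labels = [0.8, 0.0, 0.0, 0.0]     # o_i: clamped GIoU for the positive, 0 elsewhere
print(conf_loss(logits, labels), conf_loss_formula(logits, labels))
```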
If focal loss is enabled, the BCE losses are wrapped in the FocalLoss class:
# focal loss
g = h['fl_gamma'] # focal loss gamma
if g > 0:
    BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)
# Class
if model.nc > 1:  # cls loss (only if multiple classes)
    t = torch.full_like(ps[:, 5:], cn, device=device)  # targets
    t[range(nb), tcls[i]] = cp
    lcls += BCEcls(ps[:, 5:], t)  # BCE

# (after the loop over output layers) multiply each loss by its weight
lbox *= h['giou']
lobj *= h['obj']
lcls *= h['cls']
# loss = lbox + lobj + lcls
return {"box_loss": lbox,
        "obj_loss": lobj,
        "class_loss": lcls}
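Below is a scalar sketch of the class branch and the final weighting. class_loss averages smoothed BCE over the whole (nb, classes_num) block, as BCEcls(ps[:, 5:], t) does with reduction='mean'. weighted_losses multiplies each term by its weight. The hyp values in the example are placeholders, not the repository's defaults, and both function names are hypothetical.

```python
import math


def bce_with_logits(x, y):
    # numerically stable BCE on one logit x with target y
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


def class_loss(cls_logits, cls_ids, num_classes, eps=0.1):
    """Mean smoothed BCE over an (nb, num_classes) block of class logits."""
    if num_classes <= 1:                      # mirrors `if model.nc > 1:`
        return 0.0
    cp, cn = 1.0 - 0.5 * eps, 0.5 * eps      # smooth_BCE(eps)
    total, count = 0.0, 0
    for row, k in zip(cls_logits, cls_ids):
        for j, x in enumerate(row):
            total += bce_with_logits(x, cp if j == k else cn)
            count += 1
    return total / count


def weighted_losses(lbox, lobj, lcls, hyp):
    """Scale each term by its weight, like lbox *= h['giou'] and so on."""
    return {"box_loss": lbox * hyp["giou"],
            "obj_loss": lobj * hyp["obj"],
            "class_loss": lcls * hyp["cls"]}


hyp = {"giou": 2.0, "obj": 0.5, "cls": 1.0}   # placeholder weights
lcls = class_loss([[2.0, -1.0, -2.0]], [0], num_classes=3)
print(weighted_losses(0.3, 0.4, lcls, hyp))
```

The three weighted terms are returned separately. As the commented-out line in the code indicates, the total loss is their sum.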
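The confidence loss lobj must include the losses of both positive and negative samples. That is why lobj += BCEobj(pi[..., 4], tobj) sits outside the if nb: block. It is accumulated for every output layer over all positions of tobj, covering both the matched positives and the zero-filled negatives, even when the layer has no targets.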
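The placement of the two accumulations can be imitated in plain Python. In this hypothetical sketch, each layer supplies per-position confidence BCE values (positives and negatives alike) and the GIoU values of its matched targets. Box loss is added only when the layer has targets, while confidence loss is always added.

```python
def accumulate(layers):
    """layers: list of (obj_bce_per_position, gious_of_matched_targets), one per YOLO layer."""
    lbox, lobj = 0.0, 0.0
    for obj_bce, gious in layers:
        if gious:                                   # like `if nb:`: positives only
            lbox += sum(1.0 - g for g in gious) / len(gious)
        lobj += sum(obj_bce) / len(obj_bce)         # outside `if nb:`: every position
    return lbox, lobj


# the second layer has no matched targets but still contributes confidence loss
print(accumulate([([0.2, 0.4], [0.5]), ([0.1, 0.3], [])]))
```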