Update: the annotated YOLOv5 code now covers the recent 2021.07.14 release, with more complete comments.
github: https://github.com/Laughing-q/yolov5_annotations
Other YOLOv5 code walkthroughs
For now only the build_targets and compute_loss functions are annotated, mainly because I happened to dig into YOLOv5's box regression scheme today;
I will annotate the other functions when I have time.
The build_targets function contains a detailed explanation of YOLOv5's box regression. Since no paper has been published yet, the only way to study it is through the code itself, so if anything below is wrong, corrections are welcome.
def build_targets(p, targets, model):
"""
Args:
p: 网络输出,List[torch.tensor * 3], p[i].shape = (b, 3, h, w, nc+5), hw分别为特征图的长宽,b为batch-size
targets: targets.shape = (nt, 6) , 6=icxywh,i表示第一张图片,c为类别,然后为坐标xywh
model: 模型
Returns:
"""
# Build targets for compute_loss(), input targets(image,class,x,y,w,h)
# get the Detect() module, which holds the (3) detection layers
det = model.module.model[-1] if is_parallel(model) else model.model[-1] # Detect() module
# number of anchors per layer and number of target boxes
na, nt = det.na, targets.shape[0] # number of anchors, targets
tcls, tbox, indices, anch = [], [], [], []
gain = torch.ones(7, device=targets.device) # normalized to gridspace gain
# ai.shape = (na, nt); build the anchor indices
ai = torch.arange(na, device=targets.device).float().view(na, 1).repeat(1, nt) # same as .repeat_interleave(nt)
# targets.shape = (na, nt, 7)
targets = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2) # append anchor indices
# half-cell offset used for the extra grid-cell assignment below
g = 0.5 # bias
# (5, 2)
off = torch.tensor([[0, 0],
[1, 0], [0, 1], [-1, 0], [0, -1], # j,k,l,m
# [1, 1], [1, -1], [-1, 1], [-1, -1], # jk,jm,lk,lm
], device=targets.device).float() * g # offsets
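# (added note) the five rows of off are: the cell itself, then its left, top, right and bottom
# neighbours (selected via the masks j, k, l, m below); multiplying by g = 0.5 makes them half-cell shifts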
# process each detection layer
for i in range(det.nl):
anchors = det.anchors[i]
# gain that maps normalized coordinates onto this feature map's grid
"""
p[i].shape = (b, 3, h, w, nc+5), where h/w are the feature-map height/width
gain = [1, 1, w, h, w, h, 1]
"""
gain[2:6] = torch.tensor(p[i].shape)[[3, 2, 3, 2]] # xyxy gain
# Match targets to anchors
# map the target xywh from the normalized 0~1 range onto feature-map (grid) coordinates
t = targets * gain
if nt:
# Matches
"""
预测的wh与anchor的wh做匹配,筛选掉比值大于hyp['anchor_t']的(这应该是yolov5的创新点),从而更好的回归(与新的边框回归方式有关)
由于yolov3回归wh采用的是out=exp(in),这很危险,因为out=exp(in)可能会无穷大,就会导致失控的梯度,不稳定,NaN损失并最终完全失去训练;
(当然原yolov3采用的是将targets进行反算来求in与网络输出的结果,就问题不大,但采用iou loss,就需要将网络输出算成out来进行loss求解,所以会面临这个问题);
所以作者采用新的wh回归方式:
(wh.sigmoid() * 2) ** 2 * anchors[i], 原来yolov3为anchors[i] * exp(wh)
将标签框与anchor的倍数控制在0~4之间;
hyp.scratch.yaml中的超参数anchor_t=4,所以也是通过此参数来判定anchors与标签框契合度;
"""
# 计算比值ratio
r = t[:, :, 4:6] / anchors[:, None] # wh ratio
"""
筛选满足1 / hyp['anchor_t'] < targets_wh/anchor_wh < hyp['anchor_t']的框;
由于wh回归公式中将标签框与anchor的倍数控制在0~4之间,所以这样筛选之后也会浪费一些输出空间;
由于分给每个特征金字塔层的anchor尺度都不一样,这里根据标签wh与anchor的wh的比例分配标签,
就相当于把不同尺度的GT分配给了不同的特征层来回归;
"""
j = torch.max(r, 1. / r).max(2)[0] < model.hyp['anchor_t'] # compare
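# Worked example (added for illustration, assuming anchor_t = 4): a target of wh = (90, 60) on the grid
#   vs an anchor of (30, 40): r = (3.0, 1.5),  max(r, 1/r).max() = 3.0 < 4  -> kept
#   vs an anchor of (10, 13): r = (9.0, 4.6),  max(r, 1/r).max() = 9.0 >= 4 -> filtered out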
# YOLOv5 no longer assigns targets by IoU, only by grid position (and the wh ratio above);
# j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t'] # iou(3,n)=wh_iou(anchors(3,2), gwh(n,2))
# after filtering, t.shape = (M, 7), where M is the number of remaining target-anchor pairs
t = t[j] # filter
# Offsets
# box centers xy, measured from the grid's top-left corner, (M, 2)
gxy = t[:, 2:4] # grid xy
# box centers measured from the grid's bottom-right corner, (M, 2)
gxi = gain[[2, 3]] - gxy # inverse
# ((gxy % 1. < g) & (gxy > 1.)).T has shape (2, M)
# j, k, l, m each have shape (M,)
"""
Pick out the boxes whose center lies within 0.5 of a cell's top-left corner (x % 1 < 0.5, y % 1 < 0.5) and,
measured from the bottom-right corner, within 0.5 of it as well; these are the masks j, k, l, m.
When choosing gij (the grid cell a target is assigned to), each of these four groups is shifted by the matching
offset (subtracting the 'off' values above), i.e. the gij = (gxy - offsets).long() step below;
the four shifted groups are then concatenated with the original gxy, giving five groups in total.
In other words: (1) each cell is split 2x2 into quadrants, and a box falling into a given quadrant is regressed
not only with the anchors of its own cell but also with the anchors of the two neighbouring cells adjacent to that quadrant;
the original YOLOv3 only ever uses the anchors of the cell the center falls in.
This presumably eases the grid-sensitivity effect, but since v5 has no paper this is only a guess;
YOLOv4 addresses the same effect differently, by multiplying the sigmoid output by a factor greater than 1.
It also ties in with YOLOv5's new box regression formula:
because of (1), the center regression range grows from YOLOv3's 0~1 to -0.5~1.5,
so the center formula becomes:
    xy.sigmoid() * 2. - 0.5 + cx
"""
j, k = ((gxy % 1. < g) & (gxy > 1.)).T
l, m = ((gxi % 1. < g) & (gxi > 1.)).T
# j.shape = (5, M)
j = torch.stack((torch.ones_like(j), j, k, l, m))
# t.shape = (5, M, 7)
# keep the selected boxes, (N, 7), where N is the number after selection
t = t.repeat((5, 1, 1))[j]
# matching offsets
# (1, M, 2) + (5, 1, 2) = (5, M, 2) --[j]--> (N, 2)
offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]
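# Worked example (added for illustration): a target center at gxy = (2.3, 5.8) in grid units
#   x % 1 = 0.3 < 0.5 and x > 1  -> j is True -> also assigned to the left  neighbour, gij = (2.3 - 0.5, 5.8).long() = (1, 5)
#   (h - 5.8) % 1 = 0.2 < 0.5    -> m is True -> also assigned to the lower neighbour, gij = (2.3, 5.8 + 0.5).long() = (2, 6)
#   plus its own cell (2, 5); so this target is regressed from three cells, for every anchor that passed the wh-ratio filter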
else:
t = targets[0]
offsets = 0
# Define
# b is the index of the image within the batch, c is the class
b, c = t[:, :2].long().T # image, class
# center regression targets
gxy = t[:, 2:4] # grid xy
# width/height regression targets
gwh = t[:, 4:6] # grid wh
# in the original YOLOv3 this is simply gij = gxy.long()
gij = (gxy - offsets).long()
gi, gj = gij.T # grid xy indices
# Append
# a holds the anchor indices
a = t[:, 6].long() # anchor indices
# store the indices so the matching predictions can be picked out when computing the loss
indices.append((b, a, gj, gi)) # image, anchor, grid indices
tbox.append(torch.cat((gxy - gij, gwh), 1)) # box
anch.append(anchors[a]) # anchors
tcls.append(c) # class
return tcls, tbox, indices, anch
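To make the two decoding schemes discussed in the comments above concrete, here is a small standalone sketch (added for illustration only; the anchor and raw-output values are made up) comparing the YOLOv3-style exp decode with the sigmoid-based YOLOv5 decode:
import torch
anchor = torch.tensor([30., 60.])             # one anchor (w, h) in grid units
raw = torch.tensor([[0.2, -0.3, 0.5, 1.0]])   # made-up raw network output (x, y, w, h)
# YOLOv3-style decode: xy in 0~1 inside the cell, wh unbounded because of exp()
xy_v3 = raw[:, :2].sigmoid()
wh_v3 = anchor * raw[:, 2:4].exp()
# YOLOv5-style decode as annotated above: xy in -0.5~1.5, wh in 0~4x the anchor
xy_v5 = raw[:, :2].sigmoid() * 2. - 0.5
wh_v5 = (raw[:, 2:4].sigmoid() * 2) ** 2 * anchor
print(xy_v3, wh_v3, xy_v5, wh_v5)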
def compute_loss(p, targets, model): # predictions, targets, model
"""
Args:
p: 网络输出,List[torch.tensor * 3], p[i].shape = (b, 3, h, w, nc+5), hw分别为特征图的长宽,b为batch-size
targets: targets.shape = (nt, 6) , 6=icxywh,i表示第一张图片,c为类别,然后为坐标xywh
model: 模型
Returns:
"""
# target device
device = targets.device
# initialize the three loss components
lcls, lbox, lobj = torch.zeros(1, device=device), torch.zeros(1, device=device), torch.zeros(1, device=device)
# class targets, box targets, indices and matched anchors from build_targets
tcls, tbox, indices, anchors = build_targets(p, targets, model) # targets
# hyperparameters
h = model.hyp # hyperparameters
# Define criteria
# define the classification and objectness criteria
BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([h['cls_pw']])).to(device)
BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([h['obj_pw']])).to(device)
# Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
# label smoothing; eps defaults to 0.0, so it is effectively unused here
cp, cn = smooth_BCE(eps=0.0)
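# (added note) in this repo smooth_BCE returns cp = 1.0 - 0.5 * eps and cn = 0.5 * eps,
# so with eps = 0.0 the positive/negative class targets are simply 1.0 and 0.0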
# Focal loss
# use focal loss if fl_gamma is set; it is disabled by default
g = h['fl_gamma'] # focal loss gamma
if g > 0:
BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)
# Losses
nt = 0 # number of targets
np = len(p) # number of outputs
# objectness loss weights for each of the three (or four) output levels
balance = [4.0, 1.0, 0.4] if np == 3 else [4.0, 1.0, 0.4, 0.1] # P3-5 or P3-6
for i, pi in enumerate(p): # layer index, layer predictions
# unpack the indices so the predictions of the matched grid cells can be looked up
b, a, gj, gi = indices[i] # image, anchor, gridy, gridx
tobj = torch.zeros_like(pi[..., 0], device=device) # target obj
n = b.shape[0] # number of targets
if n:
nt += n # cumulative targets
# gather the predictions of the matched cells
"""
Only the predictions at the cells/anchors that targets were assigned to are used for regression;
this shows that YOLOv5 matches targets to predictions purely by grid (there is no GT-vs-prediction IoU matching).
"""
ps = pi[b, a, gj, gi] # prediction subset corresponding to targets
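# (added note) ps.shape = (n, nc + 5): one row per matched (image, anchor, grid cell) triple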
# Regression
# decode the raw xywh outputs into boxes on the feature-map grid
pxy = ps[:, :2].sigmoid() * 2. - 0.5
pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]
pbox = torch.cat((pxy, pwh), 1).to(device) # predicted box
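# (added note) pbox lives in the same coordinates as tbox[i]: xy is the offset from the assigned cell's
# top-left corner (range -0.5~1.5, since tbox stores gxy - gij) and wh is in grid units (0~4x the anchor)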
# box loss; note CIoU=True, so despite the variable name this is actually the CIoU loss
giou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True) # giou(prediction, target)
lbox += (1.0 - giou).mean() # giou loss
# Objectness
# objectness targets: a blend of 1.0 and the (C)IoU, weighted by model.gr
tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype) # giou ratio
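# (added note) model.gr is set to 1.0 in train.py, so in practice the objectness target equals the clamped CIoU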
# Classification
# the classification loss is only computed when there is more than one class
if model.nc > 1: # cls loss (only if multiple classes)
t = torch.full_like(ps[:, 5:], cn, device=device) # targets
t[range(n), tcls[i]] = cp
lcls += BCEcls(ps[:, 5:], t) # BCE
# Append targets to text file
# with open('targets.txt', 'a') as file:
# [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)]
# objectness loss, weighted per output level
lobj += BCEobj(pi[..., 4], tobj) * balance[i] # obj loss
s = 3 / np # output count scaling
# scale each component by its hyperparameter weight to obtain the final loss
lbox *= h['giou'] * s
lobj *= h['obj'] * s * (1.4 if np == 4 else 1.)
lcls *= h['cls'] * s
bs = tobj.shape[0] # batch size
loss = lbox + lobj + lcls
return loss * bs, torch.cat((lbox, lobj, lcls, loss)).detach()
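Finally, a minimal sketch of how compute_loss is driven from the training loop (added for illustration; the names model, imgs, targets and optimizer are assumptions here, and in the actual repo this logic lives in train.py with mixed precision and gradient accumulation on top):
pred = model(imgs)                     # List[torch.Tensor * 3], pred[i].shape = (b, 3, h, w, nc+5)
loss, loss_items = compute_loss(pred, targets.to(device), model)  # loss is already scaled by batch size
loss.backward()
optimizer.step()
optimizer.zero_grad()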