YOLOv5 general.py: Annotations and Analysis

The annotated YOLOv5 code has been updated: it now covers the more recent 2021.07.14 release and the comments are more complete.
github: https://github.com/Laughing-q/yolov5_annotations


For now I have only annotated the build_targets and compute_loss functions, mainly because today I happened to be looking at how yolov5 does its bounding-box regression;
I will annotate the other functions when I have time.
build_targets contains a detailed explanation of yolov5's box regression. Since no paper has been published yet, this is only what I could work out from the code itself; corrections are welcome if anything is wrong.

def build_targets(p, targets, model):
    """
    Args:
        p: 网络输出,List[torch.tensor * 3], p[i].shape = (b, 3, h, w, nc+5), hw分别为特征图的长宽,b为batch-size
        targets: targets.shape = (nt, 6) , 6=icxywh,i表示第一张图片,c为类别,然后为坐标xywh
        model: 模型

    Returns:

    """
    # Build targets for compute_loss(), input targets(image,class,x,y,w,h)
    # get the Detect() module (the 3 detection layers)
    det = model.module.model[-1] if is_parallel(model) else model.model[-1]  # Detect() module
    # number of anchors per layer and number of ground-truth boxes
    na, nt = det.na, targets.shape[0]  # number of anchors, targets
    tcls, tbox, indices, anch = [], [], [], []
    gain = torch.ones(7, device=targets.device)  # normalized to gridspace gain
    # ai.shape = (na, nt): anchor indices
    ai = torch.arange(na, device=targets.device).float().view(na, 1).repeat(1, nt)  # same as .repeat_interleave(nt)
    # targets.shape = (na, nt, 7)
    targets = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2)  # append anchor indices
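    # each row of targets is now (image, class, x, y, w, h, anchor_index): every GT is duplicated once per anchor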

    # grid-cell offset used for the neighbour-cell assignment below
    g = 0.5  # bias
    # (5, 2)
    off = torch.tensor([[0, 0],
                        [1, 0], [0, 1], [-1, 0], [0, -1],  # j,k,l,m
                        # [1, 1], [1, -1], [-1, 1], [-1, -1],  # jk,jm,lk,lm
                        ], device=targets.device).float() * g  # offsets
    # process each detection layer in turn
    for i in range(det.nl):
        anchors = det.anchors[i]
        # scale factors that map normalized coordinates onto this feature map
        """
        p[i].shape = (b, 3, h, w, nc+5), where h and w are the feature-map height and width
        gain = [1, 1, w, h, w, h, 1]
        """
        gain[2:6] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain

        # Match targets to anchors
        # map the target xywh from the normalized 0~1 range onto the feature-map scale
        t = targets * gain
        if nt:
            # Matches
            """
            Match the target wh against the anchor wh and drop pairs whose ratio exceeds hyp['anchor_t']
            (this appears to be one of yolov5's innovations), which makes the regression better behaved
            (it goes hand in hand with the new box-regression formulation).
            yolov3 regressed wh with out = exp(in), which is risky: exp(in) can become arbitrarily large,
            leading to runaway gradients, instability, NaN losses and eventually a collapsed training run
            (the original yolov3 inverted the targets and compared them with the raw network output, so this
            was less of a problem there; but with an IoU loss the network output has to be decoded into `out`
            before the loss can be computed, so the issue comes back).
            The author therefore uses a new wh regression:
            (wh.sigmoid() * 2) ** 2 * anchors[i], instead of yolov3's anchors[i] * exp(wh),
            which keeps the target/anchor ratio within 0~4;
            hyp.scratch.yaml sets the hyperparameter anchor_t=4, and that is the threshold used here to
            decide how well the anchors fit a given target box.
            """
            # width/height ratio between targets and anchors
            r = t[:, :, 4:6] / anchors[:, None]  # wh ratio
            """
            筛选满足1 / hyp['anchor_t'] < targets_wh/anchor_wh < hyp['anchor_t']的框;
            由于wh回归公式中将标签框与anchor的倍数控制在0~4之间,所以这样筛选之后也会浪费一些输出空间;
            由于分给每个特征金字塔层的anchor尺度都不一样,这里根据标签wh与anchor的wh的比例分配标签,
            就相当于把不同尺度的GT分配给了不同的特征层来回归;
            """
            j = torch.max(r, 1. / r).max(2)[0] < model.hyp['anchor_t']  # compare
            # yolov5不再通过iou来分配标签,而仅仅使用网格分配;
            # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n)=wh_iou(anchors(3,2), gwh(n,2))
            # 筛选过后的t.shape = (M, 7),M为筛选过后的数量
            t = t[j]  # filter
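            # Since sigmoid(x) is in (0, 1), (sigmoid(x) * 2) ** 2 is in (0, 4): the prediction can be at
            # most 4x the anchor, which is exactly why anchor_t defaults to 4.
            # Hypothetical numbers: target wh = (30, 60), anchor wh = (10, 40) -> r = (3.0, 1.5),
            # max(r, 1/r).max() = 3.0 < 4, so the pair is kept; with anchor wh = (5, 40) the ratio would be
            # 6.0 and the pair would be dropped.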

            # Offsets
            # centre coordinates xy measured from the grid origin (top-left), (M, 2)
            gxy = t[:, 2:4]  # grid xy
            # centre coordinates measured from the bottom-right corner, (M, 2)
            gxi = gain[[2, 3]] - gxy  # inverse
            # ((gxy % 1. < g) & (gxy > 1.)).T has shape (2, M)
            # j, k, l, m each have shape (M, )
            """
            Pick out the boxes whose centre is within 0.5 of the left/top edge of its cell (x % 1 < 0.5,
            y % 1 < 0.5) and, symmetrically, within 0.5 of the right/bottom edge (measured from the
            bottom-right corner); these are the masks j, k, l, m.
            When choosing gij (the cell each target box is assigned to), these four groups are shifted by
            the corresponding offset (subtracting `off` above), i.e. the gij = (gxy - offsets).long()
            operation below; the four shifted groups are then concatenated with the original gxy, giving
            five groups in total.
            In other words: (1) each cell is split 2x2 into four quadrants, and a box falling in a given
            quadrant is regressed not only with the anchors of its own cell but also with the anchors of
            the two neighbouring cells adjacent to that quadrant;
            the original yolov3 only ever used the anchors of the cell the centre falls in.
            This is presumably meant to ease the grid effect, but since v5 has no paper it is only a guess;
            yolov4 also tackles the grid effect, by multiplying the sigmoid output by a factor greater than 1.
            It also ties in with yolov5's new box-regression formula:
            because of (1), the centre regression range grows from yolov3's 0~1 to -0.5~1.5,
            so the centre regression formula becomes:
            xy.sigmoid() * 2. - 0.5 + cx
            """
            j, k = ((gxy % 1. < g) & (gxy > 1.)).T
            l, m = ((gxi % 1. < g) & (gxi > 1.)).T
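            # Hypothetical example: a centre at gxy = (3.2, 5.7) gives j=True (x is within 0.5 of the left
            # edge of cell 3) and m=True (y is within 0.5 of the bottom edge of cell 5), so besides its own
            # cell (3, 5) this target is also assigned to the left neighbour (2, 5) and the neighbour
            # below, (3, 6), once the offsets are subtracted further down.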
            # j.shape = (5, M)
            j = torch.stack((torch.ones_like(j), j, k, l, m))
            # t.shape = (5, M, 7)
            # the selected boxes, shape (N, 7), where N is the count after this expansion/selection
            t = t.repeat((5, 1, 1))[j]
            # add the offsets
            # (1, M, 2) + (5, 1, 2) = (5, M, 2) --[j]--> (N, 2)
            offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]
        else:
            t = targets[0]
            offsets = 0

        # Define
        # b is the index of the image within the batch, c is the class id
        b, c = t[:, :2].long().T  # image, class
        # centre regression targets
        gxy = t[:, 2:4]  # grid xy
        # width/height regression targets
        gwh = t[:, 4:6]  # grid wh
        # in the original yolov3 this was simply gij = gxy.long()
        gij = (gxy - offsets).long()
        gi, gj = gij.T  # grid xy indices

        # Append
        # a holds the anchor indices
        a = t[:, 6].long()  # anchor indices
        # store the indices so the matching predictions can be pulled out when computing the loss
        indices.append((b, a, gj, gi))  # image, anchor, grid indices
        tbox.append(torch.cat((gxy - gij, gwh), 1))  # box
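        # gxy - gij is in (-0.5, 1.5): the xy regression target relative to the chosen cell,
        # matching the prediction range of xy.sigmoid() * 2. - 0.5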
        anch.append(anchors[a])  # anchors
        tcls.append(c)  # class

    return tcls, tbox, indices, anch
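
A quick way to convince yourself of the neighbour-cell logic above is to run it on a couple of made-up centres. Below is a minimal, self-contained sketch (not part of general.py); the tensor names mirror the ones in build_targets and all the numbers are invented.

import torch

g = 0.5  # same bias as in build_targets
off = torch.tensor([[0, 0],
                    [1, 0], [0, 1], [-1, 0], [0, -1]]).float() * g  # j, k, l, m offsets

# two fake target centres in grid units on a pretend 80x80 feature map
gxy = torch.tensor([[3.2, 5.7], [7.8, 2.1]])
gxi = torch.tensor([80., 80.]) - gxy  # centres measured from the bottom-right corner

j, k = ((gxy % 1. < g) & (gxy > 1.)).T  # close to the left / top edge of the cell
l, m = ((gxi % 1. < g) & (gxi > 1.)).T  # close to the right / bottom edge of the cell
mask = torch.stack((torch.ones_like(j), j, k, l, m))  # (5, 2): own cell + 4 possible neighbours

offsets = (torch.zeros_like(gxy)[None] + off[:, None])[mask]
gij = (gxy.repeat(5, 1, 1)[mask] - offsets).long()
print(gij)
# centre (3.2, 5.7) is assigned to cells (3, 5), (2, 5) and (3, 6): own cell, left and lower neighbours
# centre (7.8, 2.1) is assigned to cells (7, 2), (7, 1) and (8, 2): own cell, upper and right neighbours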

def compute_loss(p, targets, model):  # predictions, targets, model
    """
    Args:
        p: 网络输出,List[torch.tensor * 3], p[i].shape = (b, 3, h, w, nc+5), hw分别为特征图的长宽,b为batch-size
        targets: targets.shape = (nt, 6) , 6=icxywh,i表示第一张图片,c为类别,然后为坐标xywh
        model: 模型

    Returns:

    """
    # device the targets live on
    device = targets.device
    # initialise the three loss components
    lcls, lbox, lobj = torch.zeros(1, device=device), torch.zeros(1, device=device), torch.zeros(1, device=device)
    # class, box, index and anchor targets
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
    # hyperparameter dict
    h = model.hyp  # hyperparameters

    # Define criteria
    # define the loss criteria
    BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([h['cls_pw']])).to(device)
    BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([h['obj_pw']])).to(device)

    # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
    # class label smoothing; eps defaults to 0, so in practice it is not used here
    cp, cn = smooth_BCE(eps=0.0)
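    # cp, cn are the positive / negative class-label values; with eps = 0 they come out as 1.0 and 0.0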

    # Focal loss
    # if fl_gamma is set, wrap the criteria in focal loss; this is also disabled by default
    g = h['fl_gamma']  # focal loss gamma
    if g > 0:
        BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)

    # Losses
    nt = 0  # number of targets
    np = len(p)  # number of outputs
    # per-layer weights for the objectness loss of the three (or four) outputs
    balance = [4.0, 1.0, 0.4] if np == 3 else [4.0, 1.0, 0.4, 0.1]  # P3-5 or P3-6
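    # (the largest weight goes to the first, highest-resolution output, presumably to balance the
    #  objectness loss across layers with very different numbers of grid cells)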
    for i, pi in enumerate(p):  # layer index, layer predictions
        # use the stored indices to pick out the predictions at the cells that were assigned targets
        b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
        tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj

        n = b.shape[0]  # number of targets
        if n:
            nt += n  # cumulative targets
            # pick the predictions at the target locations
            """
            Only the predictions in the cells a target was assigned to are used for regression;
            this shows that yolov5 matches targets to predictions purely through the grid
            (there is no IoU computation between GTs and predicted boxes during assignment).
            """
            ps = pi[b, a, gj, gi]  # prediction subset corresponding to targets

            # Regression
            # decode the raw xywh outputs
            pxy = ps[:, :2].sigmoid() * 2. - 0.5
            pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]
            pbox = torch.cat((pxy, pwh), 1).to(device)  # predicted box
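            # pxy is in (-0.5, 1.5) relative to the assigned cell and pwh is in (0, 4) * anchor,
            # matching the target ranges constructed in build_targets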
            # box loss; note CIoU=True, so this actually computes the CIoU loss (the variable is still named giou)
            giou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True)  # giou(prediction, target)
            lbox += (1.0 - giou).mean()  # giou loss

            # Objectness
            # set the objectness target according to model.gr
            tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype)  # giou ratio
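            # model.gr is the iou ratio: with gr = 1 the objectness target is the (clamped) CIoU itself,
            # with gr = 0 it is a hard 1.0 label, and values in between blend the two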

            # Classification
            # the classification loss is only computed when there is more than one class
            if model.nc > 1:  # cls loss (only if multiple classes)
                t = torch.full_like(ps[:, 5:], cn, device=device)  # targets
                t[range(n), tcls[i]] = cp
                lcls += BCEcls(ps[:, 5:], t)  # BCE

            # Append targets to text file
            # with open('targets.txt', 'a') as file:
            #     [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)]
        # objectness loss
        lobj += BCEobj(pi[..., 4], tobj) * balance[i]  # obj loss

    s = 3 / np  # output count scaling
    # weight each component by its hyperparameter to get the final loss
    lbox *= h['giou'] * s
    lobj *= h['obj'] * s * (1.4 if np == 4 else 1.)
    lcls *= h['cls'] * s
    bs = tobj.shape[0]  # batch size

    loss = lbox + lobj + lcls
    return loss * bs, torch.cat((lbox, lobj, lcls, loss)).detach()
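
For reference, compute_loss is typically driven from the training loop roughly as follows; this is a hedged sketch (not an excerpt from train.py) that assumes `model`, `imgs`, `targets`, `device` and `optimizer` already exist.

# hypothetical training step
pred = model(imgs)  # list of 3 feature-map outputs p
loss, loss_items = compute_loss(pred, targets.to(device), model)
loss.backward()
optimizer.step()
optimizer.zero_grad()
# loss_items = (lbox, lobj, lcls, total), handy for logging the individual components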
