YOLOv5: Selected Principles Explained

Input-side improvements:

  • Mosaic data augmentation

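The main idea is to randomly crop four images and stitch them into a single image that is used as one training sample. This enriches the backgrounds against which objects appear. Because each stitched image carries content from four source images, it also effectively increases the batch size: batch normalization statistics are computed over all four images at once, so training depends less on a large batch_size and can run on a single GPU. Each source image has its own bounding boxes; after stitching, those boxes are carried over to the new image, so the network receives one image together with the combined set of boxes, which amounts to learning from four images in a single pass.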


Reference: "数据增强之Mosaic (Mixup,Cutout,CutMix)" (Data Augmentation: Mosaic, with Mixup, Cutout, CutMix), Zhihu
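The stitching and box bookkeeping described above can be sketched in NumPy. This is a simplified illustration, not YOLOv5's own loader: the function name `mosaic`, the tile size `s`, and the `center` parameter are assumptions made for the example, each input image is assumed to be at most `s × s`, and boxes are clipped to their tile rather than to the whole canvas.

```python
import numpy as np

def mosaic(images, boxes, s=320, center=None, seed=0):
    """Stitch four HxWx3 uint8 images (each at most s x s) around a center point.

    boxes: four (n_i, 4) float arrays of pixel boxes in (x1, y1, x2, y2) order.
    Returns the 2s x 2s canvas and the merged boxes in canvas coordinates.
    """
    rng = np.random.default_rng(seed)
    if center is None:
        # Random split point somewhere in the middle of the canvas.
        center = rng.integers(s // 2, 3 * s // 2, size=2)
    xc, yc = int(center[0]), int(center[1])
    canvas = np.full((2 * s, 2 * s, 3), 114, dtype=np.uint8)  # gray background
    merged = []
    for i, (img, b) in enumerate(zip(images, boxes)):
        h, w = img.shape[:2]
        # (x1a..y2a): destination region on the canvas; (x1b..y2b): crop of the source.
        if i == 0:    # top-left tile
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
        elif i == 1:  # top-right tile
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, 2 * s), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom-left tile
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(2 * s, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        else:         # bottom-right tile
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, 2 * s), min(2 * s, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
        canvas[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]
        # Shift the boxes by the same offset as the pixels, then clip to the tile.
        dx, dy = x1a - x1b, y1a - y1b
        shifted = np.asarray(b, dtype=float).reshape(-1, 4) + np.array([dx, dy, dx, dy])
        shifted[:, [0, 2]] = shifted[:, [0, 2]].clip(x1a, x2a)
        shifted[:, [1, 3]] = shifted[:, [1, 3]].clip(y1a, y2a)
        keep = (shifted[:, 2] - shifted[:, 0] > 1) & (shifted[:, 3] - shifted[:, 1] > 1)
        merged.append(shifted[keep])  # drop boxes that were cropped away
    return canvas, np.concatenate(merged)

# Four flat-colored dummy images, each with one box.
images = [np.full((320, 320, 3), v, dtype=np.uint8) for v in (10, 20, 30, 40)]
boxes = [np.array([[100.0, 100.0, 200.0, 200.0]]) for _ in range(4)]
canvas, merged = mosaic(images, boxes, s=320, center=(300, 200))
print(canvas.shape, merged)
```

With a fixed `center` the result is deterministic, which makes it easy to check that every box lands in the same tile as its pixels.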

  • Adaptive anchor box computation

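YOLOv5 ships with default anchors in its model configuration files (e.g. yolov5l.yaml). These defaults are anchor box sizes derived from the COCO dataset at an image size of 640×640. For a new dataset, YOLOv5 automatically learns anchor sizes from the labels: it analyzes the custom dataset with k-means and a genetic algorithm to obtain preset anchor boxes suited to the object bounding boxes in that dataset.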

YOLOv5 first computes the Best Possible Recall (BPR) of the current anchors on the dataset's labels.
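A minimal sketch of how such a recall figure can be computed, assuming a label counts as recallable when its width and height are both within a factor `thr` of at least one anchor. The function name `best_possible_recall` and the default `thr=4.0` are assumptions made for this illustration.

```python
import numpy as np

def best_possible_recall(wh, anchors, thr=4.0):
    """Fraction of labels whose (w, h) is within a factor `thr` of some anchor."""
    wh = np.asarray(wh, dtype=float)            # (n, 2) label widths and heights
    anchors = np.asarray(anchors, dtype=float)  # (k, 2) anchor widths and heights
    r = wh[:, None, :] / anchors[None, :, :]    # (n, k, 2) size ratios
    # Symmetric ratio in (0, 1]; take the worse of the width and height matches.
    x = np.minimum(r, 1.0 / r).min(axis=2)      # (n, k)
    best = x.max(axis=1)                        # best anchor for each label
    return float((best > 1.0 / thr).mean())

labels_wh = [[10, 10], [100, 100], [400, 20]]
anchors = [[10, 13], [90, 100]]
bpr = best_possible_recall(labels_wh, anchors)
print(bpr)  # 2 of the 3 labels are matched: 0.666...
```

The elongated 400×20 label is the one that no anchor covers, which is exactly the kind of mismatch that recomputing the anchors is meant to fix.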

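If the BPR is too low (the autoanchor check uses a threshold of 0.98), the kmean_anchors function then updates the anchors using k-means followed by a genetic algorithm.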
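The two-stage idea (cluster first, then mutate and keep improvements) can be sketched as follows. This is an illustration under stated assumptions rather than the code of kmean_anchors: the helper names `fitness`, `kmeans_wh`, and `evolve`, the plain Euclidean k-means, the mutation scale `sigma`, and the synthetic label sizes are all made up for the example.

```python
import numpy as np

def fitness(wh, anchors, thr=4.0):
    """Mean best-anchor match per label, counting only matches above 1/thr."""
    r = wh[:, None, :] / anchors[None, :, :]
    x = np.minimum(r, 1.0 / r).min(axis=2)
    best = x.max(axis=1)
    return float((best * (best > 1.0 / thr)).mean())

def kmeans_wh(wh, k, iters=30, seed=0):
    """Plain Lloyd's k-means on (w, h) pairs with Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((wh[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = wh[assign == j]
            if len(members):  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]  # sort by area

def evolve(wh, anchors, generations=200, sigma=0.1, seed=0):
    """Randomly rescale the anchors; keep a mutation only if fitness improves."""
    rng = np.random.default_rng(seed)
    best, best_fit = anchors.copy(), fitness(wh, anchors)
    for _ in range(generations):
        factors = np.clip(rng.normal(1.0, sigma, size=best.shape), 0.3, 3.0)
        candidate = np.clip(best * factors, 2.0, None)  # keep sizes positive
        f = fitness(wh, candidate)
        if f > best_fit:
            best, best_fit = candidate, f
    return best, best_fit

# Synthetic label sizes drawn around three typical object scales.
rng = np.random.default_rng(1)
wh = np.concatenate([
    rng.normal((30, 40), 5, (100, 2)),
    rng.normal((120, 90), 15, (100, 2)),
    rng.normal((300, 280), 30, (100, 2)),
])
wh = np.clip(wh, 4, None)

kmeans_anchors = kmeans_wh(wh, k=3)
kmeans_fit = fitness(wh, kmeans_anchors)
evolved, evolved_fit = evolve(wh, kmeans_anchors)
print(kmeans_fit, evolved_fit)
```

Because a mutation is accepted only when it raises the fitness, the evolved anchors can never score worse than the k-means starting point.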

  • Adaptive image scaling

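The procedure is: compute the scaling ratio, compute the scaled width and height, compute the number of pixels to pad, then resize the image and pad it.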

Two points to note:

a. Dabai's illustration pads with black, i.e. (0,0,0), whereas YOLOv5 pads with gray, i.e. (114,114,114); the effect is the same.

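b. Training does not use the reduced-padding approach; it keeps the conventional padding, i.e. the image is scaled and padded to the full 416×416 size. Reduced padding is used only at test time, when running inference with the model, to speed up detection.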
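The four steps and the training/inference difference can be sketched as below. The function name `letterbox_sketch`, the `minimal` flag, and the nearest-neighbor resize written in plain NumPy are assumptions made to keep the example self-contained. Reducing the padding modulo a stride of 32 is likewise an assumption of this sketch: it keeps the output sides divisible by 32 while trimming the border.

```python
import numpy as np

def letterbox_sketch(img, new_shape=(416, 416), color=(114, 114, 114),
                     minimal=False, stride=32):
    """Resize with unchanged aspect ratio, then pad with a constant color.

    minimal=False pads to the full new_shape (training-style padding);
    minimal=True keeps only padding % stride (inference-style padding).
    """
    h, w = img.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)            # 1. scaling ratio
    new_h, new_w = int(round(h * r)), int(round(w * r))    # 2. scaled size
    dh, dw = new_shape[0] - new_h, new_shape[1] - new_w    # 3. padding needed
    if minimal:
        dh, dw = dh % stride, dw % stride
    # 4. resize (nearest neighbor via index lookup) and pad on both sides
    rows = np.minimum((np.arange(new_h) / r).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / r).astype(int), w - 1)
    resized = img[rows[:, None], cols[None, :]]
    top, left = dh // 2, dw // 2
    out = np.full((new_h + dh, new_w + dw, img.shape[2]), color, dtype=img.dtype)
    out[top:top + new_h, left:left + new_w] = resized
    return out, r, (left, top)

img = np.zeros((600, 800, 3), dtype=np.uint8)       # height 600, width 800
out_train, ratio, offset_train = letterbox_sketch(img, minimal=False)
out_infer, _, offset_infer = letterbox_sketch(img, minimal=True)
print(out_train.shape, out_infer.shape)  # (416, 416, 3) (320, 416, 3)
```

For this 800×600 input the scaled image is 416×312, so full padding adds 104 rows while the reduced variant adds only 8, which is where the inference speedup comes from.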

How the Focus module works

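In YOLOv5, the Focus module slices the image before it enters the backbone. Concretely, it takes one value at every other pixel of the image, similar to nearest-neighbor downsampling, which yields four images. The four images are complementary and look much alike, but together they lose no information. The W and H information is thereby moved into the channel dimension and the number of input channels grows fourfold: the concatenated image has 12 channels instead of the original three RGB channels. A convolution is then applied to this new image, producing a 2× downsampled feature map with no information loss.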

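Taking yolov5s as an example, the original 640 × 640 × 3 image enters the Focus structure, is sliced into a 320 × 320 × 12 feature map, and after one convolution becomes a 320 × 320 × 32 feature map. The slicing operation is shown below: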

[Figure: the Focus slicing operation]
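The slicing step on its own can be reproduced with strided indexing. The sketch below uses NumPy in channel-first (C, H, W) layout; the helper names `focus_slice` and `unslice` are made up for this illustration, and the convolution that follows the slicing is left out. The inverse function is included only to demonstrate that no pixel is discarded.

```python
import numpy as np

def focus_slice(x):
    """(C, H, W) with even H and W -> (4C, H/2, W/2) by taking every other pixel."""
    return np.concatenate([
        x[:, ::2, ::2],    # even rows, even columns
        x[:, 1::2, ::2],   # odd rows, even columns
        x[:, ::2, 1::2],   # even rows, odd columns
        x[:, 1::2, 1::2],  # odd rows, odd columns
    ], axis=0)

def unslice(y):
    """Inverse of focus_slice: interleave the four sub-images back together."""
    c = y.shape[0] // 4
    x = np.empty((c, y.shape[1] * 2, y.shape[2] * 2), dtype=y.dtype)
    x[:, ::2, ::2] = y[:c]
    x[:, 1::2, ::2] = y[c:2 * c]
    x[:, ::2, 1::2] = y[2 * c:3 * c]
    x[:, 1::2, 1::2] = y[3 * c:]
    return x

img = np.arange(3 * 640 * 640, dtype=np.float32).reshape(3, 640, 640)
sliced = focus_slice(img)
print(sliced.shape)                          # (12, 320, 320)
print(np.array_equal(unslice(sliced), img))  # True: the slicing is lossless
```

The spatial size is halved and the channel count is quadrupled, so the total number of values is unchanged.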

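The Focus layer moves information from the w-h plane into the channel dimension and then extracts features with a 3×3 convolution. This reduces the information lost to downsampling.

The printed module structure of the full network follows. Its Focus convolution outputs 48 channels rather than the 32 quoted above for yolov5s, which suggests the printout comes from a wider variant (the channel widths match yolov5m).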

Sequential(
  (0): Focus(
    (conv): Conv(
      (conv): Conv2d(12, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (act): SiLU()
    )
  )
  (1): Conv(
    (conv): Conv2d(48, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (act): SiLU()
  )
  (2): C3(
    (cv1): Conv(
      (conv): Conv2d(96, 48, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv2): Conv(
      (conv): Conv2d(96, 48, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv3): Conv(
      (conv): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (1): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
    )
  )
  (3): Conv(
    (conv): Conv2d(96, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (act): SiLU()
  )
  (4): C3(
    (cv1): Conv(
      (conv): Conv2d(192, 96, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv2): Conv(
      (conv): Conv2d(192, 96, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv3): Conv(
      (conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (1): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (2): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (3): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (4): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (5): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
    )
  )
  (5): Conv(
    (conv): Conv2d(192, 384, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (act): SiLU()
  )
  (6): C3(
    (cv1): Conv(
      (conv): Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv2): Conv(
      (conv): Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv3): Conv(
      (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (1): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (2): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (3): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (4): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (5): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
    )
  )
  (7): Conv(
    (conv): Conv2d(384, 768, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (act): SiLU()
  )
  (8): SPP(
    (cv1): Conv(
      (conv): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv2): Conv(
      (conv): Conv2d(1536, 768, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (m): ModuleList(
      (0): MaxPool2d(kernel_size=5, stride=1, padding=2, dilation=1, ceil_mode=False)
      (1): MaxPool2d(kernel_size=9, stride=1, padding=4, dilation=1, ceil_mode=False)
      (2): MaxPool2d(kernel_size=13, stride=1, padding=6, dilation=1, ceil_mode=False)
    )
  )
  (9): C3(
    (cv1): Conv(
      (conv): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv2): Conv(
      (conv): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv3): Conv(
      (conv): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (1): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
    )
  )
  (10): Conv(
    (conv): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1))
    (act): SiLU()
  )
  (11): Upsample(scale_factor=2.0, mode=nearest)
  (12): Concat()
  (13): C3(
    (cv1): Conv(
      (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv2): Conv(
      (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv3): Conv(
      (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (1): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
    )
  )
  (14): Conv(
    (conv): Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1))
    (act): SiLU()
  )
  (15): Upsample(scale_factor=2.0, mode=nearest)
  (16): Concat()
  (17): C3(
    (cv1): Conv(
      (conv): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv2): Conv(
      (conv): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv3): Conv(
      (conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (1): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
    )
  )
  (18): Conv(
    (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (act): SiLU()
  )
  (19): Concat()
  (20): C3(
    (cv1): Conv(
      (conv): Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv2): Conv(
      (conv): Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv3): Conv(
      (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (1): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
    )
  )
  (21): Conv(
    (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (act): SiLU()
  )
  (22): Concat()
  (23): C3(
    (cv1): Conv(
      (conv): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv2): Conv(
      (conv): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (cv3): Conv(
      (conv): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU()
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
      (1): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU()
        )
        (cv2): Conv(
          (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU()
        )
      )
    )
  )
  (24): Detect(
    (m): ModuleList(
      (0): Conv2d(192, 255, kernel_size=(1, 1), stride=(1, 1))
      (1): Conv2d(384, 255, kernel_size=(1, 1), stride=(1, 1))
      (2): Conv2d(768, 255, kernel_size=(1, 1), stride=(1, 1))
    )
  )
)
