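Back in YOLOv3 and YOLOv4, the model was defined and built as Python modules. With YOLOv5 the author changed this substantially and switched to a YAML configuration file. That makes it easy to control the model's complexity and parameter count through a couple of parameters, but the definition becomes much more abstract and is not very convenient to read. The main goal here is to convert the original YAML file into a Python module, which is easier to study and understand.
The original yolov5s.yaml is as follows: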
# parameters
nc: 80 # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple

# anchors
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]], # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]], # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]], # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]], # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]], # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]], # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]], # cat head P4
   [-1, 3, BottleneckCSP, [512, False]], # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]], # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]], # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
  ]
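To make the role of depth_multiple a bit less abstract, here is a minimal sketch of how the `number` column can be scaled into an actual repeat count. It assumes a round-then-clamp-to-1 rule that leaves single layers alone, which is my understanding of the YOLOv5 builder; the helper name `scaled_repeats` is made up for illustration and is not part of YOLOv5.

```python
def scaled_repeats(number, depth_multiple):
    """Scale a layer's repeat count (assumed rule: round, then clamp to at least 1)."""
    if number <= 1:
        # Layers that appear once (Conv, SPP, Concat, ...) are not scaled.
        return number
    return max(round(number * depth_multiple), 1)


# `number` column of the four BottleneckCSP rows in the backbone above.
backbone_numbers = [3, 9, 9, 3]
print([scaled_repeats(n, 0.33) for n in backbone_numbers])  # [1, 3, 3, 1]
```

With depth_multiple 0.33, the 3/9/9/3 repeats in the backbone shrink to 1/3/3/1, so a larger variant of the model only needs a different multiplier rather than a different layer list.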
After translating it, the resulting yolov5s.py looks like this:
import torch
import torch.nn as nn
import torch.nn.functional as F

# common.py is expected to provide Focus, Conv, BottleneckCSP, SPP and Detect
from common import *


class yolov5s(nn.Module):
    def __init__(
        self,
        num_classes=80,
        anchors=[
            [10, 13, 16, 30, 33, 23],
            [30, 61, 62, 45, 59, 119],
            [116, 90, 156, 198, 373, 326],
        ],
        training=False,
    ):
        super().__init__()
        # Backbone (layers 0-9); channels and repeats are already scaled
        self.seq0_Focus = Focus(3, 32, 3)
        self.seq1_Conv = Conv(32, 64, 3, 2)
        self.seq2_BottleneckCSP = BottleneckCSP(64, 64, 1)
        self.seq3_Conv = Conv(64, 128, 3, 2)
        self.seq4_BottleneckCSP = BottleneckCSP(128, 128, 3)
        self.seq5_Conv = Conv(128, 256, 3, 2)
        self.seq6_BottleneckCSP = BottleneckCSP(256, 256, 3)
        self.seq7_Conv = Conv(256, 512, 3, 2)
        self.seq8_SPP = SPP(512, 512, [5, 9, 13])
        self.seq9_BottleneckCSP = BottleneckCSP(512, 512, 1, False)
        # Head (layers 10-23); Upsample and Concat have no parameters,
        # so layers 11, 12, 15, 16, 19 and 22 only appear in forward()
        self.seq10_Conv = Conv(512, 256, 1, 1)
        self.seq13_BottleneckCSP = BottleneckCSP(512, 256, 1, False)
        self.seq14_Conv = Conv(256, 128, 1, 1)
        self.seq17_BottleneckCSP = BottleneckCSP(256, 128, 1, False)
        self.seq18_Conv = Conv(128, 128, 3, 2)
        self.seq20_BottleneckCSP = BottleneckCSP(256, 256, 1, False)
        self.seq21_Conv = Conv(256, 256, 3, 2)
        self.seq23_BottleneckCSP = BottleneckCSP(512, 512, 1, False)
        self.yolo_layers = Detect(
            nc=num_classes, anchors=anchors, ch=[128, 256, 512], training=training
        )

    def forward(self, x):
        x = self.seq0_Focus(x)
        x = self.seq1_Conv(x)
        x = self.seq2_BottleneckCSP(x)
        x = self.seq3_Conv(x)
        xRt0 = self.seq4_BottleneckCSP(x)  # layer 4, backbone P3
        x = self.seq5_Conv(xRt0)
        xRt1 = self.seq6_BottleneckCSP(x)  # layer 6, backbone P4
        x = self.seq7_Conv(xRt1)
        x = self.seq8_SPP(x)
        x = self.seq9_BottleneckCSP(x)
        xRt2 = self.seq10_Conv(x)  # layer 10, head P5
        # layer 11: 2x nearest-neighbour upsample
        route = F.interpolate(
            xRt2, size=(int(xRt2.shape[2] * 2), int(xRt2.shape[3] * 2)), mode="nearest"
        )
        x = torch.cat([route, xRt1], dim=1)  # layer 12: cat backbone P4
        x = self.seq13_BottleneckCSP(x)
        xRt3 = self.seq14_Conv(x)  # layer 14, head P4
        # layer 15: 2x nearest-neighbour upsample
        route = F.interpolate(
            xRt3, size=(int(xRt3.shape[2] * 2), int(xRt3.shape[3] * 2)), mode="nearest"
        )
        x = torch.cat([route, xRt0], dim=1)  # layer 16: cat backbone P3
        out1 = self.seq17_BottleneckCSP(x)  # P3/8-small
        route = self.seq18_Conv(out1)
        x = torch.cat([route, xRt3], dim=1)  # layer 19: cat head P4
        out2 = self.seq20_BottleneckCSP(x)  # P4/16-medium
        route = self.seq21_Conv(out2)
        x = torch.cat([route, xRt2], dim=1)  # layer 22: cat head P5
        out3 = self.seq23_BottleneckCSP(x)  # P5/32-large
        output = self.yolo_layers([out1, out2, out3])
        return output
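Reading the PyTorch code directly is fairly easy to follow. Compared with the original, more abstract YAML file, I still prefer the explicit Python module.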
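As a sanity check on the hand-written constructor arguments in yolov5s.py, the input and output channels of every layer can be traced straight from the `from` and `args` columns of the YAML. The sketch below is my own illustration, not YOLOv5 code: `LAYERS`, `make_divisible` and `trace_channels` are hypothetical names, and rounding widths up to a multiple of 8 is an assumption (it makes no difference here, since every scaled width is already a multiple of 8).

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor (assumed rounding rule)."""
    return math.ceil(x / divisor) * divisor


# (from, module, requested output channels) for layers 0-23 of yolov5s.yaml.
# Upsample and Concat take no channel argument, so their entry is None.
LAYERS = [
    (-1, "Focus", 64), (-1, "Conv", 128), (-1, "BottleneckCSP", 128),
    (-1, "Conv", 256), (-1, "BottleneckCSP", 256), (-1, "Conv", 512),
    (-1, "BottleneckCSP", 512), (-1, "Conv", 1024), (-1, "SPP", 1024),
    (-1, "BottleneckCSP", 1024),
    (-1, "Conv", 512), (-1, "Upsample", None), ([-1, 6], "Concat", None),
    (-1, "BottleneckCSP", 512),
    (-1, "Conv", 256), (-1, "Upsample", None), ([-1, 4], "Concat", None),
    (-1, "BottleneckCSP", 256),
    (-1, "Conv", 256), ([-1, 14], "Concat", None), (-1, "BottleneckCSP", 512),
    (-1, "Conv", 512), ([-1, 10], "Concat", None), (-1, "BottleneckCSP", 1024),
]


def trace_channels(layers, width_multiple=0.50, in_channels=3):
    """Return (module, c_in, c_out) for each layer, following the `from` indices."""
    outs = []   # output channels of the layers processed so far
    trace = []
    for src, name, c2 in layers:
        sources = src if isinstance(src, list) else [src]
        # -1 means "previous layer"; other values are absolute layer indices.
        c_in = sum(outs[j] for j in sources) if outs else in_channels
        if name in ("Concat", "Upsample"):
            c_out = c_in  # Concat sums its inputs; Upsample keeps channels
        else:
            c_out = make_divisible(c2 * width_multiple)
        outs.append(c_out)
        trace.append((name, c_in, c_out))
    return trace


trace = trace_channels(LAYERS)
for i, (name, c_in, c_out) in enumerate(trace):
    print(f"{i:2d} {name:14s} {c_in:4d} -> {c_out:4d}")

# Channels fed to Detect from layers 17, 20 and 23.
print([trace[i][2] for i in (17, 20, 23)])  # [128, 256, 512]
```

The printed pairs line up with the constructor calls in the module, for example layer 13 takes 512 channels (256 from the upsampled branch plus 256 from layer 6) and outputs 256, which is BottleneckCSP(512, 256, 1, False).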