Reproducing the YOLOv1 Model in PyTorch

Reference video

Model

Model architecture

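* Note: the final step shown as 4096 -> 7x7x30 is really 4096 -> 1470 (= 7x7x30), after which the 1470-element vector is reshaped to 7x7x30.
Input: a 448 x 448 x 3 image
Output: a 7 x 7 x 30 tensor, explained below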
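7 x 7: the paper divides each image into a 7 x 7 grid of cells
30: each grid cell predicts two bounding boxes; values 1~5 are the parameters of the first bounding box, 6~10 those of the second, and 11~30 are the conditional probabilities of the 20 classes (the classes in the dataset used in the paper, PASCAL VOC)
Bounding box parameters:
P(object): the confidence that the box contains an object
x, y: the center of the bounding box, relative to the bounds of its grid cell
w, h: the width and height of the bounding box, relative to the whole image
P(class|object): the probability of each class, given that the cell contains an object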
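
The layout described above can be sketched as tensor slicing. This is only an illustration under the ordering given in this post (two boxes first, then the 20 class probabilities); other implementations may order the 30 values differently, and the variable names below are not from the original code.

```python
import torch

S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes

# Stand-in for one network output: a flat vector of S*S*(C + B*5) = 1470 values.
flat = torch.arange(S * S * (C + B * 5), dtype=torch.float32)

# Reshape the flat vector into one 30-element prediction per grid cell.
pred = flat.reshape(S, S, C + B * 5)

# Layout assumed from the description above (0-based indices):
box1 = pred[..., 0:5]          # P(object), x, y, w, h of the first box
box2 = pred[..., 5:10]         # P(object), x, y, w, h of the second box
class_probs = pred[..., 10:30] # P(class|object) for the 20 classes

print(pred.shape)         # torch.Size([7, 7, 30])
print(box1.shape)         # torch.Size([7, 7, 5])
print(class_probs.shape)  # torch.Size([7, 7, 20])
```

Each grid cell therefore carries B * 5 + C = 30 numbers, which is where the last dimension of the output comes from.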

Code

import torch
import torch.nn as nn

"""
Tuple: one convolutional layer, (kernel_size, out_channels, stride, padding)
"M": a max-pooling layer with a 2x2 kernel and stride 2
List: convolutional layers followed by the number of times to repeat them
"""

architecture_config = [
    (7, 64, 2, 3),
    "M",
    (3, 192, 1, 1),
    "M",
    (1, 128, 1, 0),
    (3, 256, 1, 1),
    (1, 256, 1, 0),
    (3, 512, 1, 1),
    "M",
    [(1, 256, 1, 0), (3, 512, 1, 1), 4],
    (1, 512, 1, 0),
    (3, 1024, 1, 1),
    "M",
    [(1, 512, 1, 0), (3, 1024, 1, 1), 2],
    (3, 1024, 1, 1),
    (3, 1024, 2, 1),
    (3, 1024, 1, 1),
    (3, 1024, 1, 1),
]

# A reusable CNN block: convolution -> batch norm -> LeakyReLU
class CNNBlock(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        self.cnnblock = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, bias=False, **kwargs),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(0.1)
        )

    def forward(self, x):
        return self.cnnblock(x)

class Yolov1(nn.Module):
    def __init__(self, in_channels=3, **kwargs):
        super(Yolov1, self).__init__()
        self.architecture = architecture_config
        self.in_channels = in_channels
        # The paper implements this network in the Darknet framework, hence the name
        self.darknet = self._create_conv_layers(self.architecture)
        # fcs: the final fully connected layers
        self.fcs = self._create_fcs(**kwargs)

    def forward(self, x):
        x = self.darknet(x)
        # Flatten to one vector per sample before the fully connected layers
        return self.fcs(torch.flatten(x, start_dim=1))

    def _create_conv_layers(self, architecture):
        # Collect every layer in a list
        layers = []
        in_channels = self.in_channels
        # Walk through the architecture config
        for x in architecture:
            if isinstance(x, tuple):
                layers += [
                    CNNBlock(
                        in_channels, 
                        out_channels=x[1], 
                        kernel_size=x[0], 
                        stride=x[2], 
                        padding=x[3],
                    )
                ]
                # This layer's output channels become the next layer's input channels
                in_channels = x[1]

            elif isinstance(x, str):
                layers += [nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))]

            elif isinstance(x, list):
                # First tuple: the first convolution
                conv1 = x[0]
                # Second tuple: the second convolution
                conv2 = x[1]
                num_repeats = x[2]

                for _ in range(num_repeats):
                    layers += [
                        CNNBlock(
                            in_channels,
                            out_channels=conv1[1],
                            kernel_size=conv1[0],
                            stride=conv1[2],
                            padding=conv1[3],
                        )
                    ]
                    layers += [
                        CNNBlock(
                            in_channels=conv1[1],
                            out_channels=conv2[1],
                            kernel_size=conv2[0],
                            stride=conv2[2],
                            padding=conv2[3],
                        )
                    ]
                    in_channels = conv2[1]

        # Unpack the list so that each layer is passed to nn.Sequential as its own argument
        return nn.Sequential(*layers)

    def _create_fcs(self, split_size, num_boxes, num_classes):
        S, B, C = split_size, num_boxes, num_classes
        
        return nn.Sequential(
            nn.Flatten(),
            nn.Linear(1024 * S * S, 4096),
            nn.Dropout(0.0),  # dropout is effectively disabled here; the paper uses a rate of 0.5
            nn.LeakyReLU(0.1),
            # As noted above (*): 4096 -> 1470
            nn.Linear(4096, S * S * (C + B * 5)),
        )
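
The `1024 * S * S` input size of the first linear layer assumes that the convolutional stack turns a 448 x 448 x 3 image into a 7 x 7 feature map with 1024 channels. One way to check this without building the model is to trace the shape through the configuration in plain Python. The helpers `conv_out` and `trace_shape` are hypothetical names introduced only for this sketch; the configuration is copied from the listing above.

```python
# Copy of the layer configuration from the listing above:
# tuple = (kernel_size, out_channels, stride, padding), "M" = 2x2 max pool,
# list = [conv, conv, number of repeats].
architecture_config = [
    (7, 64, 2, 3), "M",
    (3, 192, 1, 1), "M",
    (1, 128, 1, 0), (3, 256, 1, 1), (1, 256, 1, 0), (3, 512, 1, 1), "M",
    [(1, 256, 1, 0), (3, 512, 1, 1), 4],
    (1, 512, 1, 0), (3, 1024, 1, 1), "M",
    [(1, 512, 1, 0), (3, 1024, 1, 1), 2],
    (3, 1024, 1, 1), (3, 1024, 2, 1), (3, 1024, 1, 1), (3, 1024, 1, 1),
]


def conv_out(size, kernel, stride, padding):
    """Spatial size after a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shape(config, size=448, channels=3):
    """Follow (channels, spatial size) through every entry of the config."""
    for item in config:
        if isinstance(item, tuple):
            kernel, out_channels, stride, padding = item
            size = conv_out(size, kernel, stride, padding)
            channels = out_channels
        elif isinstance(item, str):
            # "M": 2x2 max pooling with stride 2 and no padding
            size = conv_out(size, 2, 2, 0)
        elif isinstance(item, list):
            *convs, repeats = item
            for _ in range(repeats):
                for kernel, out_channels, stride, padding in convs:
                    size = conv_out(size, kernel, stride, padding)
                    channels = out_channels
    return channels, size


channels, size = trace_shape(architecture_config)
S, B, C = 7, 2, 20

print(channels, size)            # 1024 7
print(channels * size * size)    # 50176, the input size of the first linear layer
print(S * S * (C + B * 5))       # 1470, the output size of the last linear layer
```

With `split_size=7`, `num_boxes=2` and `num_classes=20`, the head therefore maps 50176 -> 4096 -> 1470, and the 1470 values can then be reshaped to 7 x 7 x 30.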
