[ Image Classification ] Classic Network Models 3: VGG Explained and Reproduced


Author: Horizon Max


  • Visual Geometry Group
  • VGG16 in Detail
    • VGG16 Network Features
    • VGG16 Network Structure
  • VGG16 Reproduction

Visual Geometry Group

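VGG is short for the Visual Geometry Group at the University of Oxford, the group that proposed the network.

In the 2014 ImageNet challenge (ILSVRC), VGG took first place in the localisation task and second place in the classification task.

The VGG network showed that increasing network depth can, to a certain extent, improve the final performance of the network.


Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition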



VGG16 in Detail

VGG16 Network Features

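(1) Small convolution kernels: most kernels are 3x3, with 1x1 used in some places;
(2) Small pooling windows: all pooling layers are 2x2 with stride 2;
(3) A deeper network: three stacked 3x3 convolutions replace one 7x7 convolution, and two stacked 3x3 convolutions replace one 5x5 convolution, which reduces the parameter count;
(4) Fully connected to convolutional: at test time, the three fully connected layers used in training are replaced by three convolutional layers.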
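Feature (4) can be checked numerically. The following is a minimal sketch, not the author's code: it copies the weights of a fully connected layer into a 7x7 convolution and compares the outputs. The output width of 64 (instead of 4096) and the names `fc`, `conv` and `big_out` are assumptions chosen only to keep the demo light.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the first classifier layer (512*7*7 -> 4096);
# 64 outputs are used here only to keep the example small.
fc = nn.Linear(512 * 7 * 7, 64)

# Candidate replacement: a 7x7 convolution over the 512-channel feature map.
conv = nn.Conv2d(512, 64, kernel_size=7)

# Reuse the FC weights; the flatten order (C, H, W) matches the kernel layout.
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(64, 512, 7, 7))
    conv.bias.copy_(fc.bias)

x = torch.randn(2, 512, 7, 7)
fc_out = fc(torch.flatten(x, 1))   # shape [2, 64]
conv_out = conv(x)                 # shape [2, 64, 1, 1]
same = torch.allclose(fc_out, conv_out.flatten(1), atol=1e-4)
print(same)

# On a larger feature map the convolution yields a spatial grid of scores.
big_out = conv(torch.randn(2, 512, 9, 9))
print(tuple(big_out.shape))        # (2, 64, 3, 3)
```

The remaining two fully connected layers map to 1x1 convolutions in the same way, which is what lets the converted network run on inputs larger than 224 x 224.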

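The 1x1 convolutions used in configuration C increase the non-linearity of the decision function without affecting the receptive fields of the convolutional layers.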
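The kernel-size claims above come down to simple arithmetic. The sketch below assumes stride-1 convolutions with the same number of input and output channels and no bias. The channel width `C = 256` and the helper names are hypothetical and only serve the illustration.

```python
def conv_weights(kernel, channels, layers=1):
    """Weights in `layers` stacked kernel x kernel convs (C in, C out, no bias)."""
    return layers * kernel * kernel * channels * channels


def receptive_field(kernels):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernels:
        rf += k - 1
    return rf


C = 256  # hypothetical channel width

# Same receptive field as a single large kernel ...
print(receptive_field([3, 3]), receptive_field([5]))      # 5 5
print(receptive_field([3, 3, 3]), receptive_field([7]))   # 7 7
# ... and a 1x1 convolution leaves the receptive field unchanged.
print(receptive_field([3, 3, 1]))                          # 5

# Fewer weights than the single large kernel.
print(conv_weights(3, C, layers=2), conv_weights(5, C))   # 1179648 1638400
print(conv_weights(3, C, layers=3), conv_weights(7, C))   # 1769472 3211264
```

In general the comparison is 27C² against 49C² weights for the 7x7 case and 18C² against 25C² for the 5x5 case, and each stacked layer adds one more ReLU.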


VGG16 Network Structure

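Input size: (224 x 224 x 3)
conv1: two 3x3 convolutions, channels (3 → 64), 2x2 pooling, output (112 x 112 x 64)
conv2: two 3x3 convolutions, channels (64 → 128), 2x2 pooling, output (56 x 56 x 128)
conv3: three 3x3 convolutions, channels (128 → 256), 2x2 pooling, output (28 x 28 x 256)
conv4: three 3x3 convolutions, channels (256 → 512), 2x2 pooling, output (14 x 14 x 512)
conv5: three 3x3 convolutions, channels (512 → 512), 2x2 pooling, output (7 x 7 x 512)
FC: three fully connected layers follow, output (1 x num_classes)
At test time the three fully connected layers are replaced by three convolutional layers, output (1 x 1 x num_classes)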
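The stage-by-stage sizes listed above can be reproduced with the standard output-size formula. This is a small sketch that assumes 3x3 convolutions with stride 1 and padding 1 (as in the reproduction code in this article) and 2x2 pooling with stride 2. The helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling."""
    return (size - kernel) // stride + 1


# (number of convolutions, output channels) for conv1 .. conv5
stages = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

size = 224
shapes = []
for n_convs, channels in stages:
    for _ in range(n_convs):
        size = conv_out(size)   # padding 1 keeps the size unchanged
    size = pool_out(size)       # pooling halves it
    shapes.append((size, size, channels))

print(shapes)
# [(112, 112, 64), (56, 56, 128), (28, 28, 256), (14, 14, 512), (7, 7, 512)]

flat = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(flat)  # 25088, the input width of the first fully connected layer
```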

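Difference between VGG16 configurations C and D: in the paper both have 16 weight layers, but C uses 1x1 kernels for the last convolution of conv3, conv4 and conv5, while D uses 3x3 kernels throughout. The paper itself does not use batch normalization; implementations commonly offer two layer patterns:
Plain: Conv2d → ReLU
With BN (used in the code below): Conv2d → BatchNorm2d → ReLU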
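The two layer patterns listed above can be sketched with a single flag. This is an illustrative helper, not part of the article's code: `conv_unit` and its `batch_norm` parameter are hypothetical names. Dropping the convolution bias when BatchNorm2d follows is an assumption that mirrors the `bias=False` choice in the reproduction code.

```python
import torch
import torch.nn as nn


def conv_unit(in_channels, out_channels, batch_norm):
    """One 3x3 conv unit, with or without BatchNorm2d before the ReLU."""
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1,
                        bias=not batch_norm)]
    if batch_norm:
        layers.append(nn.BatchNorm2d(out_channels))
    layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)


plain = conv_unit(3, 64, batch_norm=False)
with_bn = conv_unit(3, 64, batch_norm=True)

plain_names = [type(m).__name__ for m in plain]
bn_names = [type(m).__name__ for m in with_bn]
print(plain_names)  # ['Conv2d', 'ReLU']
print(bn_names)     # ['Conv2d', 'BatchNorm2d', 'ReLU']

# Both variants keep the spatial size and produce 64 channels.
x = torch.randn(1, 3, 32, 32)
plain_shape = tuple(plain(x).shape)
bn_shape = tuple(with_bn(x).shape)
print(plain_shape, bn_shape)
```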


[Figure: VGG16 network architecture diagram]


VGG16 Reproduction

# Here is the code :

import torch
import torch.nn as nn
from torchinfo import summary

vgg_layer = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


class VGG(nn.Module):
    def __init__(self, vgg_name, num_classes=1000):
        super(VGG, self).__init__()
        self.features = self._make_layers(vgg_layer[vgg_name])
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        out = self.classifier(x)
        return out

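    def _make_layers(self, vgg_layer):
        # Build the feature extractor from a config list:
        # an int adds a 3x3 conv unit with that many output channels,
        # 'M' adds a 2x2 max pooling layer.
        layers = []
        in_channels = 3  # RGB input
        for x in vgg_layer:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                # bias is disabled because the following BatchNorm2d has its own shift term
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1, bias=False),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x
        return nn.Sequential(*layers)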

def VGG11():
    return VGG('VGG11')


def VGG13():
    return VGG('VGG13')


def VGG16():
    return VGG('VGG16')


def VGG19():
    return VGG('VGG19')


def test():
    net = VGG16()
    y = net(torch.randn(1, 3, 224, 224))
    print(y.size())
    summary(net, (1, 3, 224, 224))


if __name__ == '__main__':
    test()

Output:

torch.Size([1, 1000])
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
VGG                                      --                        --
├─Sequential: 1-1                        [1, 512, 7, 7]            --
│    └─Conv2d: 2-1                       [1, 64, 224, 224]         1,728
│    └─BatchNorm2d: 2-2                  [1, 64, 224, 224]         128
│    └─ReLU: 2-3                         [1, 64, 224, 224]         --
│    └─Conv2d: 2-4                       [1, 64, 224, 224]         36,864
│    └─BatchNorm2d: 2-5                  [1, 64, 224, 224]         128
│    └─ReLU: 2-6                         [1, 64, 224, 224]         --
│    └─MaxPool2d: 2-7                    [1, 64, 112, 112]         --
│    └─Conv2d: 2-8                       [1, 128, 112, 112]        73,728
│    └─BatchNorm2d: 2-9                  [1, 128, 112, 112]        256
│    └─ReLU: 2-10                        [1, 128, 112, 112]        --
│    └─Conv2d: 2-11                      [1, 128, 112, 112]        147,456
│    └─BatchNorm2d: 2-12                 [1, 128, 112, 112]        256
│    └─ReLU: 2-13                        [1, 128, 112, 112]        --
│    └─MaxPool2d: 2-14                   [1, 128, 56, 56]          --
│    └─Conv2d: 2-15                      [1, 256, 56, 56]          294,912
│    └─BatchNorm2d: 2-16                 [1, 256, 56, 56]          512
│    └─ReLU: 2-17                        [1, 256, 56, 56]          --
│    └─Conv2d: 2-18                      [1, 256, 56, 56]          589,824
│    └─BatchNorm2d: 2-19                 [1, 256, 56, 56]          512
│    └─ReLU: 2-20                        [1, 256, 56, 56]          --
│    └─Conv2d: 2-21                      [1, 256, 56, 56]          589,824
│    └─BatchNorm2d: 2-22                 [1, 256, 56, 56]          512
│    └─ReLU: 2-23                        [1, 256, 56, 56]          --
│    └─MaxPool2d: 2-24                   [1, 256, 28, 28]          --
│    └─Conv2d: 2-25                      [1, 512, 28, 28]          1,179,648
│    └─BatchNorm2d: 2-26                 [1, 512, 28, 28]          1,024
│    └─ReLU: 2-27                        [1, 512, 28, 28]          --
│    └─Conv2d: 2-28                      [1, 512, 28, 28]          2,359,296
│    └─BatchNorm2d: 2-29                 [1, 512, 28, 28]          1,024
│    └─ReLU: 2-30                        [1, 512, 28, 28]          --
│    └─Conv2d: 2-31                      [1, 512, 28, 28]          2,359,296
│    └─BatchNorm2d: 2-32                 [1, 512, 28, 28]          1,024
│    └─ReLU: 2-33                        [1, 512, 28, 28]          --
│    └─MaxPool2d: 2-34                   [1, 512, 14, 14]          --
│    └─Conv2d: 2-35                      [1, 512, 14, 14]          2,359,296
│    └─BatchNorm2d: 2-36                 [1, 512, 14, 14]          1,024
│    └─ReLU: 2-37                        [1, 512, 14, 14]          --
│    └─Conv2d: 2-38                      [1, 512, 14, 14]          2,359,296
│    └─BatchNorm2d: 2-39                 [1, 512, 14, 14]          1,024
│    └─ReLU: 2-40                        [1, 512, 14, 14]          --
│    └─Conv2d: 2-41                      [1, 512, 14, 14]          2,359,296
│    └─BatchNorm2d: 2-42                 [1, 512, 14, 14]          1,024
│    └─ReLU: 2-43                        [1, 512, 14, 14]          --
│    └─MaxPool2d: 2-44                   [1, 512, 7, 7]            --
├─AdaptiveAvgPool2d: 1-2                 [1, 512, 7, 7]            --
├─Sequential: 1-3                        [1, 1000]                 --
│    └─Linear: 2-45                      [1, 4096]                 102,764,544
│    └─ReLU: 2-46                        [1, 4096]                 --
│    └─Dropout: 2-47                     [1, 4096]                 --
│    └─Linear: 2-48                      [1, 4096]                 16,781,312
│    └─ReLU: 2-49                        [1, 4096]                 --
│    └─Dropout: 2-50                     [1, 4096]                 --
│    └─Linear: 2-51                      [1, 1000]                 4,097,000
==========================================================================================
Total params: 138,361,768
Trainable params: 138,361,768
Non-trainable params: 0
Total mult-adds (G): 15.47
==========================================================================================
Input size (MB): 0.60
Forward/backward pass size (MB): 216.83
Params size (MB): 553.45
Estimated Total Size (MB): 770.88
==========================================================================================
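The totals in the summary can be cross-checked by hand. The sketch below recomputes the parameter count from the VGG16 config, assuming the same choices as the code above: 3x3 convolutions without bias, one BatchNorm2d (weight and bias) per convolution, and three fully connected layers with bias. The variable names are hypothetical.

```python
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

conv_params = 0
bn_params = 0
in_channels = 3
for v in cfg:
    if v == 'M':
        continue                              # pooling has no parameters
    conv_params += 3 * 3 * in_channels * v    # conv weights, bias=False
    bn_params += 2 * v                        # BatchNorm2d weight and bias
    in_channels = v

fc_params = ((512 * 7 * 7 * 4096 + 4096)      # Linear(25088, 4096)
             + (4096 * 4096 + 4096)           # Linear(4096, 4096)
             + (4096 * 1000 + 1000))          # Linear(4096, 1000)

total = conv_params + bn_params + fc_params
print(conv_params, bn_params, fc_params)  # 14710464 8448 123642856
print(total)                              # 138361768
print(round(total * 4 / 1e6, 2))          # 553.45 (MB at 4 bytes per float32)
```

About 89% of the parameters sit in the three fully connected layers, with the first one alone accounting for 102,764,544.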

