Author: Horizon Max
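VGG is short for Visual Geometry Group, the Oxford University group (Oxford Visual Geometry Group) that proposed the network.
At the 2014 ImageNet challenge (ILSVRC), it took first place in the localisation task and second place in the classification task.
VGG showed that increasing network depth can, to a certain extent, improve a network's final performance.
Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition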
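(1) Small convolution kernels: most are 3x3, and some are 1x1.
(2) Small pooling kernels: every pooling layer uses a 2x2 window with stride 2.
(3) A deeper network: three stacked 3x3 convolutions replace one 7x7 convolution, and two stacked 3x3 convolutions replace one 5x5 convolution, covering the same receptive field with fewer parameters.
(4) Fully connected to convolutional: at test time, the three fully connected layers used in training are replaced by three convolutional layers.
The 1x1 convolutions used in configuration C increase the non-linearity of the decision function without affecting the receptive fields of the convolutional layers.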
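The parameter saving in point (3) can be checked with a little arithmetic. The sketch below assumes every layer has the same number of input and output channels and no bias terms; the helper names `stack_params` and `stack_receptive_field` are illustrative and do not come from the paper or any library.

```python
def stack_params(kernel_size: int, num_layers: int, channels: int) -> int:
    """Weights in `num_layers` stacked k x k convs with `channels` in and out (no bias)."""
    return num_layers * kernel_size * kernel_size * channels * channels


def stack_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of `num_layers` stacked stride-1 k x k convs."""
    return 1 + num_layers * (kernel_size - 1)


if __name__ == "__main__":
    channels = 256  # assumed channel width, chosen only for illustration
    for small_layers, big_kernel in [(2, 5), (3, 7)]:
        small = stack_params(3, small_layers, channels)
        big = stack_params(big_kernel, 1, channels)
        print(f"{small_layers} x (3x3): rf={stack_receptive_field(3, small_layers)}, "
              f"params={small:,} | one {big_kernel}x{big_kernel}: "
              f"rf={stack_receptive_field(big_kernel, 1)}, params={big:,}")
```

With C channels, three 3x3 layers need 27C² weights against 49C² for a single 7x7 layer, and two 3x3 layers need 18C² against 25C² for a single 5x5 layer, while the receptive fields match.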
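Input size: (224 x 224 x 3)
conv1: two 3x3 convolutions, channels (3 → 64), 2x2 pooling, output (112 x 112 x 64)
conv2: two 3x3 convolutions, channels (64 → 128), 2x2 pooling, output (56 x 56 x 128)
conv3: three 3x3 convolutions, channels (128 → 256), 2x2 pooling, output (28 x 28 x 256)
conv4: three 3x3 convolutions, channels (256 → 512), 2x2 pooling, output (14 x 14 x 512)
conv5: three 3x3 convolutions, channels (512 → 512), 2x2 pooling, output (7 x 7 x 512)
FC: three fully connected layers follow, output (1 x num_classes)
At test time, the three fully connected layers used in training are replaced by three convolutions, output (1 x 1 x num_classes)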
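The test-time replacement of fully connected layers by convolutions can be sketched as follows. This is a minimal illustration, not the author's code: the helper `linear_to_conv` is hypothetical, and the layer sizes are shrunk stand-ins (VGG itself uses 512 channels, 7x7 feature maps and 4096 hidden units). The first fully connected layer becomes a 7x7 convolution and the remaining two become 1x1 convolutions.

```python
import torch
import torch.nn as nn


def linear_to_conv(fc: nn.Linear, in_channels: int, kernel_size: int) -> nn.Conv2d:
    """Build a Conv2d computing the same function as `fc` on an
    (in_channels, kernel_size, kernel_size) feature map."""
    conv = nn.Conv2d(in_channels, fc.out_features, kernel_size=kernel_size)
    with torch.no_grad():
        # Linear weight is (out, in_channels * k * k); torch.flatten orders the
        # input as (C, H, W), so a plain reshape gives the conv kernel layout.
        conv.weight.copy_(fc.weight.view(fc.out_features, in_channels, kernel_size, kernel_size))
        conv.bias.copy_(fc.bias)
    return conv


if __name__ == "__main__":
    torch.manual_seed(0)
    # Small stand-in sizes, assumed only to keep the example fast.
    channels, size, hidden, num_classes = 8, 7, 16, 10
    fcs = [nn.Linear(channels * size * size, hidden),
           nn.Linear(hidden, hidden),
           nn.Linear(hidden, num_classes)]
    fc_head = nn.Sequential(fcs[0], nn.ReLU(), fcs[1], nn.ReLU(), fcs[2])
    conv_head = nn.Sequential(
        linear_to_conv(fcs[0], channels, size), nn.ReLU(),  # 7x7 conv
        linear_to_conv(fcs[1], hidden, 1), nn.ReLU(),       # 1x1 conv
        linear_to_conv(fcs[2], hidden, 1),                  # 1x1 conv
    )
    x = torch.randn(2, channels, size, size)
    with torch.no_grad():
        y_fc = fc_head(torch.flatten(x, 1))
        y_conv = conv_head(x)
        print(y_conv.shape)  # (batch, num_classes, 1, 1)
        print(torch.allclose(y_fc, y_conv.flatten(1), atol=1e-4))
        # A larger input gives a spatial map of class scores instead of one vector.
        print(conv_head(torch.randn(1, channels, 9, 9)).shape)
```

The convolutional head accepts inputs larger than the training size, which is the reason for making the switch at test time.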
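VGG16 with and without batch normalisation: the two variants differ in whether a BatchNorm2d() follows each convolution. (In the paper, configurations C and D are both 16-layer networks: C uses 1x1 kernels for the last convolution of blocks 3 to 5, while D uses 3x3 kernels throughout. Batch normalisation is not part of the original paper.)
Without BN: Conv2d → ReLU
With BN: Conv2d → BatchNorm2d → ReLU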
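The two layer patterns above can be selected with a single flag. The sketch below assumes a hypothetical standalone helper `make_layers(cfg, batch_norm)`; it is an illustration of the idea, not the implementation used in this post.

```python
import torch
import torch.nn as nn


def make_layers(cfg, batch_norm: bool = False) -> nn.Sequential:
    """Build a VGG-style feature extractor; `batch_norm` picks between the two patterns."""
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            continue
        if batch_norm:
            # Conv2d -> BatchNorm2d -> ReLU; BN has its own shift, so the conv bias is dropped
            layers += [nn.Conv2d(in_channels, v, kernel_size=3, padding=1, bias=False),
                       nn.BatchNorm2d(v),
                       nn.ReLU(inplace=True)]
        else:
            # Conv2d -> ReLU
            layers += [nn.Conv2d(in_channels, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
        in_channels = v
    return nn.Sequential(*layers)


if __name__ == "__main__":
    cfg = [64, 'M', 128, 'M']  # shortened config, for illustration only
    for flag in (False, True):
        net = make_layers(cfg, batch_norm=flag)
        out = net(torch.randn(1, 3, 32, 32))
        print(flag, len(net), tuple(out.shape))
```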
# Here is the code:

import torch
import torch.nn as nn
from torchinfo import summary

# Layer configurations: an integer is the output channel count of a 3x3 conv,
# 'M' is a 2x2 max-pooling layer with stride 2.
vgg_layer = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


class VGG(nn.Module):
    def __init__(self, vgg_name, num_classes=1000):
        super(VGG, self).__init__()
        self.features = self._make_layers(vgg_layer[vgg_name])
        # Fixes the feature map at 7x7 so the classifier input size is constant
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)  # (N, 512, 7, 7) -> (N, 512 * 7 * 7)
        out = self.classifier(x)
        return out

    def _make_layers(self, vgg_layer):
        layers = []
        in_channels = 3
        for x in vgg_layer:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                # Conv2d -> BatchNorm2d -> ReLU; the conv bias is omitted because BN follows
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1, bias=False),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x
        return nn.Sequential(*layers)


def VGG11():
    return VGG('VGG11')


def VGG13():
    return VGG('VGG13')


def VGG16():
    return VGG('VGG16')


def VGG19():
    return VGG('VGG19')


def test():
    net = VGG16()
    y = net(torch.randn(1, 3, 224, 224))
    print(y.size())
    summary(net, (1, 3, 224, 224))


if __name__ == '__main__':
    test()
Output:
torch.Size([1, 1000])
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
VGG                                      --                        --
├─Sequential: 1-1                        [1, 512, 7, 7]            --
│    └─Conv2d: 2-1                       [1, 64, 224, 224]         1,728
│    └─BatchNorm2d: 2-2                  [1, 64, 224, 224]         128
│    └─ReLU: 2-3                         [1, 64, 224, 224]         --
│    └─Conv2d: 2-4                       [1, 64, 224, 224]         36,864
│    └─BatchNorm2d: 2-5                  [1, 64, 224, 224]         128
│    └─ReLU: 2-6                         [1, 64, 224, 224]         --
│    └─MaxPool2d: 2-7                    [1, 64, 112, 112]         --
│    └─Conv2d: 2-8                       [1, 128, 112, 112]        73,728
│    └─BatchNorm2d: 2-9                  [1, 128, 112, 112]        256
│    └─ReLU: 2-10                        [1, 128, 112, 112]        --
│    └─Conv2d: 2-11                      [1, 128, 112, 112]        147,456
│    └─BatchNorm2d: 2-12                 [1, 128, 112, 112]        256
│    └─ReLU: 2-13                        [1, 128, 112, 112]        --
│    └─MaxPool2d: 2-14                   [1, 128, 56, 56]          --
│    └─Conv2d: 2-15                      [1, 256, 56, 56]          294,912
│    └─BatchNorm2d: 2-16                 [1, 256, 56, 56]          512
│    └─ReLU: 2-17                        [1, 256, 56, 56]          --
│    └─Conv2d: 2-18                      [1, 256, 56, 56]          589,824
│    └─BatchNorm2d: 2-19                 [1, 256, 56, 56]          512
│    └─ReLU: 2-20                        [1, 256, 56, 56]          --
│    └─Conv2d: 2-21                      [1, 256, 56, 56]          589,824
│    └─BatchNorm2d: 2-22                 [1, 256, 56, 56]          512
│    └─ReLU: 2-23                        [1, 256, 56, 56]          --
│    └─MaxPool2d: 2-24                   [1, 256, 28, 28]          --
│    └─Conv2d: 2-25                      [1, 512, 28, 28]          1,179,648
│    └─BatchNorm2d: 2-26                 [1, 512, 28, 28]          1,024
│    └─ReLU: 2-27                        [1, 512, 28, 28]          --
│    └─Conv2d: 2-28                      [1, 512, 28, 28]          2,359,296
│    └─BatchNorm2d: 2-29                 [1, 512, 28, 28]          1,024
│    └─ReLU: 2-30                        [1, 512, 28, 28]          --
│    └─Conv2d: 2-31                      [1, 512, 28, 28]          2,359,296
│    └─BatchNorm2d: 2-32                 [1, 512, 28, 28]          1,024
│    └─ReLU: 2-33                        [1, 512, 28, 28]          --
│    └─MaxPool2d: 2-34                   [1, 512, 14, 14]          --
│    └─Conv2d: 2-35                      [1, 512, 14, 14]          2,359,296
│    └─BatchNorm2d: 2-36                 [1, 512, 14, 14]          1,024
│    └─ReLU: 2-37                        [1, 512, 14, 14]          --
│    └─Conv2d: 2-38                      [1, 512, 14, 14]          2,359,296
│    └─BatchNorm2d: 2-39                 [1, 512, 14, 14]          1,024
│    └─ReLU: 2-40                        [1, 512, 14, 14]          --
│    └─Conv2d: 2-41                      [1, 512, 14, 14]          2,359,296
│    └─BatchNorm2d: 2-42                 [1, 512, 14, 14]          1,024
│    └─ReLU: 2-43                        [1, 512, 14, 14]          --
│    └─MaxPool2d: 2-44                   [1, 512, 7, 7]            --
├─AdaptiveAvgPool2d: 1-2                 [1, 512, 7, 7]            --
├─Sequential: 1-3                        [1, 1000]                 --
│    └─Linear: 2-45                      [1, 4096]                 102,764,544
│    └─ReLU: 2-46                        [1, 4096]                 --
│    └─Dropout: 2-47                     [1, 4096]                 --
│    └─Linear: 2-48                      [1, 4096]                 16,781,312
│    └─ReLU: 2-49                        [1, 4096]                 --
│    └─Dropout: 2-50                     [1, 4096]                 --
│    └─Linear: 2-51                      [1, 1000]                 4,097,000
==========================================================================================
Total params: 138,361,768
Trainable params: 138,361,768
Non-trainable params: 0
Total mult-adds (G): 15.47
==========================================================================================
Input size (MB): 0.60
Forward/backward pass size (MB): 216.83
Params size (MB): 553.45
Estimated Total Size (MB): 770.88
==========================================================================================
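The "Total params" figure in the summary can be reproduced by hand from the layer configuration. The sketch below is an independent tally that assumes the layout used in this post: 3x3 convolutions without bias, a BatchNorm2d after every convolution, a 7x7 adaptive pool, and three Linear layers with bias. The name `count_params` is illustrative.

```python
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']


def count_params(cfg, num_classes=1000, in_channels=3):
    """Tally parameters for the Conv2d(bias=False) -> BatchNorm2d -> ReLU layout."""
    total = 0
    for v in cfg:
        if v == 'M':
            continue                          # pooling layers have no parameters
        total += in_channels * v * 3 * 3      # 3x3 conv weights, no bias
        total += 2 * v                        # BatchNorm2d weight and bias
        in_channels = v
    features = in_channels * 7 * 7            # AdaptiveAvgPool2d((7, 7)) output, flattened
    for n_in, n_out in [(features, 4096), (4096, 4096), (4096, num_classes)]:
        total += n_in * n_out + n_out         # Linear weight and bias
    return total


if __name__ == "__main__":
    print(f"{count_params(VGG16_CFG):,}")  # 138,361,768
```

Most of the total comes from the first Linear layer (102,764,544 parameters), while all thirteen convolutions together contribute about 14.7 million.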