Reference video: 土堆说 卷积计算 (convolution computation)
https://www.bilibili.com/video/BV1hE411t7RN
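About torch.nn and torch.nn.functional
torch.nn wraps torch.nn.functional: its layer classes (e.g. nn.Conv2d) create and hold the weights for you, whereas the functional calls (e.g. F.conv2d) take the weights as explicit arguments. The former is usually more convenient in practice.
See the animations at the end for a visualization of the convolution process.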
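The input feature map of a convolutional layer has size H_i × W_i × C_i.
The layer's parameters are:
K: number of filters (output channels)
F: kernel size (each filter is F × F)
S: stride
P: padding
The output feature map has size H_o × W_o × C_o, where each variable is computed as follows:
H_o = floor((H_i - F + 2P) / S) + 1
W_o = floor((W_i - F + 2P) / S) + 1
C_o = K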
The parameter count splits into weights and biases:
First, the number of weights: F × F × C_i × K
Next, the number of biases: K
So the total parameter count is: F × F × C_i × K + K
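The parameter count above can be checked with a few lines of plain Python. This is a minimal sketch; the function name `conv_param_count` and the example layer (3x3 kernels, 3 input channels, 64 filters) are assumptions made for illustration and do not come from the video.

```python
def conv_param_count(kernel_size: int, in_channels: int, num_filters: int,
                     bias: bool = True) -> int:
    """Parameters of one conv layer: F*F*C_i*K weights plus K biases."""
    weights = kernel_size * kernel_size * in_channels * num_filters
    biases = num_filters if bias else 0
    return weights + biases


# Hypothetical layer: F=3, C_i=3, K=64
print(conv_param_count(3, 3, 64))  # 3*3*3*64 + 64 = 1792
```

Note that the count does not depend on the input height or width, only on the kernel size and the channel counts.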
Worked examples (the division is rounded down when it is not exact)
Input | Kernel | Stride | Padding | Output | Calculation |
---|---|---|---|---|---|
5x5 | 2x2 | 1 | 0 | 4x4 | 4 = (5-2)/1 + 1 |
5x5 | 3x3 | 1 | 0 | 3x3 | 3 = (5-3)/1 + 1 |
5x5 | 2x2 | 2 | 0 | 2x2 | 2 = floor((5-2)/2) + 1 |
6x6 | 2x2 | 2 | 0 | 3x3 | 3 = (6-2)/2 + 1 |
5x5 | 2x2 | 1 | 1 | 6x6 | 6 = (5 + 1*2 - 2)/1 + 1 |
5x5 | 3x3 | 1 | 1 | 5x5 | 5 = (5 + 1*2 - 3)/1 + 1 |
5x5 | 3x3 | 2 | 2 | 4x4 | 4 = (5 + 2*2 - 3)/2 + 1 |
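The rows of the table can be reproduced with a small helper. This is a sketch under the usual assumption that the division is floored; `conv_output_size` is an illustrative name, not a PyTorch function.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """floor((size + 2*padding - kernel) / stride) + 1 along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


# (input, kernel, stride, padding) for each row of the table, in order
rows = [(5, 2, 1, 0), (5, 3, 1, 0), (5, 2, 2, 0), (6, 2, 2, 0),
        (5, 2, 1, 1), (5, 3, 1, 1), (5, 3, 2, 2)]
sizes = [conv_output_size(*row) for row in rows]
print(sizes)  # [4, 3, 2, 3, 6, 5, 4]
```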
torch.nn.functional.conv2d
Official signature of the corresponding layer class: torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
Usually only the first five parameters are set.
import torch
import torch.nn.functional as F
t1 = torch.Tensor([[1, 2, 0, 3, 1],
                   [0, 1, 2, 3, 1],
                   [1, 2, 1, 0, 0],
                   [5, 2, 3, 1, 1],
                   [2, 1, 0, 1, 1]])
kernel = torch.Tensor([
    [1, 2, 1],
    [0, 1, 0],
    [2, 1, 0],
])
t1.shape, kernel.shape
# (torch.Size([5, 5]), torch.Size([3, 3]))
# batch_size and channel are both 1; F.conv2d expects input shaped (N, C, H, W)
ip = torch.reshape(t1, (1, 1, 5, 5))
kernel = torch.reshape(kernel, (1, 1, 3, 3))
ip.shape, kernel.shape
# (torch.Size([1, 1, 5, 5]), torch.Size([1, 1, 3, 3]))
op = F.conv2d(ip, kernel, stride=1)
op, op.shape
'''
(tensor([[[[10., 12., 12.],
[18., 16., 16.],
[13., 9., 3.]]]]),
torch.Size([1, 1, 3, 3]))
'''
# A different stride
op = F.conv2d(ip, kernel, stride=2)
op, op.shape
'''
(tensor([[[[10., 12.],
[13., 3.]]]]),
torch.Size([1, 1, 2, 2]))
'''
# Add padding
op = F.conv2d(ip, kernel, stride=2, padding=1)
op, op.shape
'''
(tensor([[[[ 1., 4., 8.],
[ 7., 16., 8.],
[14., 9., 4.]]]]),
torch.Size([1, 1, 3, 3]))
'''
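To see where those numbers come from, the same sliding-window computation can be written out with plain nested lists. This is a rough sketch for a single channel with zero padding, not how PyTorch implements it; `conv2d_naive`, `img` and `k` are names chosen here for illustration. As with F.conv2d, the kernel is applied without flipping (cross-correlation).

```python
def conv2d_naive(image, kernel, stride=1, padding=0):
    """Slide one 2-D kernel over one 2-D channel (zero padding, no kernel flip)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Build the zero-padded copy of the input
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Multiply the window element-wise with the kernel and sum
            row.append(sum(padded[top + a][left + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


img = [[1, 2, 0, 3, 1],
       [0, 1, 2, 3, 1],
       [1, 2, 1, 0, 0],
       [5, 2, 3, 1, 1],
       [2, 1, 0, 1, 1]]
k = [[1, 2, 1],
     [0, 1, 0],
     [2, 1, 0]]

print(conv2d_naive(img, k))                       # [[10, 12, 12], [18, 16, 16], [13, 9, 3]]
print(conv2d_naive(img, k, stride=2))             # [[10, 12], [13, 3]]
print(conv2d_naive(img, k, stride=2, padding=1))  # [[1, 4, 8], [7, 16, 8], [14, 9, 4]]
```

The three results match the three F.conv2d outputs shown above, which suggests that stride only changes which window positions are visited and padding only changes the border values.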
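The purpose of pooling is to keep the salient features while reducing the amount of data.
Max pooling is also called downsampling (more precisely, it is one form of downsampling).
Pooling is applied to each depth slice independently, so the output depth equals the input depth.
The output size is:
H_o = (H_i - f_h)/S + 1
W_o = (W_i - f_w)/S + 1
D_o = D_i
Example: the input has dimensions 4x4x5 (HxWxD),
the filter size is 2x2 (HxW),
and the stride is 2 in both height and width (S),
so the output is 2x2x5, since (4 - 2)/2 + 1 = 2 and the depth is unchanged.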
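The pooling formulas can be evaluated directly. A minimal sketch, assuming no padding and floor rounding; `pool_output_shape` is an illustrative helper, and the second call uses a hypothetical 32x32x3 image with a 3x3 window and stride 3.

```python
def pool_output_shape(h, w, d, f_h, f_w, stride):
    """Output (H, W, D) of a pooling layer without padding; depth is unchanged."""
    h_out = (h - f_h) // stride + 1
    w_out = (w - f_w) // stride + 1
    return h_out, w_out, d


print(pool_output_shape(4, 4, 5, 2, 2, 2))    # (2, 2, 5)
print(pool_output_shape(32, 32, 3, 3, 3, 3))  # (10, 10, 3)
```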
torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
Code implementation (stride defaults to kernel_size when left as None)
import torch
import torch.nn as nn

# MaxPool2d expects input shaped (N, C, H, W); newer versions also accept (C, H, W).
# t1 is the 5x5 tensor defined in the convolution example above.
ip = torch.reshape(t1, (-1, 1, 5, 5))

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool = nn.MaxPool2d(kernel_size=3, ceil_mode=True)
        # self.maxpool = nn.MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, x):
        return self.maxpool(x)

net = Net()
ret = net(ip)
ret
# tensor([[[[2., 3.], [5., 1.]]]]) # ceil_mode=True
# tensor([[[[2.]]]]) # ceil_mode=False
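The effect of ceil_mode is easier to see in a hand-written version. The sketch below is a simplified single-channel max pool without padding, assuming the input is at least as large as the window; `max_pool2d_naive` and `img` are illustrative names, and this is not PyTorch's implementation. With ceil_mode=True, windows that hang over the edge are kept and take the maximum of whatever part overlaps the input.

```python
import math


def _pool_out_size(size, kernel_size, stride, ceil_mode):
    rounding = math.ceil if ceil_mode else math.floor
    out = rounding((size - kernel_size) / stride) + 1
    # In ceil mode the last window must still start inside the input
    if ceil_mode and (out - 1) * stride >= size:
        out -= 1
    return out


def max_pool2d_naive(image, kernel_size, stride=None, ceil_mode=False):
    """Max pooling over one 2-D channel, no padding; stride defaults to kernel_size."""
    stride = stride or kernel_size
    h, w = len(image), len(image[0])
    out_h = _pool_out_size(h, kernel_size, stride, ceil_mode)
    out_w = _pool_out_size(w, kernel_size, stride, ceil_mode)
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Clip the window at the border (only matters when ceil_mode=True)
            window = [image[r][c]
                      for r in range(top, min(top + kernel_size, h))
                      for c in range(left, min(left + kernel_size, w))]
            row.append(max(window))
        out.append(row)
    return out


img = [[1, 2, 0, 3, 1],
       [0, 1, 2, 3, 1],
       [1, 2, 1, 0, 0],
       [5, 2, 3, 1, 1],
       [2, 1, 0, 1, 1]]

print(max_pool2d_naive(img, 3, ceil_mode=True))   # [[2, 3], [5, 1]]
print(max_pool2d_naive(img, 3, ceil_mode=False))  # [[2]]
```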
Using it on a dataset
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

data_path = '/xxxx/cifar10'
dataset = torchvision.datasets.CIFAR10(data_path, train=False, download=True,
                                       transform=torchvision.transforms.ToTensor())
data_loader = DataLoader(dataset, batch_size=64)

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, x):
        return self.maxpool1(x)

writer = SummaryWriter('logs_maxpool1')
net = Net()
for step, (imgs, targets) in enumerate(data_loader):
    writer.add_images('input', imgs, step)     # imgs: (N, 3, 32, 32)
    output = net(imgs)                         # pooled: (N, 3, 10, 10)
    writer.add_images('output', output, step)
writer.close()

Then view the logged images by running this in a terminal:
tensorboard --logdir=logs_maxpool1
Animations from: https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
The animations themselves are not reproduced here; their titles are:
Arbitrary padding, no strides
Full padding, no strides
No padding, strides
Padding, strides
Padding, strides (odd)
No padding, no strides, transposed
Arbitrary padding, no strides, transposed
Half padding, no strides, transposed
Full padding, no strides, transposed
No padding, strides, transposed
Padding, strides, transposed (odd)