The original AlexNet was designed for classification on the ImageNet 2012 dataset. However, ImageNet is huge (about 132 GB) and is no longer available for download through the latest PyTorch, so for practice this reproduction uses the small CIFAR-10 dataset instead; because the image sizes differ, the network is adjusted accordingly.
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
The main differences come down to four points; beyond that, AlexNet's total parameter count is tens of times that of LeNet-5.

Original architecture:
Size / operation | Kernel size | Depth | Stride | Padding | # Parameters
---|---|---|---|---|---
3 × 227 × 227 | | | | |
Conv1 + ReLU | 11 | 96 | 4 | 0 | (11 × 11 × 3 + 1) × 96 = 34944
96 × 55 × 55 | | | | |
Max pooling | 3 | | 2 | |
96 × 27 × 27 | | | | |
Norm (LRN) | | | | |
Conv2 + ReLU | 5 | 256 | 1 | 2 | (5 × 5 × 96 + 1) × 256 = 614656
256 × 27 × 27 | | | | |
Max pooling | 3 | | 2 | |
256 × 13 × 13 | | | | |
Norm (LRN) | | | | |
Conv3 + ReLU | 3 | 384 | 1 | 1 | (3 × 3 × 256 + 1) × 384 = 885120
384 × 13 × 13 | | | | |
Conv4 + ReLU | 3 | 384 | 1 | 1 | (3 × 3 × 384 + 1) × 384 = 1327488
384 × 13 × 13 | | | | |
Conv5 + ReLU | 3 | 256 | 1 | 1 | (3 × 3 × 384 + 1) × 256 = 884992
256 × 13 × 13 | | | | |
Max pooling | 3 | | 2 | |
256 × 6 × 6 | | | | |
FC6 + ReLU | | | | | 256 × 6 × 6 × 4096 = 37748736
4096 | | | | |
Dropout (rate 0.5) | | | | |
FC7 + ReLU | | | | | 4096 × 4096 = 16777216
4096 | | | | |
FC8 | | | | | 4096 × 1000 = 4096000
1000 | | | | |
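To make the arithmetic explicit, here is a small plain-Python sketch (not part of the original write-up) that reproduces the convolution parameter counts in the table above; each convolution contributes (k × k × c_in + 1) × c_out parameters, the +1 being the bias:

```python
# Reproduce the per-layer parameter counts of the original AlexNet convolutions.
def conv_params(k, c_in, c_out):
    # k*k*c_in weights per output channel, plus one bias per output channel
    return (k * k * c_in + 1) * c_out

layers = [
    ("Conv1", 11, 3, 96),
    ("Conv2", 5, 96, 256),
    ("Conv3", 3, 256, 384),
    ("Conv4", 3, 384, 384),
    ("Conv5", 3, 384, 256),
]
for name, k, c_in, c_out in layers:
    print(name, conv_params(k, c_in, c_out))
# Conv1 34944, Conv2 614656, Conv3 885120, Conv4 1327488, Conv5 884992
```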
Modified architecture for CIFAR-10:

Size / operation | Kernel size | Depth | Stride | Padding | # Parameters
---|---|---|---|---|---
3 × 32 × 32 | | | | |
Conv1 + ReLU | 7 | 96 | 2 | 2 |
96 × 15 × 15 | | | | |
Max pooling | 3 | | 2 | |
96 × 7 × 7 | | | | |
Norm | | | | |
Conv2 + ReLU | 5 | 256 | 1 | 2 |
256 × 7 × 7 | | | | |
Max pooling | 3 | | 2 | |
256 × 3 × 3 | | | | |
Norm | | | | |
Conv3 + ReLU | 3 | 384 | 1 | 1 |
384 × 3 × 3 | | | | |
Conv4 + ReLU | 3 | 384 | 1 | 1 |
384 × 3 × 3 | | | | |
Conv5 + ReLU (no max pooling afterwards, unlike the original) | 3 | 256 | 1 | 1 |
256 × 3 × 3 | | | | |
FC6 + ReLU | | | | |
1024 | | | | |
FC7 + ReLU | | | | |
512 | | | | |
FC8 | | | | |
10 | | | | |
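The spatial sizes in this table follow from the usual formula ⌊(n + 2·padding − kernel)/stride⌋ + 1. A quick sketch (my own illustration) that traces the 32 × 32 input through the modified network:

```python
# Trace how the 32x32 CIFAR-10 input shrinks through the modified AlexNet.
def out_size(n, k, s, p=0):
    return (n + 2 * p - k) // s + 1

n = 32
n = out_size(n, k=7, s=2, p=2)   # Conv1    -> 15
n = out_size(n, k=3, s=2)        # MaxPool  -> 7
n = out_size(n, k=5, s=1, p=2)   # Conv2    -> 7
n = out_size(n, k=3, s=2)        # MaxPool  -> 3
n = out_size(n, k=3, s=1, p=1)   # Conv3    -> 3
n = out_size(n, k=3, s=1, p=1)   # Conv4    -> 3
n = out_size(n, k=3, s=1, p=1)   # Conv5    -> 3
print(n)  # 3, so the flattened feature vector has 256 * 3 * 3 = 2304 elements
```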
Training setup:

```python
batch_size = 64
lr = 0.001
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(...)
Epochs = 10
```
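For completeness, a minimal sketch of how the CIFAR-10 data might be loaded with this batch size via torchvision; the transform and normalization constants are my assumptions, not taken from the original setup:

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Assumed preprocessing: convert to tensor and roughly center each channel.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True, num_workers=2)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=False, num_workers=2)
```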
```python
import torch
import torch.nn as nn


class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        # Convolutional part; input is a 3 x 32 x 32 CIFAR-10 image.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 96, 7, 2, 2),     # -> 96 x 15 x 15
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, 2, 0),         # -> 96 x 7 x 7
            nn.Conv2d(96, 256, 5, 1, 2),   # -> 256 x 7 x 7
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, 2, 0),         # -> 256 x 3 x 3
            nn.Conv2d(256, 384, 3, 1, 1),  # -> 384 x 3 x 3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, 3, 1, 1),  # -> 384 x 3 x 3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, 3, 1, 1),  # -> 256 x 3 x 3
            nn.ReLU(inplace=True)
        )
        # Classifier: 256 * 3 * 3 = 2304 features -> 10 CIFAR-10 classes.
        self.fc = nn.Sequential(
            nn.Linear(256 * 3 * 3, 1024),
            nn.Dropout(p=0.5),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.Dropout(p=0.5),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.cnn(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 2304)
        x = self.fc(x)
        return x
```
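To tie the pieces together, here is a minimal training-loop sketch using the hyperparameters listed above; `train_loader` refers to the assumed DataLoader from the earlier sketch, and the device handling is my addition rather than part of the original script:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AlexNet().to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(10):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)            # logits of shape (batch, 10)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}: loss = {running_loss / len(train_loader):.4f}")
```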
The dataset used is CIFAR-10 (32 × 32 images). Since the original AlexNet was trained on ImageNet (227 × 227 images), the model has to be modified for CIFAR-10 (the kernel sizes and the number of nodes in the fully connected layers).
BatchNorm is generally considered to speed up convergence rather than to significantly reduce overfitting.
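As an illustration only (this is not part of the model above), a hypothetical variant of the first convolution block with BatchNorm inserted, reusing the `nn` import from the model code:

```python
# Hypothetical variant, not used in this reproduction: BatchNorm2d after the convolution.
# The expected effect is faster, more stable convergence rather than less overfitting.
conv1_with_bn = nn.Sequential(
    nn.Conv2d(3, 96, 7, 2, 2),
    nn.BatchNorm2d(96),
    nn.ReLU(inplace=True),
)
```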
Dropout randomly discards a given fraction of neurons during training, which helps prevent overfitting.
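A tiny sketch of that behavior, reusing the `torch`/`nn` imports from the model code; note that Dropout is only active in training mode, which is why `model.eval()` matters at test time:

```python
# Dropout zeroes each activation with probability p during training and rescales the rest;
# in eval mode it is a no-op.
drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()
print(drop(x))  # roughly half the entries are zero, the survivors are scaled to 2.0
drop.eval()
print(drop(x))  # all ones
```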