AlexNet是一个卷积神经网络(CNN)的架构,由Alex Krizhevsky和Ilya Sutskever在Geoffrey Hinton的指导下设计。
AlexNet在2012年的ImageNet大规模视觉识别挑战赛中获得了第一名,它被认为是计算机视觉领域最有影响力的论文之一,推动了更多的论文使用CNN和GPU来加速深度学习。
AlexNet共有8层结构,前5层为卷积层用作特征提取,后3层为全连接层用作分类器。为了适应两个GPU的并行计算,AlexNet使用了分组卷积。
AlexNet输入的数据源是ImageNet数据库,它包含了超过一百万张的彩色图像,分为1000个不同的类别。
AlexNet输入的图像大小是227 X 227
,如果原始图像不是这个大小,需要先进行缩放或裁剪;输入的图像是RGB格式,每个通道有3个颜色分量。
本文使用的是CIFAR-100数据集,它包含60000张32x32的彩色图像,涵盖100个类别。其中50000张图像用于训练,10000张图像用于测试。每个类别包含600张图像;输入图像采用224 X 224
。
transform = {
# 进行数据增强的处理
"train": transforms.Compose([transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
"val": transforms.Compose([transforms.Resize((224, 224)),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
}
nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers
print('Using {} dataloader workers every process'.format(nw))
train_dataset = torchvision.datasets.CIFAR100(root='path/to/train/data', train=True, download=True,
transform=transform["train"])
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=nw)
validate_dataset = torchvision.datasets.CIFAR100(root='path/to/test/data', train=False, download=False,
transform=transform["val"])
validate_loader = torch.utils.data.DataLoader(validate_dataset, batch_size=batch_size, shuffle=False, num_workers=nw)
这里的数据集预处理,主要有:
transforms.RandomResizedCrop(224)
:随机裁剪并缩放;transforms.RandomHorizontalFlip()
:随机翻转;transforms.ToTensor()
:将PIL Image或者numpy.ndarray数据类型转化为torch.FloatTensor类型,且数值归一化到[0.0, 1.0]
;transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])
:对数据进行标准化处理,使得数据的均值为mean,标准差为std。nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2)
该层主要作用是提取输入图像的特征。
11 X 11
[batch_size, 3, 224, 224]
[batch_size, 48, 54, 54]
输出形状里的54计算过程:
( 224 − 11 + 2 ∗ 2 ) / 4 + 1 = 54 (224-11+2*2) / 4 + 1 = 54 (224−11+2∗2)/4+1=54
nn.MaxPool2d(kernel_size=3, stride=2)
该层主要作用是对特征进行下采样,减小特征图的大小,同时保留特征的主要信息。
3 X 3
[batch_size, 48, 54, 54]
[batch_size, 48, 27, 27]
nn.Conv2d(48, 128, kernel_size=5, padding=2)
该层主要作用是进一步提取特征。
5 X 5
[batch_size, 48, 27, 27]
[batch_size, 128, 27, 27]
nn.MaxPool2d(kernel_size=3, stride=2)
该层主要作用是对特征进行下采样,减小特征图的大小,同时保留特征的主要信息。
3 X 3
[batch_size, 128, 27, 27]
[batch_size, 128, 13, 13]
nn.Conv2d(128, 192, kernel_size=3, padding=1)
该层主要作用是进一步提取特征。
3 X 3
[batch_size, 128, 13, 13]
[batch_size, 192, 13, 13]
nn.Conv2d(192, 192, kernel_size=3, padding=1)
该层主要作用是进一步提取特征。
3 X 3
[batch_size, 192, 13, 13]
[batch_size, 192, 13, 13]
nn.Conv2d(192, 128, kernel_size=3, padding=1)
该层主要作用是进一步提取特征。
3 X 3
[batch_size, 192, 13, 13]
[batch_size, 128, 13, 13]
nn.MaxPool2d(kernel_size=3, stride=2)
该层主要作用是对特征进行下采样,减小特征图的大小,同时保留特征的主要信息。
3 X 3
[batch_size, 128, 13, 13]
[batch_size, 128, 6, 6]
self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
将输入的特征图进行降维,将每个通道的特征图转化为一个标量,从而得到一个固定长度的特征向量。
(batch_size, 256, 4, 4)
(batch_size, 256, 6, 6)
全局平均池化层的作用是将特征图的空间信息进行压缩,从而减少模型的参数数量,防止过拟合。
x = torch.flatten(x, 1)
在前身传播的过程中,后面会使用torch.flatten对平均池化的张量处理,将输入的张量x展平成一个一维的张量,方便后面的全连接层的输入。其中,第一个参数1表示从第二个维度开始展平,即将每个样本的所有通道的像素值展平成一个一维的向量。
展平后输出形状为: (batch_size, 256 * 6 * 6)
在AlexNet模型中,分类器由三个全连接层组成。
在每个全连接层之间,都加入了一个Dropout层和一个ReLU激活函数。其中:
分类器代码如下:
self.classifier = nn.Sequential(
nn.Dropout(p=0.5),
# 输入形状:[batch_size, 128*6*6];输出形状:[batch_size, 2048]
nn.Linear(128 * 6 * 6, 2048),
nn.ReLU(inplace=True),
nn.Dropout(p=0.5),
# 输入形状:[batch_size, 2048];输出形状:[batch_size, 2048]
nn.Linear(2048, 2048),
nn.ReLU(inplace=True),
# 输入形状:[batch_size, 2048];输出形状:[batch_size, num_classes]
nn.Linear(2048, num_classes),
)
nn.Linear(128 * 6 * 6, 2048)
该层主要作用是将卷积层提取的特征进行分类。
[batch_size, 128 X 6 X 6]
[batch_size, 2048]
。nn.Linear(2048, 2048)
该层主要作用是进一步提取特征,使用了一个 2048 维的向量进行特征提取。
[batch_size, 2048]
[batch_size, 2048]
nn.Linear(2048, num_classes)
[batch_size, 2048]
[batch_size, 100]
import torchvision
import os
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms
from tqdm import tqdm
class AlexNet(nn.Module):
def __init__(self, num_classes=1000):
super(AlexNet, self).__init__()
self.features = nn.Sequential(
nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(48, 128, kernel_size=5, padding=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(128, 192, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(192, 192, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(192, 128, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
# output[128, 6, 6]
nn.MaxPool2d(kernel_size=3, stride=2),
)
self.classifier = nn.Sequential(
nn.Dropout(p=0.5),
nn.Linear(128 * 6 * 6, 2048),
nn.ReLU(inplace=True),
nn.Dropout(p=0.5),
nn.Linear(2048, 2048),
nn.ReLU(inplace=True),
nn.Linear(2048, num_classes),
)
def forward(self, x):
x = self.features(x)
for i, layer in enumerate(self.features):
x = layer(x)
print('Shape after layer {}: {}'.format(i + 1, x.shape))
x = torch.flatten(x, start_dim=1)
x = self.classifier(x)
return x
# 超参数
num_epochs = 200
batch_size = 200
learning_rate = 0.002
save_model = './path/model'
transform = {
# 进行数据增强的处理
"train": transforms.Compose([transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
"val": transforms.Compose([transforms.Resize((224, 224)),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
}
nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers
print('Using {} dataloader workers every process'.format(nw))
train_dataset = torchvision.datasets.CIFAR100(root='path/to/train/data', train=True, download=True,
transform=transform["train"])
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=nw)
validate_dataset = torchvision.datasets.CIFAR100(root='path/to/test/data', train=False, download=True,
transform=transform["val"])
validate_loader = torch.utils.data.DataLoader(validate_dataset, batch_size=batch_size, shuffle=False, num_workers=nw)
train_num = len(train_dataset)
validate_num = len(validate_dataset)
print("using {} images for training, {} images for validation.".format(train_num, validate_num))
print("using {} images for training, {} images for validation.".format(train_num, validate_num))
# 定义模型、损失函数和优化器
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("using {} device.".format(device))
model = AlexNet(num_classes=100).to(device)
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
best_acc = 0.0
# switch to train mode
model.train()
# 记录训练产生的数据
train_acc_list = []
train_loss_list = []
val_acc_list = []
for epoch in range(num_epochs):
# train
model.train()
running_loss_train = 0.0
train_accurate = 0.0
train_bar = tqdm(train_loader)
for images, labels in train_bar:
optimizer.zero_grad()
outputs = model(images.to(device))
loss = loss_function(outputs, labels.to(device))
loss.backward()
optimizer.step()
predict = torch.max(outputs, dim=1)[1]
train_accurate += torch.eq(predict, labels.to(device)).sum().item()
running_loss_train += loss.item()
train_accurate = train_accurate / train_num
running_loss_train = running_loss_train / train_num
train_acc_list.append(train_accurate)
train_loss_list.append(running_loss_train)
print('[epoch %d] train_loss: %.7f train_accuracy: %.3f' %
(epoch + 1, running_loss_train, train_accurate))
# validate
model.eval()
acc = 0.0 # accumulate accurate number / epoch
with torch.no_grad():
validate_loader = tqdm(validate_loader)
for val_data in validate_loader:
val_images, val_labels = val_data
outputs = model(val_images.to(device))
predict_y = torch.max(outputs, dim=1)[1]
acc += torch.eq(predict_y, val_labels.to(device)).sum().item()
val_accurate = acc / validate_num
val_acc_list.append(val_accurate)
print('[epoch %d] val_accuracy: %.3f' % (epoch + 1, val_accurate))
# 选择最好的模型进行保存
if val_accurate > best_acc:
best_acc = val_accurate
torch.save(model.state_dict(), save_model)
# 测试模型
model.eval()
with torch.no_grad():
correct = 0
total = 0
for images, labels in validate_loader:
images = images.to(device)
labels = labels.to(device)
outputs = model(images)
_, predicted = torch.max(outputs.data, 1)
total += labels.size(0)
correct += torch.eq(predicted, labels).sum().item()
print('Test Accuracy of the model on the {} test images: {} %'.format(total, 100 * correct / total))