★Bilibili: https://space.bilibili.com/18161609/channel/index
★CSDN: https://blog.csdn.net/qq_37541097
MobileNet was proposed by a Google team in 2017 as a lightweight CNN aimed at mobile and embedded devices. Compared with traditional convolutional networks, it greatly reduces the number of parameters and the amount of computation at the cost of a small drop in accuracy (compared with VGG16, accuracy drops by 0.9%, while the model has only about 1/32 of VGG's parameters).
Motivation: traditional convolutional neural networks require too much memory and computation to run on mobile or embedded devices.
Paper title: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Paper link: MobileNet (v1): MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Highlights of the network:
Standard convolution:
Here the channel count of each kernel equals the channel count of the input feature map, and the channel count of the output feature map equals the number of filters. In the figure above, the input feature map has 3 channels, so each kernel has 3 channels; there are 4 filters, so the output feature map has 4 channels.
DW convolution (Depthwise Convolution):
In DW convolution every kernel has a single channel: each kernel is convolved with exactly one channel of the input feature map and produces exactly one channel of the output feature map. Therefore all kernels have channel = 1, and input channels = number of filters = output channels.
PW convolution (Pointwise Convolution):
PW convolution is just an ordinary convolution whose kernel size is 1×1.
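As a small illustration (a sketch using PyTorch's nn.Conv2d with made-up channel counts, not part of the original post), the three kinds of convolution differ only in kernel_size and groups, and the weight shapes show where the parameter savings come from:

import torch
from torch import nn

x = torch.randn(1, 3, 224, 224)   # N, C, H, W

# standard convolution: each kernel has 3 channels (= input channels), 4 filters -> 4 output channels
std_conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, padding=1)
print(std_conv.weight.shape)      # torch.Size([4, 3, 3, 3])

# DW convolution: groups = in_channels, so each kernel has a single channel
dw_conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=1, groups=3)
print(dw_conv.weight.shape)       # torch.Size([3, 1, 3, 3])

# PW convolution: an ordinary convolution with a 1x1 kernel
pw_conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=1)
print(pw_conv.weight.shape)       # torch.Size([4, 3, 1, 1])

print(pw_conv(dw_conv(x)).shape)  # torch.Size([1, 4, 224, 224])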
Depthwise separable convolution (Depthwise Separable Conv):
It consists of two parts: a DW convolution followed by a PW convolution. In theory, a standard convolution costs about 8 to 9 times as much computation as DW + PW.
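Following the cost analysis in the MobileNet v1 paper (kernel size $D_K$, feature map size $D_F$, input channels $M$, output channels $N$), the ratio of the depthwise separable cost to the standard convolution cost is

$$\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^{2}}$$

With a 3×3 kernel ($D_K = 3$) and a reasonably large $N$, this is close to 1/9, which is where the "8 to 9 times" figure comes from.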
In the first row of the table above, Conv / s2 denotes a standard convolution with stride 2, and the filter shape 3×3×3×32 means kernel height = 3, width = 3, channels = 3 (RGB input) and 32 filters.
The second row, Conv dw / s1, denotes a DW convolution with stride 1. Since each DW kernel has a depth of 1, the filter shape 3×3×32 dw means kernel height = 3, width = 3 and 32 filters, each with a single channel.
Note: compared with GoogLeNet and VGG, MobileNet loses only a little accuracy, while its parameter count is roughly 1/32 of VGG's. The hyperparameter α (width multiplier) scales the number of filters used in every convolution; β (the resolution multiplier, written ρ in the paper) scales the input image size.
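For reference, with the width multiplier $\alpha$ applied to the channel counts and the resolution multiplier (written $\rho$ in the paper) applied to the feature-map size, the depthwise separable cost in the v1 paper becomes

$$D_K \cdot D_K \cdot \alpha M \cdot \rho D_F \cdot \rho D_F + \alpha M \cdot \alpha N \cdot \rho D_F \cdot \rho D_F$$

so computation shrinks roughly with $\alpha^{2}$ and $\rho^{2}$.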
MobileNet v2 was proposed by a Google team in 2018; compared with MobileNet v1 it is both more accurate and smaller.
Paper title: MobileNetV2: Inverted Residuals and Linear Bottlenecks
Paper link: MobileNet (v2): MobileNetV2: Inverted Residuals and Linear Bottlenecks
Highlights of the network:
(1) Comparison of the residual block and the inverted residual block:
The original residual block first reduces the dimension with a 1×1 convolution, then applies a 3×3 convolution, and finally raises the dimension again with another 1×1 convolution.
The inverted residual block does exactly the opposite: it first raises the dimension with a 1×1 convolution, then applies a 3×3 (depthwise) convolution, and finally reduces the dimension with a 1×1 convolution.
In addition, an ordinary residual block uses the ReLU activation, while the inverted residual block uses ReLU6:
y = ReLU6(x) = min(max(x, 0), 6)
(2)Linear Bottlenecks
The last 1×1 convolution of the inverted residual block uses a linear activation instead of ReLU. ReLU destroys a lot of the information carried by low-dimensional features, and this final 1×1 convolution projects down to a low-dimensional feature, so a linear activation replaces ReLU to avoid that information loss.
(3) The structure of the inverted residual block in the original paper:
The layers of the inverted residual block are:
As shown above, the first layer of the inverted residual block is an ordinary 1×1 convolution with ReLU6 (the 1×1 expansion); the number of filters it uses is tk (t is the expansion factor used to widen the depth).
The second layer is a DW convolution with kernel size 3×3 and stride s (a passed-in parameter), using ReLU6; the output depth equals the input depth, tk, while the height and width shrink by a factor of s.
The third layer is an ordinary 1×1 convolution. Note that it uses a linear activation; the number of filters is k' (chosen by hand).
Note: the shortcut connection exists only when stride = 1 and the input and output feature maps have the same shape (slightly different from what the figure shows).
MobileNet v3 is used as the backbone of many lightweight networks. It is the version Google proposed after MobileNet v2.
Paper title: Searching for MobileNetV3
Paper link: MobileNet (v3): Searching for MobileNetV3
Highlights of the network:
The inverted residual block of MobileNet v2 is shown below:
Again, note that the shortcut connection exists only when stride = 1 and the input and output feature maps have the same shape.
MobileNet v3 updates this block in two main ways: 1. an SE module (channel attention) is added; 2. the activation function is updated.
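As a rough sketch of what the SE module does (the complete SqueezeExcitation implementation appears in the model code further below; the class and variable names here are purely illustrative): every channel is average-pooled to a single value, passed through two small fully connected layers, and the resulting per-channel weights rescale the original feature map channel by channel.

import torch
from torch import nn
from torch.nn import functional as F

class SESketch(nn.Module):
    # minimal SE sketch: squeeze (global average pool) -> two FC layers -> per-channel rescaling
    def __init__(self, channels: int, squeeze_factor: int = 4):
        super().__init__()
        squeeze_c = channels // squeeze_factor
        self.fc1 = nn.Conv2d(channels, squeeze_c, 1)  # 1x1 conv acting as the first FC layer
        self.fc2 = nn.Conv2d(squeeze_c, channels, 1)  # 1x1 conv acting as the second FC layer

    def forward(self, x):
        scale = F.adaptive_avg_pool2d(x, (1, 1))      # [N, C, 1, 1]
        scale = F.relu(self.fc1(scale))
        scale = F.hardsigmoid(self.fc2(scale))        # attention weights in [0, 1]
        return x * scale                              # channel-wise re-weighting

print(SESketch(16)(torch.randn(1, 16, 28, 28)).shape)  # torch.Size([1, 16, 28, 28])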
Its structure is as follows:
The main changes are:
1. Reduce the number of filters in the first convolution layer (32 -> 16)
In MobileNet v1 and v2 the first convolution layer uses 32 filters (#filters, i.e. c). The authors found that 16 filters give essentially the same accuracy while saving about 2 ms of latency.
2. Streamline the Last Stage
MobileNet v2 mostly uses the ReLU6 activation: ReLU6(x) = min(max(x, 0), 6)
MobileNet v3 also introduces a new activation, swish(x) = x × σ(x), where σ(x) = 1 / (1 + e^(-x)). However, swish is expensive to compute and differentiate and is unfriendly to quantization, so the authors propose the h-swish activation instead.
First define the h-sigmoid activation: h-sigmoid(x) = ReLU6(x + 3) / 6
The h-swish function is then defined as:
h-swish(x) = x × h-sigmoid(x) = x · ReLU6(x + 3) / 6
The authors note that replacing sigmoid with h-sigmoid and swish with h-swish helps inference speed and is friendly to quantization.
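A minimal sketch of these piecewise-linear approximations (recent PyTorch versions also provide them directly as nn.Hardsigmoid / nn.Hardswish, which the model code below relies on):

import torch

def h_sigmoid(x):
    # hard-sigmoid: ReLU6(x + 3) / 6, a piecewise-linear approximation of sigmoid
    return torch.clamp(x + 3, min=0.0, max=6.0) / 6.0

def h_swish(x):
    # hard-swish: x * h_sigmoid(x), approximating swish(x) = x * sigmoid(x)
    return x * h_sigmoid(x)

x = torch.linspace(-6, 6, 7)
print(h_swish(x))  # should match torch.nn.functional.hardswish(x)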
(1)MobileNet(v3)-Large
input is the shape of the input feature map of the current layer; for example, the table uses RGB images whose height and width are both 224;
Operator is the corresponding operation: ① bneck denotes the updated block of v3; ② the 3×3 (or 5×5) that follows it is the kernel size of the DW convolution; ③ NBN in the last two rows means no BN layer is used;
exp size is the dimension to which the first 1×1 expansion convolution of the bneck raises the input feature map: whatever exp size is given, that is the expanded dimension;
#out is the channel count of the output feature map; as emphasized above, the first convolution layer uses 16 filters to reduce latency;
SE indicates whether the SE attention module is used;
NL is the non-linear activation, where HS means h-swish and RE means ReLU;
s is the stride of the DW convolution.
A few points to note:
(2)MobileNet(v3)-Small
MobileNetV3-Small is similar to MobileNetV3-Large; its detailed configuration is as follows:
First define a Conv+BN+ReLU combination layer. In MobileNet, every convolution layer, including the DW convolutions, basically consists of conv + BN + ReLU6. The only exception is the third layer of the inverted residual block: the 1×1 convolution used for dimension reduction uses a linear activation.
Next define the inverted residual block, class InvertedResidual(nn.Module):
Finally define the MobileNet v2 network. Its constructor takes num_classes (the number of classes), alpha (the hyperparameter introduced in v1 that scales the number of filters used in every convolution layer) and round_nearest (the base used by the _make_divisible function, which adjusts a value to the nearest integer multiple of that base).
input_channel = _make_divisible(32 * alpha, round_nearest)  # adjust the number of filters to an integer multiple of round_nearest
This adjusts 32 × alpha to the nearest integer multiple of 8 (round_nearest is 8 here).
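For example (a quick sketch, assuming the _make_divisible function defined in the model code below):

print(_make_divisible(32 * 1.0, 8))   # 32 (already a multiple of 8)
print(_make_divisible(32 * 0.9, 8))   # 32 (28.8 is rounded to 32, the nearest multiple of 8)
print(_make_divisible(32 * 0.5, 8))   # 16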
The full model code (model_v2.py) is as follows:
from torch import nn
import torch
def _make_divisible(ch, divisor=8, min_ch=None):  # ch is the input channel depth, divisor is the base
    # adjust ch to the nearest integer multiple of divisor (i.e. the nearest multiple of 8)
"""
This function is taken from the original tf repo.
It ensures that all layers have a channel number that is divisible by 8
It can be seen here:
https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
"""
if min_ch is None:
min_ch = divisor
new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
# Make sure that round down does not go down by more than 10%.
if new_ch < 0.9 * ch:
new_ch += divisor
return new_ch
# First define a Conv+BN+ReLU combination layer. In MobileNet, every convolution layer,
# including the DW convolutions, basically consists of conv + BN + ReLU6.
class ConvBNReLU(nn.Sequential):  # inherits from nn.Sequential rather than nn.Module, matching the official PyTorch example
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1):
        # groups=1 gives an ordinary convolution; groups=in_channel gives a DW convolution
        # (in PyTorch a DW convolution is also implemented with nn.Conv2d)
        padding = (kernel_size - 1) // 2  # padding is computed from kernel_size
        super(ConvBNReLU, self).__init__(  # pass the three layers to super().__init__()
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
            # bias is not used because a BN layer follows
            nn.BatchNorm2d(out_channel),  # BN over out_channel channels
            nn.ReLU6(inplace=True)
        )
class InvertedResidual(nn.Module):
    def __init__(self, in_channel, out_channel, stride, expand_ratio):
        # expand_ratio is the expansion factor t, used to widen the depth
        super(InvertedResidual, self).__init__()
        hidden_channel = in_channel * expand_ratio  # hidden_channel is the number of filters of the first 1x1 conv, i.e. tk
        self.use_shortcut = stride == 1 and in_channel == out_channel  # boolean: whether the shortcut branch is used
        layers = []
        if expand_ratio != 1:  # when the expansion factor is 1 (the first bottleneck row of the table), the first 1x1 conv is skipped
            # 1x1 pointwise conv
            layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1))  # layer 1: 1x1 conv
        layers.extend([  # extend appends a whole list of layers at once (same effect as repeated append)
            # 3x3 depthwise conv
            # layer 2: DW conv. Input and output channels are both hidden_channel;
            # groups=hidden_channel is what makes this a DW conv instead of an ordinary conv
            ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel),
            # 1x1 pointwise conv (linear)
            # this layer uses a linear activation, so ConvBNReLU cannot be used here; use Conv2d + BN instead
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel),
            # a linear activation is y = x, i.e. no transformation, so adding no activation layer is equivalent
        ])
        self.conv = nn.Sequential(*layers)
    def forward(self, x):
        if self.use_shortcut:  # check whether the shortcut condition is satisfied
            return x + self.conv(x)  # shortcut branch plus main branch
        else:
            return self.conv(x)  # main branch only
class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8):
        # num_classes: number of classes; alpha: hyperparameter that scales the number of filters of every conv layer;
        # round_nearest: the base used by the _make_divisible function below
        super(MobileNetV2, self).__init__()
        block = InvertedResidual  # the InvertedResidual class defined above
        input_channel = _make_divisible(32 * alpha, round_nearest)  # adjust the number of filters to a multiple of round_nearest
        # input_channel is the number of filters of the first Conv2d layer in the table, and also the input depth of the next layer
        last_channel = _make_divisible(1280 * alpha, round_nearest)
        # last_channel is the number of filters of the 1x1 conv layer third from the bottom of the table
        # a list whose entries are the parameters t, c, n, s of each bottleneck row of the table
inverted_residual_setting = [
# t, c, n, s
[1, 16, 1, 1],
[6, 24, 2, 2],
[6, 32, 3, 2],
[6, 64, 4, 2],
[6, 96, 3, 1],
[6, 160, 3, 2],
[6, 320, 1, 1],
]
        features = []
        # conv1 layer
        features.append(ConvBNReLU(3, input_channel, stride=2))  # first add the initial conv2d layer to features
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:  # iterate over the parameter list; each row provides t, c, n, s
            output_channel = _make_divisible(c * alpha, round_nearest)  # adjust the output channel count with _make_divisible
            for i in range(n):  # n is how many times the inverted residual block is repeated
                stride = s if i == 0 else 1  # s only applies to the first block of a group; the remaining blocks use stride 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                # append the inverted residual blocks; block is the InvertedResidual class defined above
                input_channel = output_channel  # update the input depth for the next block
        # all bottleneck rows of the table (rows 2-8) are built by the loop above
        # building last several layers
        # the 1x1 conv layer third from the bottom of the table: input depth input_channel, output depth last_channel, kernel size 1x1
        features.append(ConvBNReLU(input_channel, last_channel, 1))
        # combine feature layers
        self.features = nn.Sequential(*features)
        # rows 1-9 of the table form the feature extractor; nn.Sequential packs the layers into one module named features
        # next define the classifier: an average-pooling layer followed by a fully connected layer
        # building classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive average pooling; output feature map size is 1x1
        self.classifier = nn.Sequential(  # combine a Dropout layer and a fully connected layer, named classifier
            nn.Dropout(0.2),
            nn.Linear(last_channel, num_classes)  # num_classes is the number of predicted classes
        )
        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):  # conv layers: Kaiming initialization
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:  # set the bias to 0 if there is one
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):  # BN layers
                nn.init.ones_(m.weight)  # set the scale (gamma) to 1
                nn.init.zeros_(m.bias)  # set the shift (beta) to 0
            elif isinstance(m, nn.Linear):  # fully connected layers
                nn.init.normal_(m.weight, 0, 0.01)  # weights from a normal distribution with mean 0 and std 0.01
                nn.init.zeros_(m.bias)  # set the bias to 0
    def forward(self, x):
        x = self.features(x)  # feature extractor
        x = self.avgpool(x)  # average pooling
        x = torch.flatten(x, 1)  # flatten
        x = self.classifier(x)  # classifier
        return x
The training script is largely the same as those for VGG, GoogLeNet and ResNet; the main differences are instantiating the network, downloading and loading the pre-trained weights, filtering the weight dict, and freezing part of the parameters:
# create model
net = MobileNetV2(num_classes=5)  # 5 output classes
# load pretrain weights
# download url: https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
model_weight_path = "./mobilenet_v2.pth"
assert os.path.exists(model_weight_path), "file {} does not exist.".format(model_weight_path)
pre_weights = torch.load(model_weight_path, map_location=device)
# the loaded weights form a dict; pre-training was done on ImageNet, so the final FC layer has 1000 nodes,
# while our dataset has only 5 classes
# delete classifier weights
pre_dict = {k: v for k, v in pre_weights.items() if net.state_dict()[k].numel() == v.numel()}
# keep only the entries whose element count matches the new model, which drops the mismatched classifier weights
missing_keys, unexpected_keys = net.load_state_dict(pre_dict, strict=False)
# load the filtered dict with strict=False: every layer except the final classifier receives its pre-trained weights
# freeze features weights
# freeze all weights of the feature extractor so that they are not updated
for param in net.features.parameters():
    param.requires_grad = False
net.to(device)
The full code of train.py is as follows:
import os
import json
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, datasets
from tqdm import tqdm
from model_v2 import MobileNetV2
# ######## Largely the same as the VGG / GoogLeNet / ResNet training scripts; the main differences are in the model creation and pre-trained weight loading below
def main():
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("using {} device.".format(device))
batch_size = 16
epochs = 5
data_transform = {
"train": transforms.Compose([transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
"val": transforms.Compose([transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}
data_root = os.path.abspath(os.path.join(os.getcwd(), "../..")) # get data root path
image_path = os.path.join(data_root, "data_set", "flower_data") # flower data set path
assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
transform=data_transform["train"])
train_num = len(train_dataset)
# {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
flower_list = train_dataset.class_to_idx
cla_dict = dict((val, key) for key, val in flower_list.items())
# write dict into json file
json_str = json.dumps(cla_dict, indent=4)
with open('class_indices.json', 'w') as json_file:
json_file.write(json_str)
nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers
print('Using {} dataloader workers every process'.format(nw))
train_loader = torch.utils.data.DataLoader(train_dataset,
batch_size=batch_size, shuffle=True,
num_workers=nw)
validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
transform=data_transform["val"])
val_num = len(validate_dataset)
validate_loader = torch.utils.data.DataLoader(validate_dataset,
batch_size=batch_size, shuffle=False,
num_workers=nw)
print("using {} images for training, {} images for validation.".format(train_num,
val_num))
# create model
    net = MobileNetV2(num_classes=5)  # 5 output classes
    # load pretrain weights
    # download url: https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
    model_weight_path = "./mobilenet_v2.pth"
    assert os.path.exists(model_weight_path), "file {} does not exist.".format(model_weight_path)
    pre_weights = torch.load(model_weight_path, map_location=device)
    # the loaded weights form a dict; pre-training was done on ImageNet, so the final FC layer has 1000 nodes,
    # while our dataset has only 5 classes
    # delete classifier weights
    pre_dict = {k: v for k, v in pre_weights.items() if net.state_dict()[k].numel() == v.numel()}
    # keep only the entries whose element count matches the new model, which drops the mismatched classifier weights
    missing_keys, unexpected_keys = net.load_state_dict(pre_dict, strict=False)
    # load the filtered dict with strict=False: every layer except the final classifier receives its pre-trained weights
    # freeze features weights
    # freeze all weights of the feature extractor so that they are not updated
    for param in net.features.parameters():
        param.requires_grad = False
    net.to(device)
# define loss function
loss_function = nn.CrossEntropyLoss()
# construct an optimizer
params = [p for p in net.parameters() if p.requires_grad]
optimizer = optim.Adam(params, lr=0.0001)
best_acc = 0.0
save_path = './MobileNetV2.pth'
train_steps = len(train_loader)
for epoch in range(epochs):
# train
net.train()
running_loss = 0.0
train_bar = tqdm(train_loader)
for step, data in enumerate(train_bar):
images, labels = data
optimizer.zero_grad()
logits = net(images.to(device))
loss = loss_function(logits, labels.to(device))
loss.backward()
optimizer.step()
# print statistics
running_loss += loss.item()
train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
epochs,
loss)
# validate
net.eval()
acc = 0.0 # accumulate accurate number / epoch
with torch.no_grad():
val_bar = tqdm(validate_loader)
for val_data in val_bar:
val_images, val_labels = val_data
outputs = net(val_images.to(device))
# loss = loss_function(outputs, test_labels)
predict_y = torch.max(outputs, dim=1)[1]
acc += torch.eq(predict_y, val_labels.to(device)).sum().item()
val_bar.desc = "valid epoch[{}/{}]".format(epoch + 1,
epochs)
val_accurate = acc / val_num
print('[epoch %d] train_loss: %.3f val_accuracy: %.3f' %
(epoch + 1, running_loss / train_steps, val_accurate))
if val_accurate > best_acc:
best_acc = val_accurate
torch.save(net.state_dict(), save_path)
print('Finished Training')
if __name__ == '__main__':
main()
Prediction uses the same preprocessing as during training; everything else is similar to the previous networks.
The full code of predict.py is as follows:
import os
import json
import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
from model_v2 import MobileNetV2
def main():
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
data_transform = transforms.Compose(
[transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
# load image
img_path = "../tulip.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
img = Image.open(img_path)
plt.imshow(img)
# [N, C, H, W]
img = data_transform(img)
# expand batch dimension
    img = torch.unsqueeze(img, dim=0)  # add a batch dimension
# read class_indict
json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)
json_file = open(json_path, "r")
class_indict = json.load(json_file)
    # create model
    model = MobileNetV2(num_classes=5).to(device)  # instantiate the model
    # load model weights
    model_weight_path = "./MobileNetV2.pth"
    model.load_state_dict(torch.load(model_weight_path, map_location=device))  # load the trained weights
    model.eval()  # switch to eval mode
    with torch.no_grad():  # disable gradient tracking during inference
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()  # squeeze removes the batch dimension
        predict = torch.softmax(output, dim=0)  # softmax turns the output into a probability distribution
        predict_cla = torch.argmax(predict).numpy()  # index of the class with the highest probability
print_res = "class: {} prob: {:.3}".format(class_indict[str(predict_cla)],
predict[predict_cla].numpy())
plt.title(print_res)
print(print_res)
plt.show()
if __name__ == '__main__':
main()
The full MobileNet v3 model code (model_v3.py) is as follows:
from typing import Callable, List, Optional
import torch
from torch import nn, Tensor
from torch.nn import functional as F
from functools import partial
def _make_divisible(ch, divisor=8, min_ch=None):
"""
This function is taken from the original tf repo.
It ensures that all layers have a channel number that is divisible by 8
It can be seen here:
https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
"""
if min_ch is None:
min_ch = divisor
new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
# Make sure that round down does not go down by more than 10%.
if new_ch < 0.9 * ch:
new_ch += divisor
return new_ch
class ConvBNActivation(nn.Sequential):
def __init__(self,
in_planes: int,
out_planes: int,
kernel_size: int = 3,
stride: int = 1,
groups: int = 1,
norm_layer: Optional[Callable[..., nn.Module]] = None,
activation_layer: Optional[Callable[..., nn.Module]] = None):
padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d  # default to BN if no norm_layer is passed in
        if activation_layer is None:
            activation_layer = nn.ReLU6  # default to ReLU6 if no activation_layer is passed in
super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
out_channels=out_planes,
kernel_size=kernel_size,
stride=stride,
padding=padding,
groups=groups,
bias=False),
norm_layer(out_planes),
activation_layer(inplace=True))
# SE module, i.e. the channel attention module
class SqueezeExcitation(nn.Module):
    def __init__(self, input_c: int, squeeze_factor: int = 4):
        # the first FC layer has input_c / 4 output nodes, hence squeeze_factor = 4
        super(SqueezeExcitation, self).__init__()
        squeeze_c = _make_divisible(input_c // squeeze_factor, 8)  # number of nodes of the first FC layer
        self.fc1 = nn.Conv2d(input_c, squeeze_c, 1)  # a 1x1 Conv2d is used directly as the fully connected layer FC1
        self.fc2 = nn.Conv2d(squeeze_c, input_c, 1)  # FC2 outputs as many channels as the input, hence input_c
    def forward(self, x: Tensor) -> Tensor:
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))  # adaptive average pooling squeezes every channel down to 1x1
        scale = self.fc1(scale)
        scale = F.relu(scale, inplace=True)
        scale = self.fc2(scale)
        scale = F.hardsigmoid(scale, inplace=True)
        return scale * x  # multiply the per-channel weights back onto the original feature map
# InvertedResidualConfig stores the parameter configuration of one bneck block of MobileNetV3
class InvertedResidualConfig:
    def __init__(self,
                 input_c: int,  # channels of the input feature map
                 kernel: int,  # kernel size of the DW convolution
                 expanded_c: int,  # exp size: number of filters of the first 1x1 conv layer
                 out_c: int,  # number of filters of the last 1x1 conv layer
                 use_se: bool,  # whether the SE module is used
                 activation: str,  # activation: RE for ReLU, HS for h-swish
                 stride: int,  # stride of the DW convolution
                 width_multi: float  # the alpha parameter of v2, scaling the channels used by every conv layer
                 ):
        self.input_c = self.adjust_channels(input_c, width_multi)  # input channels after applying the width multiplier
        self.kernel = kernel
        self.expanded_c = self.adjust_channels(expanded_c, width_multi)  # expanded channels after applying the width multiplier
        self.out_c = self.adjust_channels(out_c, width_multi)
        self.use_se = use_se
        self.use_hs = activation == "HS"  # whether the h-swish activation is used
        # use_hs is True for "HS" (h-swish) and False for "RE" (ReLU)
        self.stride = stride
    @staticmethod
    def adjust_channels(channels: int, width_multi: float):
        return _make_divisible(channels * width_multi, 8)
class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf: InvertedResidualConfig,
                 norm_layer: Callable[..., nn.Module]):
        # the constructor receives a cnf object, i.e. the InvertedResidualConfig defined above
        super(InvertedResidual, self).__init__()
        if cnf.stride not in [1, 2]:  # the stride in the parameter table is always 1 or 2; anything else is illegal
            raise ValueError("illegal stride value.")
        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)  # whether the shortcut branch exists
        layers: List[nn.Module] = []  # create an empty layer list
        activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU  # choose the activation function
        # expand: the first 1x1 conv layer
        if cnf.expanded_c != cnf.input_c:  # when exp size == input c (the first bneck row), the 1x1 expansion conv is skipped
            layers.append(ConvBNActivation(cnf.input_c,
                                           cnf.expanded_c,
                                           kernel_size=1,
                                           norm_layer=norm_layer,
                                           activation_layer=activation_layer))
        # depthwise: the DW convolution
        layers.append(ConvBNActivation(cnf.expanded_c,  # input channels equal the previous layer's output channels
                                       cnf.expanded_c,  # for a DW conv, input c = output c
                                       kernel_size=cnf.kernel,
                                       stride=cnf.stride,
                                       groups=cnf.expanded_c,  # a DW conv handles each channel separately, so groups = channels
                                       norm_layer=norm_layer,
                                       activation_layer=activation_layer))
        if cnf.use_se:  # whether this block uses the SE attention module
            layers.append(SqueezeExcitation(cnf.expanded_c))  # the SE module only needs the input channel count, i.e. expanded_c (the DW output)
        # project: the last 1x1 conv layer
        layers.append(ConvBNActivation(cnf.expanded_c,  # with or without SE, the input of the last conv equals the DW output channels
                                       cnf.out_c,  # the output channels equal the #out value of the config
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Identity))  # linear activation, i.e. no transformation
        self.block = nn.Sequential(*layers)
        self.out_channels = cnf.out_c
        self.is_strided = cnf.stride > 1
    def forward(self, x: Tensor) -> Tensor:
        result = self.block(x)
        if self.use_res_connect:
            result += x  # if the shortcut is used, add the input to the main-branch output
        return result
class MobileNetV3(nn.Module):
    def __init__(self,
                 inverted_residual_setting: List[InvertedResidualConfig],  # the list of bneck configurations
                 last_channel: int,  # number of output nodes of the second-to-last conv layer in the table (it acts as an FC layer)
                 num_classes: int = 1000,
                 block: Optional[Callable[..., nn.Module]] = None,  # block is the updated inverted residual defined above; defaults to None
                 norm_layer: Optional[Callable[..., nn.Module]] = None):
        super(MobileNetV3, self).__init__()
        if not inverted_residual_setting:  # raise an error if no bneck configuration was passed in
            raise ValueError("The inverted_residual_setting should not be empty.")
        # sanity check: the argument must be a list whose elements are InvertedResidualConfig objects
        elif not (isinstance(inverted_residual_setting, List) and
                  all([isinstance(s, InvertedResidualConfig) for s in inverted_residual_setting])):
            raise TypeError("The inverted_residual_setting should be List[InvertedResidualConfig]")
        if block is None:  # block defaults to InvertedResidual
            block = InvertedResidual
        if norm_layer is None:  # norm_layer defaults to BN
            norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)
            # partial fixes the default arguments eps=0.001 and momentum=0.01 for BatchNorm2d
        layers: List[nn.Module] = []  # create an empty layer list
        # building first layer
        firstconv_output_c = inverted_residual_setting[0].input_c  # output channels of the first conv layer, equal to the input channels of the first bneck
        # the first conv layer, corresponding to the first row of the parameter table
        layers.append(ConvBNActivation(3,  # RGB images, so 3 input channels
                                       firstconv_output_c,  # output channels equal the input_c of the first bneck
                                       kernel_size=3,
                                       stride=2,
                                       norm_layer=norm_layer,  # the BN layer defined above
                                       activation_layer=nn.Hardswish))  # the first layer uses h-swish
        # building inverted residual blocks
        for cnf in inverted_residual_setting:  # for each bneck configuration, pass the config and norm_layer to block and append it
            layers.append(block(cnf, norm_layer))
        # building last several layers: conv, pooling and fully connected layers
        lastconv_input_c = inverted_residual_setting[-1].out_c  # out_c of the last bneck, which is the input_c of the next conv layer
        lastconv_output_c = 6 * lastconv_input_c  # per the fourth-to-last row of the table, this conv layer has out_c = 6 * input_c
        # the conv layer in the fourth-to-last row of the table, appended to layers
        layers.append(ConvBNActivation(lastconv_input_c,
                                       lastconv_output_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        self.features = nn.Sequential(*layers)  # everything above (row 1 down to the third-to-last row) forms the feature extractor
        # next define the classifier: average pooling plus the last two fully connected layers
        self.avgpool = nn.AdaptiveAvgPool2d(1)  # adaptive average pooling
        self.classifier = nn.Sequential(nn.Linear(lastconv_output_c, last_channel),
                                        # input size computed above; output size last_channel comes from the constructor argument
                                        nn.Hardswish(inplace=True),  # h-swish activation
                                        nn.Dropout(p=0.2, inplace=True),
                                        nn.Linear(last_channel, num_classes))  # input: nodes of the previous FC layer; output: number of classes
# initial weights
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode="fan_out")
if m.bias is not None:
nn.init.zeros_(m.bias)
elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
nn.init.ones_(m.weight)
nn.init.zeros_(m.bias)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, 0, 0.01)
nn.init.zeros_(m.bias)
def _forward_impl(self, x: Tensor) -> Tensor:
x = self.features(x)
        x = self.avgpool(x)  # after average pooling the height and width are both 1
        x = torch.flatten(x, 1)  # the height/width dimensions are no longer needed, so flatten to 1D
x = self.classifier(x)
return x
def forward(self, x: Tensor) -> Tensor:
return self._forward_impl(x)
def mobilenet_v3_large(num_classes: int = 1000,
reduced_tail: bool = False) -> MobileNetV3:
"""
Constructs a large MobileNetV3 architecture from
"Searching for MobileNetV3" .
weights_link:
https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth
Args:
num_classes (int): number of classes
reduced_tail (bool): If True, reduces the channel counts of all feature layers
between C4 and C5 by 2. It is used to reduce the channel redundancy in the
backbone for Detection and Segmentation.
"""
    width_multi = 1.0  # the hyperparameter alpha
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)  # fix alpha=1.0 as a default argument of InvertedResidualConfig
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)
    # fix the same hyperparameter for the adjust_channels method
    reduce_divider = 2 if reduced_tail else 1
    # a switch from the official PyTorch code that halves the channels of the last three bnecks; off by default,
    # it can be set to True to reduce the network parameters further
inverted_residual_setting = [
# input_c, kernel, expanded_c, out_c, use_se, activation, stride
bneck_conf(16, 3, 16, 16, False, "RE", 1),
bneck_conf(16, 3, 64, 24, False, "RE", 2), # C1
bneck_conf(24, 3, 72, 24, False, "RE", 1),
bneck_conf(24, 5, 72, 40, True, "RE", 2), # C2
bneck_conf(40, 5, 120, 40, True, "RE", 1),
bneck_conf(40, 5, 120, 40, True, "RE", 1),
bneck_conf(40, 3, 240, 80, False, "HS", 2), # C3
bneck_conf(80, 3, 200, 80, False, "HS", 1),
bneck_conf(80, 3, 184, 80, False, "HS", 1),
bneck_conf(80, 3, 184, 80, False, "HS", 1),
bneck_conf(80, 3, 480, 112, True, "HS", 1),
bneck_conf(112, 3, 672, 112, True, "HS", 1),
        bneck_conf(112, 5, 672, 160 // reduce_divider, True, "HS", 2),  # C4; if reduced_tail is True the tail channels are halved here
bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
]
    last_channel = adjust_channels(1280 // reduce_divider)  # C5; 1280 by default, the number of nodes of the penultimate FC layer
return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
last_channel=last_channel,
num_classes=num_classes)
def mobilenet_v3_small(num_classes: int = 1000,
reduced_tail: bool = False) -> MobileNetV3:
"""
    Constructs a small MobileNetV3 architecture from
"Searching for MobileNetV3" .
Args:
num_classes (int): number of classes
reduced_tail (bool): If True, reduces the channel counts of all feature layers
between C4 and C5 by 2. It is used to reduce the channel redundancy in the
backbone for Detection and Segmentation.
"""
width_multi = 1.0
bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)
reduce_divider = 2 if reduced_tail else 1
inverted_residual_setting = [
# input_c, kernel, expanded_c, out_c, use_se, activation, stride
bneck_conf(16, 3, 16, 16, True, "RE", 2), # C1
bneck_conf(16, 3, 72, 24, False, "RE", 2), # C2
bneck_conf(24, 3, 88, 24, False, "RE", 1),
bneck_conf(24, 5, 96, 40, True, "HS", 2), # C3
bneck_conf(40, 5, 240, 40, True, "HS", 1),
bneck_conf(40, 5, 240, 40, True, "HS", 1),
bneck_conf(40, 5, 120, 48, True, "HS", 1),
bneck_conf(48, 5, 144, 48, True, "HS", 1),
bneck_conf(48, 5, 288, 96 // reduce_divider, True, "HS", 2), # C4
bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1),
bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1)
]
last_channel = adjust_channels(1024 // reduce_divider) # C5
return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
last_channel=last_channel,
num_classes=num_classes)
The training code is essentially the same as train.py for MobileNet v2 in section 2.1.2; the difference is that the import is changed to:
from model_v3 import mobilenet_v3_large
The network instantiation and the save path are changed accordingly, and the mobilenet_v3_large pre-trained weights are downloaded.
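Concretely, the changed lines look roughly like the following (the weight file names are placeholders; the download URL is the one given in the model code's docstring):

from model_v3 import mobilenet_v3_large

# create model
net = mobilenet_v3_large(num_classes=5)

# load pretrain weights
# download url: https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth
model_weight_path = "./mobilenet_v3_large.pth"  # placeholder name for the downloaded pre-trained weights

# ... the rest of train.py stays the same as for v2 ...
save_path = './MobileNetV3_large.pth'  # placeholder save path for the fine-tuned weights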
As in 2.2.2 train.py, after changing the import, model instantiation, weight loading and save path in 2.1.3 predict.py, prediction can be run in the same way.
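For example (the weight path is a placeholder and must match the save_path used during training):

from model_v3 import mobilenet_v3_large

# create model
model = mobilenet_v3_large(num_classes=5).to(device)
# load model weights
model_weight_path = "./MobileNetV3_large.pth"
model.load_state_dict(torch.load(model_weight_path, map_location=device))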