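Project: https://github.com/Fafa-DL/Awesome-Backbones
Description: an out-of-the-box image classification project covering mainstream models, for learning, comparing, and modifying backbone networks
Video tutorial: https://www.bilibili.com/video/BV1SY411P7Nd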
Using models/mobilenet/mobilenet_v3_small.py as an example, this walkthrough shows how a complete model is assembled.
The part of mobilenet_v3_small.py that defines the model structure is:
model_cfg = dict(
    backbone=dict(type='MobileNetV3', arch='small'),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='StackedLinearClsHead',
        num_classes=5,
        in_channels=576,
        mid_channels=[1024],
        dropout_rate=0.2,
        act_cfg=dict(type='HSwish'),
        loss=dict(
            type='CrossEntropyLoss', loss_weight=1.0),
        init_cfg=dict(
            type='Normal', layer='Linear', mean=0., std=0.01, bias=0.),
        topk=(1, 5)))
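A complete model consists of a Backbone, a Neck, and a Head; the Loss computation is built into the Head.
Now take tools/train.py as an example to see how these components are invoked. Note this line: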
model = init_model(model_cfg, data_cfg, device=device, mode='train')
The init_model function lives in utils/inference.py:
import numpy as np
import torch
from models.build import BuildNet  # BuildNet is defined in models/build.py

def init_model(model_cfg, data_cfg, device='cuda:0', mode='eval'):
    """Initialize a classifier from config file.

    Returns:
        nn.Module: The constructed classifier.
    """
    if device == '':
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = BuildNet(model_cfg)
    if mode == 'train':
        print('Initialize the weights.')
        model.init_weights()
        # Optionally load pretrained weights, restricted to the backbone.
        if data_cfg.get('train').get('pretrained_flag') and data_cfg.get('train').get('pretrained_weights'):
            print('Loading {}'.format(data_cfg.get('train').get('pretrained_weights').split('/')[-1]))
            model_dict = model.state_dict()
            pretrained_dict = torch.load(data_cfg.get('train').get('pretrained_weights'), map_location=device)
            if 'state_dict' in pretrained_dict:
                pretrained_dict = pretrained_dict['state_dict']
            # Keep only backbone entries whose name and shape match the model.
            pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v) and 'backbone' in k}
            model_dict.update(pretrained_dict)
            print(model.load_state_dict(pretrained_dict, strict=False))
    elif mode == 'eval':
        print('Loading {}'.format(data_cfg.get('test').get('ckpt').split('/')[-1]))
        model_dict = model.state_dict()
        pretrained_dict = torch.load(data_cfg.get('test').get('ckpt'), map_location=device)
        if 'state_dict' in pretrained_dict:
            pretrained_dict = pretrained_dict['state_dict']
        # Keep every entry whose name and shape match the model.
        pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v)}
        model_dict.update(pretrained_dict)
        print(model.load_state_dict(pretrained_dict, strict=False))
    model.to(device)
    if mode == 'eval':
        model.eval()
    return model
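Both branches of init_model filter the checkpoint before loading it. The following is a minimal sketch of that filtering step, using shape tuples in place of real tensors so it runs without PyTorch. The function name filter_pretrained and the parameter names in the sample dictionaries are hypothetical and are not taken from the project.

```python
def filter_pretrained(pretrained_shapes, model_shapes, backbone_only):
    """Keep entries whose key exists in the model with an identical shape.

    Both dictionaries map parameter names to shape tuples (stand-ins for tensors).
    """
    kept = {}
    for name, shape in pretrained_shapes.items():
        if name not in model_shapes:
            continue  # the model has no parameter with this name
        if model_shapes[name] != shape:
            continue  # e.g. a head trained for a different number of classes
        if backbone_only and 'backbone' not in name:
            continue  # train mode keeps only backbone weights
        kept[name] = shape
    return kept


# Hypothetical parameter names and shapes, for illustration only.
model_shapes = {
    'backbone.conv.weight': (16, 3, 3, 3),
    'head.fc.weight': (5, 1024),
}
pretrained_shapes = {
    'backbone.conv.weight': (16, 3, 3, 3),
    'head.fc.weight': (1000, 1024),   # shape differs from the model's head
    'extra.weight': (1,),             # not present in the model at all
}

# train mode: backbone entries only
train_kept = filter_pretrained(pretrained_shapes, model_shapes, backbone_only=True)
print(sorted(train_kept))  # ['backbone.conv.weight']

# eval mode: a checkpoint saved from this very model matches everywhere
eval_kept = filter_pretrained(model_shapes, model_shapes, backbone_only=False)
print(sorted(eval_kept))   # ['backbone.conv.weight', 'head.fc.weight']
```

Since load_state_dict is called with strict=False, parameters that were filtered out simply keep their initialized values instead of causing an error.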
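mode takes one of two values, train or eval, and the main difference between them is how weights are loaded: train mode loads only pretrained weights whose keys contain 'backbone', while eval mode loads every matching weight from the checkpoint. The network is already fully defined before any weights are loaded, so the line to focus on is model = BuildNet(model_cfg).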
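BuildNet is defined in models/build.py. Before going through it, one important piece of background: eval.
What eval does: it treats a string as a Python expression, evaluates it, and returns the result.
For example: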
x = 3
test_1 = '3 * x'

def test_2():
    return 'hahaha'

y = 'test_2'
result_1 = eval(test_1)
result_2 = eval(y)
print(result_1)  # 9
print(result_2)  # <function test_2 at 0x...> (the function object itself; it is not called)
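The same mechanism works for class names, which is the case that matters for model building. A small sketch, where ToyNet is a hypothetical stand-in for a real model class:

```python
class ToyNet:
    """Hypothetical stand-in for a model class such as MobileNetV3."""

    def __init__(self, arch='small'):
        self.arch = arch


cls = eval('ToyNet')      # the string names a class, so eval returns the class
net = cls(arch='large')   # calling the class creates an instance

print(cls is ToyNet)      # True
print(net.arch)           # large
```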
With eval understood, look at the beginning of the BuildNet class:
class BuildNet(BaseModule):
    def __init__(self, cfg):
        super(BuildNet, self).__init__()
        self.neck_cfg = cfg.get("neck")
        self.head_cfg = cfg.get("head")
        self.backbone = build_model(cfg.get("backbone"))
        if self.neck_cfg is not None:
            self.neck = build_model(cfg.get("neck"))
        if self.head_cfg is not None:
            self.head = build_model(cfg.get("head"))
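The cfg parameter corresponds to model_cfg in the config file and is a dictionary. The usual way to read a value from a dictionary by key is get. In model_cfg, backbone, neck, and head are themselves dictionaries, so get returns the corresponding dictionary.
For newcomers, print the results to see exactly what get returns: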
print(cfg.get("backbone"))
# {'type': 'MobileNetV3', 'arch': 'small'}
print(cfg.get("neck"))
# {'type': 'GlobalAveragePooling'}
print(cfg.get("head"))
# {'type': 'StackedLinearClsHead', 'num_classes': 5, 'in_channels': 576, 'mid_channels': [1024], 'dropout_rate': 0.2, 'act_cfg': {'type': 'HSwish'}, 'loss': {'type': 'CrossEntropyLoss', 'loss_weight': 1.0}, 'init_cfg': {'type': 'Normal', 'layer': 'Linear', 'mean': 0.0, 'std': 0.01, 'bias': 0.0}, 'topk': (1, 5)}
Next, see how the backbone, neck, and head are built. The build_model function is:
def build_model(cfg):
    if isinstance(cfg, list):
        modules = [
            eval(cfg_.pop("type"))(**cfg_) for cfg_ in cfg
        ]
        return Sequential(*modules)
    else:
        return eval(cfg.pop("type"))(**cfg)
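eval plays the key role here: the type entry is popped from the dictionary and evaluated to obtain the corresponding model class. Why pop rather than get? That is explained below.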
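One caveat worth knowing: eval evaluates any expression it is given, so a type string from an untrusted config could run arbitrary code. A common alternative is an explicit dictionary from names to classes. The sketch below is not how the project does it; REGISTRY, register, ToyBackbone, ToyNeck, and build_from_registry are all hypothetical names used for illustration.

```python
REGISTRY = {}


def register(cls):
    """Class decorator that records cls under its own name."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
class ToyBackbone:
    def __init__(self, arch='small'):
        self.arch = arch


@register
class ToyNeck:
    pass


def build_from_registry(cfg):
    cfg = dict(cfg)          # shallow copy so the caller's config is untouched
    name = cfg.pop('type')   # the remaining keys are constructor arguments
    if name not in REGISTRY:
        raise KeyError(f'unknown model type: {name!r}')
    return REGISTRY[name](**cfg)


backbone = build_from_registry(dict(type='ToyBackbone', arch='large'))
print(type(backbone).__name__, backbone.arch)  # ToyBackbone large
```

With a lookup table, an unknown or malicious type string fails with a KeyError instead of being executed.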
So where are the models named by type defined and imported? Look at the imports at the top of build.py:
from configs.backbones import *
from configs.necks import *
from configs.heads import *
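All the models are imported here. The * imports every public name the package exposes. But each file holds many classes and each folder holds many files, so how is exactly the right set imported?
This comes down to how Python packages work: a folder that contains __init__.py is treated as a package, and when a program imports that folder, __init__.py runs first. Here are the __init__.py files under configs/backbones, configs/necks, and configs/heads: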
# backbones
from .mobilenet_v3 import MobileNetV3
from .mobilenet_v2 import MobileNetV2
from .alexnet import AlexNet
from .lenet import LeNet5
from .vgg import VGG
from .resnet import ResNet, ResNetV1c, ResNetV1d
from .shufflenet_v1 import ShuffleNetV1
from .shufflenet_v2 import ShuffleNetV2
from .efficientnet import EfficientNet
from .resnext import ResNeXt
from .seresnet import SEResNet
from .seresnext import SEResNeXt
from .regnet import RegNet
from .repvgg import RepVGG
from .res2net import Res2Net
from .convnext import ConvNeXt
from .hrnet import HRNet
from .convmixer import ConvMixer
from .cspnet import CSPDarkNet,CSPResNet,CSPResNeXt
from .swin_transformer import SwinTransformer
from .vision_transformer import VisionTransformer
from .tnt import TNT
from .mlp_mixer import MlpMixer
from .deit import DistilledVisionTransformer
from .conformer import Conformer
from .t2t_vit import T2T_ViT
from .twins import PCPVT, SVT
from .poolformer import PoolFormer
from .van import VAN
from .densenet import DenseNet
__all__ = ['MobileNetV3','MobileNetV2', 'AlexNet', 'LeNet5', 'VGG', 'ResNet', 'ResNetV1c', 'ResNetV1d', 'ShuffleNetV1', 'ShuffleNetV2','EfficientNet', 'ResNeXt', 'SEResNet', 'SEResNeXt', 'RegNet', 'RepVGG', 'Res2Net', 'ConvNeXt', 'HRNet', 'ConvMixer','CSPDarkNet','CSPResNet','CSPResNeXt', 'SwinTransformer', 'VisionTransformer', 'TNT', 'MlpMixer', 'DistilledVisionTransformer', 'Conformer', 'T2T_ViT', 'PCPVT', 'SVT', 'PoolFormer', 'VAN', 'DenseNet']
# necks
from .gap import GlobalAveragePooling
from .hr_fuse import HRFuseScales
__all__ = ['GlobalAveragePooling', 'HRFuseScales']
# heads
from .linear_head import LinearClsHead
from .stacked_head import StackedLinearClsHead
from .cls_head import ClsHead
from .vision_transformer_head import VisionTransformerClsHead
from .deit_head import DeiTClsHead
from .conformer_head import ConformerHead
__all__ = ['LinearClsHead', 'StackedLinearClsHead','ClsHead', 'VisionTransformerClsHead', 'DeiTClsHead', 'ConformerHead']
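When we write from xx import *, Python imports exactly the names listed in __all__, and every element of that list is one of the models imported above it. That is how the right classes get imported. Each type value in a config matches one of these class names, and eval turns the name into the class.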
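The effect of __all__ on a star import can be checked in isolation. The sketch below builds a throwaway module in memory rather than touching the project's packages; the module name toy_backbones and the classes ToyNetA and ToyNetB are hypothetical.

```python
import sys
import types

# Source of a throwaway module: two classes, only one listed in __all__.
source = '''
class ToyNetA:
    pass

class ToyNetB:
    pass

__all__ = ['ToyNetA']
'''

module = types.ModuleType('toy_backbones')
exec(source, module.__dict__)           # run the source inside the module
sys.modules['toy_backbones'] = module   # make it importable by name

namespace = {}
exec('from toy_backbones import *', namespace)

# Ignore entries such as __builtins__ that exec adds on its own.
exported = sorted(name for name in namespace if not name.startswith('__'))
print(exported)  # ['ToyNetA']
```

ToyNetB exists in the module but is not pulled in by the star import, because it is missing from __all__.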
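How are the arguments each model needs for initialization passed in? Note
eval(cfg.pop("type"))(**cfg)
The **cfg unpacks whatever key-value pairs remain after the pop and passes them as keyword arguments to the model class. This is also why get is not used: type only serves to locate the class. If it were left in the dictionary, it would be passed along as a keyword argument, and the constructor would raise a TypeError because it has no type parameter.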
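The difference between get and pop is easy to reproduce. In this sketch ToyBackbone is a hypothetical class whose constructor, like MobileNetV3's, has no type parameter. It also shows a side effect of pop: the dictionary is modified in place, so building twice from the same dictionary would fail on the second attempt unless a copy is used.

```python
import copy


class ToyBackbone:
    """Hypothetical model class; note there is no `type` parameter."""

    def __init__(self, arch='small'):
        self.arch = arch


cfg = dict(type='ToyBackbone', arch='small')

# With get, 'type' stays in the dict and is passed on as a keyword argument.
try:
    eval(cfg.get('type'))(**cfg)
    get_error = None
except TypeError as exc:
    get_error = exc
print(type(get_error).__name__)  # TypeError

# With pop, 'type' is removed first, so only arch reaches the constructor.
working = copy.deepcopy(cfg)     # copy, because pop mutates the dict
model = eval(working.pop('type'))(**working)
print(model.arch)                # small
print('type' in working)         # False
print('type' in cfg)             # True
```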
To make this concrete, take MobileNetV3 and look at its initializer:
class MobileNetV3(BaseModule):
    def __init__(self,
                 arch='small',
                 conv_cfg=None,
                 norm_cfg=dict(type='BN', eps=0.001, momentum=0.01),
                 out_indices=None,
                 frozen_stages=-1,
                 norm_eval=False,
                 with_cp=False,
                 init_cfg=[
                     dict(
                         type='Kaiming',
                         layer=['Conv2d'],
                         nonlinearity='leaky_relu'),
                     dict(type='Normal', layer=['Linear'], std=0.01),
                     dict(type='Constant', layer=['BatchNorm2d'], val=1)
                 ]):
As you can see, there is no type parameter!
At this point the backbone, neck, and head are all instantiated and stored as self.backbone, self.neck, and self.head in BuildNet.
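In addition, the Convolution, Normalization, Activation, and Padding layers under the configs/basic folder use the same idea, so that different models can build them flexibly. Everything they need is defined in convolution.py, normalization.py, activations.py, and padding.py.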
Finally, a word about BaseModule, which many of the models inherit from:
class BaseModule(nn.Module):
    def __init__(self, init_cfg=None):
        """Initialize BaseModule, inherited from `torch.nn.Module`"""
        super(BaseModule, self).__init__()
        self._is_init = False
        self.init_cfg = copy.deepcopy(init_cfg)

    @property
    def is_init(self):
        return self._is_init

    def init_weights(self):
        """Initialize the weights."""
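BaseModule's main job is weight initialization. Because it inherits from nn.Module, every class that inherits from BaseModule can do everything training requires. That is also why BuildNet overrides forward: it passes the input features through self.backbone, self.neck, and self.head in turn. With that, the whole model is built!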
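The data flow through the three stages can be sketched without PyTorch. The real forward in models/build.py is not shown in this walkthrough, so the version below is an assumption about its general shape; ToyBuildNet and the three lambda stages are hypothetical. The None checks mirror the optional neck and head in BuildNet's __init__.

```python
class ToyBuildNet:
    """Pure-Python stand-in for BuildNet; each stage is a plain callable."""

    def __init__(self, backbone, neck=None, head=None):
        self.backbone = backbone
        self.neck = neck
        self.head = head

    def forward(self, x):
        x = self.backbone(x)        # feature extraction
        if self.neck is not None:
            x = self.neck(x)        # e.g. pooling
        if self.head is not None:
            x = self.head(x)        # e.g. classification
        return x

    def __call__(self, x):
        # nn.Module routes calls to forward; imitate that here.
        return self.forward(x)


net = ToyBuildNet(
    backbone=lambda xs: [v * 2 for v in xs],   # toy "features"
    neck=lambda xs: sum(xs) / len(xs),         # toy "global average pooling"
    head=lambda v: v > 3,                      # toy "classifier"
)
print(net([1, 2, 3]))  # True

backbone_only = ToyBuildNet(backbone=lambda xs: [v * 2 for v in xs])
print(backbone_only([1, 2, 3]))  # [2, 4, 6]
```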