[Image Classification] How a Model Is Built and How the Files Are Called

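Project repository: https://github.com/Fafa-DL/Awesome-Backbones

About the project: a ready-to-use image classification project covering mainstream models, meant for learning, comparing, and modifying backbone networks.

Video tutorial: https://www.bilibili.com/video/BV1SY411P7Nd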

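Using models/mobilenet/mobilenet_v3_small.py as an example, this post walks through how a complete model is built.

The part of mobilenet_v3_small.py that describes the model structure is as follows: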

model_cfg = dict(
    backbone=dict(type='MobileNetV3', arch='small'),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='StackedLinearClsHead',
        num_classes=5,
        in_channels=576,
        mid_channels=[1024],
        dropout_rate=0.2,
        act_cfg=dict(type='HSwish'),
        loss=dict(
            type='CrossEntropyLoss', loss_weight=1.0),
        init_cfg=dict(
            type='Normal', layer='Linear', mean=0., std=0.01, bias=0.),
        topk=(1, 5)))

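As you can see, a complete model consists of a Backbone, a Neck, and a Head, and the loss computation is integrated into the Head.

Now take tools/train.py as an example and see how I call these components. Note this line: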

model = init_model(model_cfg, data_cfg, device=device, mode='train')

The init_model function lives in utils/inference.py and looks like this:

def init_model(model_cfg, data_cfg, device='cuda:0',mode='eval'):
    """Initialize a classifier from config file.

    Returns:
        nn.Module: The constructed classifier.
    """
    if device == '':
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = BuildNet(model_cfg)
    
    if mode == 'train':
        print('Initialize the weights.')
        model.init_weights()
        
        if data_cfg.get('train').get('pretrained_flag') and data_cfg.get('train').get('pretrained_weights'):
            print('Loading {}'.format(data_cfg.get('train').get('pretrained_weights').split('/')[-1]))
            model_dict = model.state_dict()
            pretrained_dict = torch.load(data_cfg.get('train').get('pretrained_weights'), map_location=device)
            if 'state_dict' in pretrained_dict:
                pretrained_dict = pretrained_dict['state_dict']  
            pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict.keys() and np.shape(model_dict[k]) ==  np.shape(v) and 'backbone' in k}
            model_dict.update(pretrained_dict)
            print(model.load_state_dict(pretrained_dict,strict=False))
            
    elif mode =='eval':
        print('Loading {}'.format(data_cfg.get('test').get('ckpt').split('/')[-1]))
        model_dict = model.state_dict()
        pretrained_dict = torch.load(data_cfg.get('test').get('ckpt'), map_location=device)
        
        if 'state_dict' in pretrained_dict:
            pretrained_dict = pretrained_dict['state_dict']  
        pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict.keys() and np.shape(model_dict[k]) ==  np.shape(v)}
        model_dict.update(pretrained_dict)
        print(model.load_state_dict(pretrained_dict,strict=False))
        
    model.to(device)
    
    if mode == 'eval':
        model.eval()
    
    return model

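mode has two cases, train and eval, and the biggest difference between them is how weights are loaded. In train mode the weights are initialized first, and then only pretrained entries whose key contains 'backbone' and whose shape matches are loaded; in eval mode every checkpoint entry with a matching name and shape is loaded. The network is already fully defined before any weights are loaded, so the line to focus on is model = BuildNet(model_cfg).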
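The filtering step in init_model can be illustrated without PyTorch. The following is a minimal sketch, assuming shape tuples stand in for real tensors; the function name select_weights and all the parameter keys are hypothetical and only serve to show which checkpoint entries survive in each mode.

```python
def select_weights(model_shapes, ckpt_shapes, backbone_only):
    """Mimic the dict comprehension in init_model, using shape tuples
    as stand-ins for tensors (an assumption made for this sketch)."""
    return {
        k: v
        for k, v in ckpt_shapes.items()
        if k in model_shapes                      # key exists in the model
        and model_shapes[k] == v                  # shapes agree
        and (not backbone_only or "backbone" in k)  # train mode: backbone only
    }


# Hypothetical parameter names and shapes.
model_shapes = {
    "backbone.conv.weight": (16, 3, 3, 3),
    "head.fc.weight": (5, 1024),
    "head.fc.bias": (5,),
}
ckpt_shapes = {
    "backbone.conv.weight": (16, 3, 3, 3),  # same name, same shape
    "head.fc.weight": (1000, 1024),         # shape differs (1000 classes)
    "head.fc.bias": (5,),                   # matches, but not a backbone key
    "extra.layer.weight": (8,),             # not present in the model
}

train_keys = sorted(select_weights(model_shapes, ckpt_shapes, backbone_only=True))
eval_keys = sorted(select_weights(model_shapes, ckpt_shapes, backbone_only=False))
print(train_keys)  # ['backbone.conv.weight']
print(eval_keys)   # ['backbone.conv.weight', 'head.fc.bias']
```

Entries that are filtered out are simply skipped, which is why the real code calls load_state_dict with strict=False.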

BuildNet is defined in models/build.py. Before getting into it, one important piece of background: eval.

eval takes a string, evaluates it as a Python expression, and returns the result.

Example:

x = 3
test_1 = '3 * x'

def test_2():
    return 'hahaha'
y = 'test_2'

result_1 = eval(test_1)
result_2 = eval(y)

print(result_1) # 9
print(result_2) # <function test_2 at 0x...>, i.e. the function object itself; call result_2() to get 'hahaha'
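The same lookup works for class names, which is the property the model builder relies on. A small sketch, assuming a made-up class Greeter that has nothing to do with the project:

```python
class Greeter:
    """Hypothetical class used only to illustrate name lookup with eval."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hello " + self.name


cls = eval("Greeter")         # the string is resolved to the class object
obj = cls(name="backbone")    # the class can then be instantiated as usual

print(cls is Greeter)  # True
print(obj.greet())     # hello backbone
```

Since eval executes whatever expression the string contains, it should only be applied to strings you control, such as the type fields of your own config files.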

With eval covered, let's look at the beginning of the BuildNet class:

class BuildNet(BaseModule):
    def __init__(self,cfg):
        super(BuildNet, self).__init__()
        self.neck_cfg = cfg.get("neck")
        self.head_cfg = cfg.get("head")
        self.backbone = build_model(cfg.get("backbone"))
        
        if self.neck_cfg is not None:
            self.neck = build_model(cfg.get("neck"))
        
        if self.head_cfg is not None:
            self.head = build_model(cfg.get("head"))

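The cfg parameter is the model_cfg dictionary from the config file, and get is the usual way to read a value from a dictionary by key. In model_cfg, backbone, neck, and head are all dictionaries themselves, so what get returns is the corresponding dictionary.

To help beginners follow along, let's print what we actually get!

print(model_cfg.get("backbone"))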
# {'type': 'MobileNetV3', 'arch': 'small'}

print(model_cfg.get("neck"))
# {'type': 'GlobalAveragePooling'}

print(model_cfg.get("head"))
# {'type': 'StackedLinearClsHead', 'num_classes': 5, 'in_channels': 576, 'mid_channels': [1024], 'dropout_rate': 0.2, 'act_cfg': {'type': 'HSwish'}, 'loss': {'type': 'CrossEntropyLoss', 'loss_weight': 1.0}, 'init_cfg': {'type': 'Normal', 'layer': 'Linear', 'mean': 0.0, 'std': 0.01, 'bias': 0.0}, 'topk': (1, 5)}

Next, let's see how backbone, neck, and head are built. The build_model function is as follows:

def build_model(cfg):
    if isinstance(cfg, list):
        modules = [
            eval(cfg_.pop("type"))(**cfg_) for cfg_ in cfg
        ]
        return Sequential(*modules)
    else:
        return eval(cfg.pop("type"))(**cfg)

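eval plays the key role here: the type entry is popped from the dictionary and eval turns that string into the corresponding model class. Why pop rather than get? That is explained a bit later.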
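To see both branches of build_model in action without installing anything, here is a minimal sketch. It assumes two toy classes, Scale and Shift, and a tiny Chain class standing in for Sequential; none of these names come from the project.

```python
class Scale:
    """Toy 'module': multiplies its input by a factor."""

    def __init__(self, factor=1):
        self.factor = factor

    def __call__(self, x):
        return x * self.factor


class Shift:
    """Toy 'module': adds an offset to its input."""

    def __init__(self, offset=0):
        self.offset = offset

    def __call__(self, x):
        return x + self.offset


class Chain:
    """Stand-in for Sequential: applies the modules one after another."""

    def __init__(self, *modules):
        self.modules = modules

    def __call__(self, x):
        for module in self.modules:
            x = module(x)
        return x


def build_model(cfg):
    # Same structure as the function above, with Chain replacing Sequential.
    if isinstance(cfg, list):
        modules = [eval(cfg_.pop("type"))(**cfg_) for cfg_ in cfg]
        return Chain(*modules)
    return eval(cfg.pop("type"))(**cfg)


single = build_model({"type": "Scale", "factor": 3})
chained = build_model([
    {"type": "Scale", "factor": 3},
    {"type": "Shift", "offset": 1},
])

print(single(2))   # 6
print(chained(2))  # 7
```

A dict config yields one object, while a list of configs yields several objects applied in order.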

So here is the question: where are the models named by type defined and made available? Look at the imports at the top of build.py.

from configs.backbones import *
from configs.necks import *
from configs.heads import *

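It turns out every model is imported here; * imports all the names that the package exposes. But with so many classes in one file and so many files in one folder, how does the import pick up exactly what is needed?

This comes down to how Python packages work. Defining __init__.py in a folder makes Python treat the folder as a package, and when a program imports that folder, __init__.py is executed first. Let's look at the __init__.py files under configs/backbones, configs/necks, and configs/heads.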

# backbones
from .mobilenet_v3 import MobileNetV3
from .mobilenet_v2 import MobileNetV2
from .alexnet import AlexNet
from .lenet import LeNet5
from .vgg import VGG
from .resnet import ResNet, ResNetV1c, ResNetV1d
from .shufflenet_v1 import ShuffleNetV1
from .shufflenet_v2 import ShuffleNetV2
from .efficientnet import EfficientNet
from .resnext import ResNeXt
from .seresnet import SEResNet
from .seresnext import SEResNeXt
from .regnet import RegNet
from .repvgg import RepVGG
from .res2net import Res2Net
from .convnext import ConvNeXt
from .hrnet import HRNet
from .convmixer import ConvMixer
from .cspnet import CSPDarkNet,CSPResNet,CSPResNeXt
from .swin_transformer import SwinTransformer
from .vision_transformer import VisionTransformer
from .tnt import TNT
from .mlp_mixer import MlpMixer
from .deit import DistilledVisionTransformer
from .conformer import Conformer
from .t2t_vit import T2T_ViT
from .twins import PCPVT, SVT
from .poolformer import PoolFormer
from .van import VAN
from .densenet import DenseNet

__all__ = ['MobileNetV3','MobileNetV2', 'AlexNet', 'LeNet5', 'VGG', 'ResNet', 'ResNetV1c', 'ResNetV1d', 'ShuffleNetV1', 'ShuffleNetV2','EfficientNet', 'ResNeXt', 'SEResNet', 'SEResNeXt', 'RegNet', 'RepVGG', 'Res2Net', 'ConvNeXt', 'HRNet', 'ConvMixer','CSPDarkNet','CSPResNet','CSPResNeXt', 'SwinTransformer', 'VisionTransformer', 'TNT', 'MlpMixer', 'DistilledVisionTransformer', 'Conformer', 'T2T_ViT', 'PCPVT', 'SVT', 'PoolFormer', 'VAN', 'DenseNet']
# necks
from .gap import GlobalAveragePooling
from .hr_fuse import HRFuseScales

__all__ = ['GlobalAveragePooling', 'HRFuseScales']
# heads
from .linear_head import LinearClsHead
from .stacked_head import StackedLinearClsHead
from .cls_head import ClsHead
from .vision_transformer_head import VisionTransformerClsHead
from .deit_head import DeiTClsHead
from .conformer_head import ConformerHead

__all__ = ['LinearClsHead', 'StackedLinearClsHead','ClsHead', 'VisionTransformerClsHead', 'DeiTClsHead', 'ConformerHead']

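When we write from xx import *, what gets imported is exactly the set of names listed in __all__, and every element of that list corresponds to a model class imported just above it. That is how the classes are neatly brought in. Each type value in a config matches one of these class names, and eval turns the string into the class to complete the call!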
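The package mechanism can be reproduced in a few lines. This is a sketch under stated assumptions: it writes a throwaway package named demo_necks (a hypothetical name) into a temporary directory, lets its __init__.py re-export one class, and then checks which names a star import actually brings in.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway package on disk; 'demo_necks' is a made-up name.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_necks")
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "gap.py"), "w") as f:
    f.write(
        "class GlobalAveragePooling:\n"
        "    pass\n"
        "\n"
        "class Unlisted:\n"
        "    pass\n"
    )

# __init__.py imports both classes but lists only one of them in __all__.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(
        "from .gap import GlobalAveragePooling, Unlisted\n"
        "__all__ = ['GlobalAveragePooling']\n"
    )

sys.path.insert(0, root)
importlib.invalidate_caches()

# Run the star import in a fresh namespace so the result is easy to inspect.
namespace = {}
exec("from demo_necks import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))

print(imported)  # ['GlobalAveragePooling']
```

Unlisted is importable by name from the package, but the star import skips it because it is missing from __all__.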

Then how are the arguments each model needs for initialization passed in? Note this expression:

eval(cfg.pop("type"))(**cfg)

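The **cfg part unpacks the key-value pairs left in the dictionary after the pop and passes them to the class as keyword arguments. This also explains why get is not used: type only exists to locate the right class, and if it stayed in the dictionary, the constructor would raise a TypeError for an unexpected keyword argument 'type', because none of the classes define such a parameter. Keep in mind that pop modifies the config dictionary in place.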
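The difference between get and pop is easy to check directly. A minimal sketch, assuming a toy class Neck with a single made-up parameter dim:

```python
class Neck:
    """Hypothetical class whose constructor, like the real ones, has no 'type' parameter."""

    def __init__(self, dim=1):
        self.dim = dim


cfg = {"type": "Neck", "dim": 4}

# With get, 'type' stays in the dict and is forwarded to the constructor.
try:
    eval(cfg.get("type"))(**cfg)
    error = None
except TypeError as exc:
    error = str(exc)
print(error is not None)  # True

# With pop, 'type' is removed before the remaining keys are unpacked.
# Working on a copy keeps the original config reusable.
cfg_copy = dict(cfg)
neck = eval(cfg_copy.pop("type"))(**cfg_copy)
print(neck.dim)            # 4
print("type" in cfg)       # True
print("type" in cfg_copy)  # False
```

Because pop removes the key, building twice from the very same dictionary would fail on the second attempt; copying the config first is one way around that.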

All right, to make this easier to follow, let's take MobileNetV3 as an example and look at its initializer:

class MobileNetV3(BaseModule):
    def __init__(self,
                 arch='small',
                 conv_cfg=None,
                 norm_cfg=dict(type='BN', eps=0.001, momentum=0.01),
                 out_indices=None,
                 frozen_stages=-1,
                 norm_eval=False,
                 with_cp=False,
                 init_cfg=[
                     dict(
                         type='Kaiming',
                         layer=['Conv2d'],
                         nonlinearity='leaky_relu'),
                     dict(type='Normal', layer=['Linear'], std=0.01),
                     dict(type='Constant', layer=['BatchNorm2d'], val=1)
                 ]):
        ...  # constructor body not shown in this excerpt

As you can see, the constructor has no type parameter.
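If you want to check which config keys a class will accept before building it, the standard inspect module can list the constructor's parameters. This is only a sketch: ToyMobileNetV3 is a hypothetical stand-in that copies a few parameter names from the signature above.

```python
import inspect


class ToyMobileNetV3:
    """Hypothetical stand-in with part of the real constructor signature."""

    def __init__(self, arch='small', conv_cfg=None, out_indices=None,
                 frozen_stages=-1):
        self.arch = arch


# Parameter names the constructor accepts, excluding 'self'.
accepted = set(inspect.signature(ToyMobileNetV3.__init__).parameters) - {"self"}

cfg = {"type": "ToyMobileNetV3", "arch": "small"}
unexpected = sorted(set(cfg) - accepted)

print(sorted(accepted))  # ['arch', 'conv_cfg', 'frozen_stages', 'out_indices']
print(unexpected)        # ['type']
```

The only key the constructor would reject is type, which is exactly the one that build_model pops before unpacking.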

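At this point backbone, neck, and head have all been instantiated, as self.backbone, self.neck, and self.head of BuildNet.

In addition, under the configs/basic folder, the Convolution, Normalization, Activation, and Padding layers follow the same idea so that they can be built flexibly for different models. All the required functionality is defined in convolution.py, normalization.py, activations.py, and padding.py.

Finally, a few words about BaseModule, which many of the models inherit from: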

class BaseModule(nn.Module):
    def __init__(self, init_cfg=None):
        """Initialize BaseModule, inherited from `torch.nn.Module`"""
        super(BaseModule, self).__init__()
        self._is_init = False
        self.init_cfg = copy.deepcopy(init_cfg)
        
    @property
    def is_init(self):
        return self._is_init

    def init_weights(self):
        """Initialize the weights."""

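As you can see, BaseModule's main job is weight initialization. It inherits from nn.Module, so every class that inherits from BaseModule supports everything training requires. That is why BuildNet overrides forward: the input is fed through BuildNet's self.backbone, self.neck, and self.head in turn. With that, the whole model is built!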
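The data flow through the three parts can be sketched in plain Python. Everything here is an assumption made for illustration: the Toy classes operate on lists instead of tensors and are not the project's BuildNet, whose forward method is not shown in this post.

```python
class ToyBackbone:
    """Stand-in backbone: turns a small 'image' into two feature channels."""

    def __call__(self, image):
        flat = [pixel for row in image for pixel in row]
        return [flat, [2 * pixel for pixel in flat]]


class ToyNeck:
    """Stand-in for global average pooling: one number per channel."""

    def __call__(self, feats):
        return [sum(channel) / len(channel) for channel in feats]


class ToyHead:
    """Stand-in linear head: one weight vector per class."""

    def __init__(self, weights):
        self.weights = weights

    def __call__(self, pooled):
        return [sum(w * x for w, x in zip(ws, pooled)) for ws in self.weights]


class ToyNet:
    """Chains backbone -> neck -> head; neck and head are optional."""

    def __init__(self, backbone, neck=None, head=None):
        self.backbone = backbone
        self.neck = neck
        self.head = head

    def forward(self, x):
        x = self.backbone(x)
        if self.neck is not None:
            x = self.neck(x)
        if self.head is not None:
            x = self.head(x)
        return x


net = ToyNet(ToyBackbone(), ToyNeck(), ToyHead([[1.0, 0.0], [0.0, 1.0]]))
scores = net.forward([[1, 2], [3, 4]])
print(scores)  # [2.5, 5.0]
```

The optional neck and head mirror the `is not None` checks in BuildNet's constructor, so a config without a neck or head still produces a usable feature extractor.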
