MMdetection necks: FPN

[Figure 1: FPN architecture]
In the figure, each lateral connection is a 1×1 convolution and the top-down pathway performs upsampling. As the paper describes the backbone features:

Specifically, for ResNets [16] we use the feature activations output by each stage’s last residual block.

We denote the output of these last residual blocks as {C2, C3, C4, C5} for conv2, conv3, conv4, and conv5 outputs, and note that they have strides of {4, 8, 16, 32} pixels with respect to the input image.

For more details, see the original paper: Feature Pyramid Networks for Object Detection.
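Before reading the full implementation, here is a minimal sketch of the core merge step (a hand-rolled illustration, not the MMDetection code; channel counts assume a ResNet-50 backbone):

import torch
import torch.nn as nn
import torch.nn.functional as F

in_channels = [256, 512, 1024, 2048]  # C2..C5 channels of ResNet-50
out_channels = 256

# lateral 1x1 convs and post-merge 3x3 "smoothing" convs
lateral_convs = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
fpn_convs = nn.ModuleList(
    nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in in_channels)

def fpn_merge(feats):
    laterals = [conv(f) for conv, f in zip(lateral_convs, feats)]
    for i in range(len(laterals) - 1, 0, -1):  # top-down, coarse to fine
        laterals[i - 1] = laterals[i - 1] + F.interpolate(
            laterals[i], size=laterals[i - 1].shape[2:], mode='nearest')
    return [conv(lat) for conv, lat in zip(fpn_convs, laterals)]  # P2..P5

# strides {4, 8, 16, 32} with respect to a 224x224 input
feats = [torch.rand(1, c, 224 // s, 224 // s)
         for c, s in zip(in_channels, [4, 8, 16, 32])]
for p in fpn_merge(feats):
    print(p.shape)  # all 256 channels, spatial sizes 56, 28, 14, 7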

The MMDetection FPN code is shown below (file location: /mmdet/models/necks/fpn.py):

Note: this article only looks at the code from the perspective of the Faster R-CNN network; it does not cover the extra convolution branches (as used e.g. in RetinaNet), which will be added in a follow-up.
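For context, a typical Faster R-CNN config instantiates this neck roughly as follows (values from the standard ResNet-50 FPN config; shown for illustration):

neck = dict(
    type='FPN',
    in_channels=[256, 512, 1024, 2048],  # C2..C5 channels from the ResNet-50 backbone
    out_channels=256,
    num_outs=5)  # the 5th level comes from max pooling, since add_extra_convs=False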

import warnings

import torch.nn as nn
import torch.nn.functional as F
from mmcv.cnn import ConvModule, xavier_init
from mmcv.runner import auto_fp16

from ..builder import NECKS


@NECKS.register_module()
class FPN(nn.Module):
    r"""Feature Pyramid Network.

    This is an implementation of paper `Feature Pyramid Networks for Object
    Detection <https://arxiv.org/abs/1612.03144>`_.

    Args:
        in_channels (List[int]): Number of input channels per scale.
        out_channels (int): Number of output channels (used at each scale)
        num_outs (int): Number of output scales.
        start_level (int): Index of the start input backbone level used to
            build the feature pyramid. Default: 0.
        end_level (int): Index of the end input backbone level (exclusive) to
            build the feature pyramid. Default: -1, which means the last level.
        add_extra_convs (bool | str): If bool, it decides whether to add conv
            layers on top of the original feature maps. Default to False.
            If True, its actual mode is specified by `extra_convs_on_inputs`.
            If str, it specifies the source feature map of the extra convs.
            Only the following options are allowed

            - 'on_input': Last feat map of neck inputs (i.e. backbone feature).
            - 'on_lateral':  Last feature map after lateral convs.
            - 'on_output': The last output feature map after fpn convs.
        extra_convs_on_inputs (bool, deprecated): Whether to apply extra convs
            on the original feature from the backbone. If True,
            it is equivalent to `add_extra_convs='on_input'`. If False, it is
            equivalent to set `add_extra_convs='on_output'`. Default to True.
        relu_before_extra_convs (bool): Whether to apply relu before the extra
            conv. Default: False.
        no_norm_on_lateral (bool): Whether to apply norm on lateral.
            Default: False.
        conv_cfg (dict): Config dict for convolution layer. Default: None.
        norm_cfg (dict): Config dict for normalization layer. Default: None.
        act_cfg (dict): Config dict for activation layer in ConvModule.
            Default: None.
        upsample_cfg (dict): Config dict for interpolate layer.
            Default: `dict(mode='nearest')`

    Example:
        >>> import torch
        >>> in_channels = [2, 3, 5, 7]
        >>> scales = [340, 170, 84, 43]
        >>> inputs = [torch.rand(1, c, s, s)
        ...           for c, s in zip(in_channels, scales)]
        >>> self = FPN(in_channels, 11, len(in_channels)).eval()
        >>> outputs = self.forward(inputs)
        >>> for i in range(len(outputs)):
        ...     print(f'outputs[{i}].shape = {outputs[i].shape}')
        outputs[0].shape = torch.Size([1, 11, 340, 340])
        outputs[1].shape = torch.Size([1, 11, 170, 170])
        outputs[2].shape = torch.Size([1, 11, 84, 84])
        outputs[3].shape = torch.Size([1, 11, 43, 43])
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 num_outs,
                 start_level=0,
                 end_level=-1,
                 add_extra_convs=False,
                 extra_convs_on_inputs=True,  # deprecated; converted into a value of add_extra_convs
                 relu_before_extra_convs=False,
                 no_norm_on_lateral=False,
                 conv_cfg=None,
                 norm_cfg=None,
                 act_cfg=None,
                 upsample_cfg=dict(mode='nearest')):
        super(FPN, self).__init__()
        assert isinstance(in_channels, list)
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.num_ins = len(in_channels)
        self.num_outs = num_outs
        self.relu_before_extra_convs = relu_before_extra_convs
        self.no_norm_on_lateral = no_norm_on_lateral
        self.fp16_enabled = False
        self.upsample_cfg = upsample_cfg.copy()

        if end_level == -1:
            # backbone_end_level counts the backbone levels actually fed into
            # the FPN: either the number of inputs, or the given end_level.
            self.backbone_end_level = self.num_ins
            assert num_outs >= self.num_ins - start_level  # num_outs sanity check
        else:
            # if end_level < inputs, no extra level is allowed
            self.backbone_end_level = end_level  # end_level != -1 here
            assert end_level <= len(in_channels)  # end_level must be in range
            assert num_outs == end_level - start_level  # num_outs sanity check
        self.start_level = start_level
        self.end_level = end_level
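        # Example (typical Faster R-CNN settings, for illustration):
        # in_channels=[256, 512, 1024, 2048], start_level=0, end_level=-1,
        # num_outs=5 -> backbone_end_level=4; the fifth output level is then
        # produced by max pooling in forward(), since add_extra_convs=False.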
        self.add_extra_convs = add_extra_convs  # bool or str
        assert isinstance(add_extra_convs, (str, bool))
        if isinstance(add_extra_convs, str):
            # Extra_convs_source choices: 'on_input', 'on_lateral', 'on_output'
            assert add_extra_convs in ('on_input', 'on_lateral', 'on_output')
        elif add_extra_convs:  # True
            if extra_convs_on_inputs:
                # TODO: deprecate `extra_convs_on_inputs`
                warnings.simplefilter('once')
                warnings.warn(
                    '"extra_convs_on_inputs" will be deprecated in v2.9.0,'
                    'Please use "add_extra_convs"', DeprecationWarning)
                self.add_extra_convs = 'on_input'
            else:
                self.add_extra_convs = 'on_output'

        self.lateral_convs = nn.ModuleList()
        self.fpn_convs = nn.ModuleList()
        # When an nn.ModuleList is added as a member of an nn.Module (i.e. when
        # these sub-modules are added to the network), all parameters of the
        # nn.Modules inside it are registered as parameters of the network.
        for i in range(self.start_level, self.backbone_end_level):
            # ConvModule takes in_channels, out_channels, kernel_size, stride=1,
            # padding=0, dilation=1, groups=1, just like nn.Conv2d; the full
            # ConvModule parameter list is summarized after the class.
            l_conv = ConvModule(
                in_channels[i],
                out_channels,
                1,
                conv_cfg=conv_cfg,
                norm_cfg=norm_cfg if not self.no_norm_on_lateral else None,
                # bias defaults to 'auto': True when norm_cfg is None, else False
                act_cfg=act_cfg,
                inplace=False)  # inplace (bool): whether the activation is in-place
            fpn_conv = ConvModule(
                out_channels,
                out_channels,
                3,
                padding=1,
                conv_cfg=conv_cfg,
                norm_cfg=norm_cfg,  # norm_cfg (dict): normalization layer config. Default: None.
                act_cfg=act_cfg,  # act_cfg (dict): activation layer config.
                inplace=False)

            self.lateral_convs.append(l_conv)
            self.fpn_convs.append(fpn_conv)

        # add extra conv layers (e.g., RetinaNet)
        extra_levels = num_outs - self.backbone_end_level + self.start_level
        # number of requested outputs beyond the used backbone levels;
        # in Faster R-CNN self.add_extra_convs is False, so the extra levels
        # are produced by max pooling in forward() instead of by extra convs
        if self.add_extra_convs and extra_levels >= 1:
            for i in range(extra_levels):
                if i == 0 and self.add_extra_convs == 'on_input':
                    in_channels = self.in_channels[self.backbone_end_level - 1]
                else:
                    in_channels = out_channels
                extra_fpn_conv = ConvModule(
                    in_channels,
                    out_channels,
                    3,
                    stride=2,
                    padding=1,
                    conv_cfg=conv_cfg,
                    norm_cfg=norm_cfg,
                    act_cfg=act_cfg,
                    inplace=False)
                self.fpn_convs.append(extra_fpn_conv)

    # default init_weights for conv(msra) and norm in ConvModule
    def init_weights(self):
        """Initialize the weights of FPN module."""
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                xavier_init(m, distribution='uniform')

    @auto_fp16()
    def forward(self, inputs):
        """Forward function."""
        assert len(inputs) == len(self.in_channels)

        # build laterals
        # lateral convolutions with kernel size and stride both 1
        laterals = [
            lateral_conv(inputs[i + self.start_level])
            for i, lateral_conv in enumerate(self.lateral_convs)
        ]

        # build top-down path
        used_backbone_levels = len(laterals)
        for i in range(used_backbone_levels - 1, 0, -1):  # reverse order, top-down
            # In some cases, fixing `scale factor` (e.g. 2) is preferred, but
            #  it cannot co-exist with `size` in `F.interpolate`.
            if 'scale_factor' in self.upsample_cfg:
                # ** unpacks the dict entries as keyword arguments
                laterals[i - 1] += F.interpolate(laterals[i],
                                                 **self.upsample_cfg)
                # F.interpolate summary: `size` sets the output spatial size;
                # `scale_factor` multiplies the spatial size and cannot be
                # combined with `size`; `mode` selects the algorithm
                # ('nearest' by default); `align_corners` only affects the
                # 'linear', 'bilinear', 'bicubic' and 'trilinear' modes.
            else:
                prev_shape = laterals[i - 1].shape[2:]
                laterals[i - 1] += F.interpolate(
                    laterals[i], size=prev_shape, **self.upsample_cfg)
            # keep upsampling the coarser map and adding it into the finer one

        # build outputs
        # part 1: from original levels
        outs = [
            self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels)
        ]
        # Note that each merged map is passed through one more 3x3 convolution
        # here, to reduce the aliasing effect of upsampling. The paper:
        # "Finally, we append a 3x3 convolution on each merged map to generate
        # the final feature map, which is to reduce the aliasing effect of
        # upsampling."

        # part 2: add extra levels
        if self.num_outs > len(outs):
            # use max pool to get more levels on top of outputs
            # (e.g., Faster R-CNN, Mask R-CNN)
            if not self.add_extra_convs:
                for i in range(self.num_outs - used_backbone_levels):
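                    # kernel_size=1 with stride=2 keeps every other pixel,
                    # halving the spatial size: H_out = (H - 1) // 2 + 1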
                    outs.append(F.max_pool2d(outs[-1], 1, stride=2))
            # add conv layers on top of original feature maps (RetinaNet)
            else:
                if self.add_extra_convs == 'on_input':
                    extra_source = inputs[self.backbone_end_level - 1]
                elif self.add_extra_convs == 'on_lateral':
                    extra_source = laterals[-1]
                elif self.add_extra_convs == 'on_output':
                    extra_source = outs[-1]
                else:
                    raise NotImplementedError
                outs.append(self.fpn_convs[used_backbone_levels](extra_source))
                for i in range(used_backbone_levels + 1, self.num_outs):
                    if self.relu_before_extra_convs:
                        outs.append(self.fpn_convs[i](F.relu(outs[-1])))
                    else:
                        outs.append(self.fpn_convs[i](outs[-1]))
        return tuple(outs)

# ConvModule parameters:
# 1. in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1,
#    groups=1: same as nn.Conv2d.
# 2. bias (bool | str): True / False / 'auto'. If 'auto', the bias is decided
#    by norm_cfg: bias=True when norm_cfg is None, otherwise bias=False.
# 3. norm_cfg (dict): config dict for the normalization layer. Default: None.
# 4. act_cfg (dict): config dict for the activation layer.
#    Default: dict(type='ReLU').
# 5. inplace (bool): whether to use inplace mode for the activation.
#    Default: True.
# 6. with_spectral_norm (bool): whether to use spectral norm on the conv.
#    Default: False.
# 7. padding_mode (str): if the current PyTorch Conv2d does not support the
#    given padding_mode, a custom padding layer is used. The module supports
#    the official ['zeros', 'circular'] modes plus a self-implemented
#    ['reflect']. Default: 'zeros'.
# 8. order (tuple[str]): order of the conv/norm/activation layers, drawn from
#    "conv", "norm" and "act". Common choices are ('conv', 'norm', 'act') and
#    ('act', 'conv', 'norm'). Default: ('conv', 'norm', 'act').

# ————————————————
# Copyright notice: original article by CSDN blogger "Bella_wanna_Better",
# released under the CC 4.0 BY-SA license. Please include the source link
# and this notice when reposting.
# Source: https://blog.csdn.net/bairw_Bella/article/details/114880225
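As a quick illustration of these parameters, a 3x3 ConvModule like the fpn_convs above could be built as follows (a standalone sketch; the BN norm_cfg is just an example, the FPN default is None):

from mmcv.cnn import ConvModule

# conv -> norm -> act, the default ('conv', 'norm', 'act') order;
# bias is 'auto', so it is disabled here because a norm layer is present
m = ConvModule(256, 256, 3, padding=1, norm_cfg=dict(type='BN'))
print(m)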

Running the docstring example with different values of num_outs gives the following outputs:

from mmdet.models.necks import FPN
import torch
in_channels = [2, 3, 5, 7]
scales = [340, 170, 84, 43]
inputs = [torch.rand(1, c, s, s)
          for c, s in zip(in_channels, scales)]
print(inputs[0].shape)

self = FPN(in_channels, 11, num_outs=7).eval()
outputs = self.forward(inputs)
for i in range(len(outputs)):
    print(f'outputs[{i}].shape = {outputs[i].shape}')

# num_outs=5
# outputs[0].shape = torch.Size([1, 11, 340, 340])
# outputs[1].shape = torch.Size([1, 11, 170, 170])
# outputs[2].shape = torch.Size([1, 11, 84, 84])
# outputs[3].shape = torch.Size([1, 11, 43, 43])
# outputs[4].shape = torch.Size([1, 11, 22, 22])

# num_outs=4
# outputs[0].shape = torch.Size([1, 11, 340, 340])
# outputs[1].shape = torch.Size([1, 11, 170, 170])
# outputs[2].shape = torch.Size([1, 11, 84, 84])
# outputs[3].shape = torch.Size([1, 11, 43, 43])

# num_outs=7
# outputs[0].shape = torch.Size([1, 11, 340, 340])
# outputs[1].shape = torch.Size([1, 11, 170, 170])
# outputs[2].shape = torch.Size([1, 11, 84, 84])
# outputs[3].shape = torch.Size([1, 11, 43, 43])
# outputs[4].shape = torch.Size([1, 11, 22, 22])
# outputs[5].shape = torch.Size([1, 11, 11, 11])
# outputs[6].shape = torch.Size([1, 11, 6, 6])
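The extra levels are produced by the repeated F.max_pool2d(outs[-1], 1, stride=2), so each new map satisfies H_out = (H - 1) // 2 + 1, which reproduces the 43 -> 22 -> 11 -> 6 progression above:

h = 43
for _ in range(3):
    h = (h - 1) // 2 + 1
    print(h)  # 22, 11, 6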
