【NAM】《NAM:Normalization-based Attention Module》


NeurIPS-2021 workshop


Table of Contents

  • 1 Background and Motivation
  • 2 Related Work
  • 3 Advantages / Contributions
  • 4 Method
  • 5 Experiments
    • 5.1 Datasets and Metrics
    • 5.2 Experiments
  • 6 Conclusion (own)


1 Background and Motivation

Attention mechanisms have been one of the hottest research directions in computer vision in recent years.

We aim to utilize the contributing factors of weights for the improvement of attention mechanisms.

2 Related Work

However, these works neglect information from the tuned weights from training.

3 Advantages / Contributions

Proposes the Normalization-based Attention Module (NAM) and verifies its effectiveness on ResNet and MobileNet.

4 Method

a NAM module is embedded at the end of each network block

[Figure 1: NAM channel attention and spatial (pixel) attention sub-modules]

The computation of $W_{\gamma}$ and $W_{\lambda}$ is shown in Figure 1.

The authors also impose a regularization constraint on $\gamma$ and $\lambda$ during training.

$p$ is the penalty that balances $g(\gamma)$ and $g(\lambda)$
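
For reference, a sketch of the full formulation, reconstructed from the paper's figure and the description above (the symbols $F_1$, $F_2$ and $\mathrm{BN}_s$ follow the paper's notation as best I recall, so treat the exact forms as approximate):

$$ M_c = \operatorname{sigmoid}\big(W_{\gamma}\,\mathrm{BN}(F_1)\big), \qquad W_{\gamma} = \frac{\gamma_i}{\sum_j \gamma_j} $$

$$ M_s = \operatorname{sigmoid}\big(W_{\lambda}\,\mathrm{BN}_s(F_2)\big), \qquad W_{\lambda} = \frac{\lambda_i}{\sum_j \lambda_j} $$

$$ \mathcal{L} = \sum_{(x,y)} l\big(f(x, W), y\big) + p\sum g(\gamma) + p\sum g(\lambda) $$

where $M_c$ and $M_s$ are the channel and pixel attention maps, and $g(\cdot)$ is the $l_1$ penalty that the regularization constraint applies to the scale factors.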

Let's look at the authors' open-source code: https://github.com/Christian-lyc/NAM

import torch.nn as nn
import torch
from torch.nn import functional as F  # unused in this snippet


class Channel_Att(nn.Module):
    """NAM channel attention: reweights channels by the normalized BN scale factors."""

    def __init__(self, channels, t=16):
        super(Channel_Att, self).__init__()
        self.channels = channels
        # affine BatchNorm supplies the learnable scale factors gamma used as channel weights
        self.bn2 = nn.BatchNorm2d(self.channels, affine=True)

    def forward(self, x):
        residual = x

        x = self.bn2(x)
        # W_gamma: each channel's |gamma| normalized by the sum of |gamma| over all channels
        weight_bn = self.bn2.weight.data.abs() / torch.sum(self.bn2.weight.data.abs())
        # move channels last so the per-channel weights broadcast over (N, H, W)
        x = x.permute(0, 2, 3, 1).contiguous()
        x = torch.mul(weight_bn, x)
        x = x.permute(0, 3, 1, 2).contiguous()

        # sigmoid gating applied to the original input (residual)
        x = torch.sigmoid(x) * residual

        return x


class Att(nn.Module):
    """Wrapper module; the released code only implements the channel branch."""

    def __init__(self, channels, shape, out_channels=None, no_spatial=True):
        super(Att, self).__init__()
        self.Channel_Att = Channel_Att(channels)

    def forward(self, x):
        x_out1 = self.Channel_Att(x)

        return x_out1

Only the channel normalization-based attention part is provided; the spatial (pixel) attention is not included.
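
For context, a minimal sketch of how the module could be dropped in at the end of a residual block, matching "a NAM module is embedded at the end of each network block". The block structure and names below (e.g. BasicBlockWithNAM) are illustrative assumptions, not taken from the released code:

import torch
import torch.nn as nn

# Illustrative residual block with NAM appended at the end; layer layout and names
# are assumptions for this sketch, not the authors' code.
class BasicBlockWithNAM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        self.nam = Channel_Att(channels)  # channel NAM from the snippet above

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.nam(out)  # attention at the end of the block
        return self.relu(out + identity)


x = torch.randn(2, 64, 32, 32)
print(BasicBlockWithNAM(64)(x).shape)  # torch.Size([2, 64, 32, 32])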

5 Experiments

5.1 Datasets and Metrics

  • CIFAR-100

  • ImageNet

Top-1 and Top-5 accuracy

5.2 Experiments

[Table: ablation of channel-only vs. spatial (pixel)-only NAM]
Adding only the channel NAM works better than adding only the spatial NAM.

[Table: accuracy comparison with other attention modules]
The improvement is not particularly large; the advantage is that almost no extra parameters are introduced. Let's look at the parameter counts in detail below.

[Table: parameter counts of the compared attention modules]

The count here is multiplied by 4; judging only from the authors' released code, it should be multiplied by 2, i.e., the parameters of the BN layer (one weight and one bias per channel).
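
A quick sanity check of the extra parameters introduced by the released channel-attention module (a sketch; extra_params is just a helper name for this check):

# Extra parameters from Channel_Att: a single affine BatchNorm2d contributes
# weight (C) + bias (C) = 2 * C parameters, matching the "x2" estimate above.
def extra_params(channels):
    m = Channel_Att(channels)  # Channel_Att from the code snippet above
    return sum(p.numel() for p in m.parameters())

print(extra_params(64))   # 128 = 2 * 64
print(extra_params(256))  # 512 = 2 * 256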


6 Conclusion (own)

The paper is fairly short and some details remain unclear, e.g., the concrete implementation of the pixel normalization.
