【PSA】《Polarized Self-Attention: Towards High-quality Pixel-wise Regression》

[Figure 1]

arXiv-2021


Contents

  • 1 Background and Motivation
  • 2 Related Work
  • 3 Advantages / Contributions
  • 4 Method
  • 5 Experiments
    • 5.1 Datasets and Metrics
    • 5.2 PSA vs. Baselines
    • 5.3 Semantic Segmentation
    • 5.4 Ablation Study
  • 6 Conclusion(own)


1 Background and Motivation

Origin of the paper's title (quoted from Polarized Self-Attention: Towards High-quality Pixel-wise Regression):

In photography, there are always random lights in transverse directions that produce glares/reflections. Polarized filtering, by only allowing light to pass orthogonal to the transverse direction, can potentially improve the contrast of the photo. Due to the loss of total intensity, the light after filtering usually has a small dynamic range and therefore needs an additional boost, e.g. by High Dynamic Range (HDR), to recover the details of the original scene.

[Figure 2]

[Figure 3]

[Figure 4]

To suppress glare/reflections, a polarizing filter is needed, but the total intensity is reduced, so an additional boost (e.g. HDR) is required to recover the details of the original scene.

This is very much like attention.

To model the spatial and channel dimensions jointly without dimension reduction, which would otherwise blow up computation and memory, the authors adopt a polarized filtering mechanism in PSA.

(1) Filtering: completely collapse the features along one dimension (e.g. the channel dimension) while keeping high resolution along the orthogonal dimension (e.g. the spatial dimension).

(2) High Dynamic Range (HDR): first apply Softmax to the smallest tensor inside the attention block to widen the dynamic range of the attention, then apply Sigmoid for a dynamic mapping (see the toy sketch below).
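
A minimal toy sketch of this Softmax → Sigmoid composition (not the paper's exact module, which uses $1\times1$ convolutions and LayerNorm; see the code in Section 4), just to show how one dimension is collapsed and the result is re-mapped to $(0,1)$:

import torch

x = torch.randn(2, 8, 4, 4)                      # hypothetical feature map: B, C, H, W
q = torch.softmax(x.mean(1).flatten(1), dim=-1)  # B, HW: Softmax over spatial positions ("filtering")
v = x.flatten(2)                                 # B, C, HW
z = torch.bmm(v, q.unsqueeze(-1)).squeeze(-1)    # B, C: spatial dimension fully collapsed
gate = torch.sigmoid(z).view(2, 8, 1, 1)         # B, C, 1, 1: Sigmoid re-maps to (0, 1) ("HDR")
out = gate * x                                   # gated features
print(out.shape)                                 # torch.Size([2, 8, 4, 4])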


Deep learning has moved from coarse-grained tasks (classification / detection) to fine-grained computer vision tasks (keypoints / segmentation).

the pixel-wise regression problem has a higher problem complexity by the order of output element numbers

  • Keeping high internal resolution at a reasonable cost
  • Fitting output distribution such as that of the key-point heatmaps or segmentation masks

Facing the difficulties above, the authors look for a plug-and-play solution to improve the accuracy of the pixel-wise regression problem, and propose Polarized Self-Attention.

In the attention design, the channel-only attention preserves as much channel information as possible, and the spatial-only attention preserves as much spatial information as possible.

At the output, a Softmax (matching a Gaussian-like distribution) is composed with a Sigmoid (matching a binomial-like distribution).

[Figure 5]

2 Related Work

  • Pixel-wise Regression Tasks
    • keypoint estimation(heatmaps)
    • semantic segmentation
  • Self-attention and its Variants
  • Full-tensor and simplified attention blocks
    optimizations of the non-local block

3 Advantages / Contributions

Borrowing the idea of polarized filtering from optics, the authors propose Polarized Self-Attention, which yields clear gains on public datasets for keypoint estimation and semantic segmentation.

4 Method

2D Gaussian distribution (keypoint heatmaps)

2D Binomial distribution (binary segmentation masks)

fuse softmax-sigmoid composition in both channel-only and spatial-only attention branches
[Figure 6]

The channel-only attention produces a $C\times1\times1$ attention map.

The $C\times1\times1$ map can be obtained as a $(C\times HW) \times (HW\times1\times1)$ matrix product.

The spatial-only attention produces a $1\times H\times W$ attention map.

The $1\times H\times W$ map can be obtained as a $(1\times C) \times (C\times HW)$ matrix product followed by a reshape.
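
A quick shape check of the two matrix products above (a minimal sketch; the sizes are arbitrary):

import torch

b, c, h, w = 1, 8, 6, 5
v_ch = torch.randn(b, c, h * w)        # C x HW  (values)
q_ch = torch.randn(b, h * w, 1)        # HW x 1  (query)
print(torch.bmm(v_ch, q_ch).shape)     # torch.Size([1, 8, 1])  -> C x 1 (x 1) after unsqueeze

q_sp = torch.randn(b, 1, c)            # 1 x C   (query)
v_sp = torch.randn(b, c, h * w)        # C x HW  (values)
print(torch.bmm(q_sp, v_sp).reshape(b, 1, h, w).shape)  # torch.Size([1, 1, 6, 5]) -> 1 x H x W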

Compared with CBAM (【Attention】《CBAM: Convolutional Block Attention Module》):

the channel-only attention exploits more spatial information (not just global pooling), and the spatial-only attention makes fuller use of channel information (not just a channel-wise mean).

Let's look at the formulas.

(1) Channel-only attention

[Figure 7]

$\sigma$ denotes a reshape operation

$F_{SM}$ denotes Softmax
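
The formula in the figure can be read off the reference code further below; roughly:

$A^{ch}(X) = F_{SG}\left[\mathrm{LN}\left(W_z\left(\sigma_1(W_v X) \times F_{SM}(\sigma_2(W_q X))\right)\right)\right], \qquad Z^{ch} = A^{ch}(X) \odot^{ch} X$

where $W_v$ ($C \to C/2$), $W_q$ ($C \to 1$) and $W_z$ ($C/2 \to C$) are $1\times1$ convolutions, $\sigma_1$ / $\sigma_2$ reshape to $C/2 \times HW$ and $HW \times 1$, $\times$ is matrix multiplication, $\mathrm{LN}$ is LayerNorm, $F_{SG}$ is Sigmoid, and $\odot^{ch}$ is channel-wise multiplication.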

(2) Spatial-only attention
[Figure 8]
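Similarly, the spatial-only branch (again read off the code below) is roughly:

$A^{sp}(X) = F_{SG}\left[\sigma_3\left(F_{SM}\left(\sigma_1(F_{GP}(W_q X))\right) \times \sigma_2(W_v X)\right)\right], \qquad Z^{sp} = A^{sp}(X) \odot^{sp} X$

where $F_{GP}$ is global average pooling, $W_q$ and $W_v$ ($C \to C/2$) are $1\times1$ convolutions, $\sigma_1$ reshapes to $1 \times C/2$, $\sigma_2$ reshapes to $C/2 \times HW$, $\sigma_3$ reshapes the $1 \times HW$ result back to $1 \times H \times W$, and $\odot^{sp}$ is position-wise multiplication.
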
Channel and spatial attention in parallel:

[Figure 9]
Channel and spatial attention in series:

[Figure 10]
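
Matching the two classes in the code below, the parallel layout sums the two gated outputs, while the sequential layout feeds the channel-gated output into the spatial branch:

$\mathrm{PSA}_p(X) = Z^{ch} + Z^{sp}, \qquad \mathrm{PSA}_s(X) = Z^{sp}\left(Z^{ch}(X)\right)$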

Relation of PSA to other Self-Attentions

  • Internal Resolution vs Complexity: higher-resolution squeeze-and-excitation
  • Output Distribution/Non-linearity: both the PSA channel-only and spatial-only branches use a Softmax-Sigmoid composition

Let's look at the code.

import torch
from torch import nn

class ParallelPolarizedSelfAttention(nn.Module):
    def __init__(self, channel=512):
        super().__init__()
        self.ch_wv=nn.Conv2d(channel,channel//2,kernel_size=(1,1))
        self.ch_wq=nn.Conv2d(channel,1,kernel_size=(1,1))
        self.softmax_channel=nn.Softmax(1)
        self.softmax_spatial=nn.Softmax(-1)
        self.ch_wz=nn.Conv2d(channel//2,channel,kernel_size=(1,1))
        self.ln=nn.LayerNorm(channel)
        self.sigmoid=nn.Sigmoid()
        self.sp_wv=nn.Conv2d(channel,channel//2,kernel_size=(1,1))
        self.sp_wq=nn.Conv2d(channel,channel//2,kernel_size=(1,1))
        self.agp=nn.AdaptiveAvgPool2d((1,1))

    def forward(self, x):
        b, c, h, w = x.size()

        #Channel-only Self-Attention
        channel_wv=self.ch_wv(x) #bs,c//2,h,w
        channel_wq=self.ch_wq(x) #bs,1,h,w
        channel_wv=channel_wv.reshape(b,c//2,-1) #bs,c//2,h*w
        channel_wq=channel_wq.reshape(b,-1,1) #bs,h*w,1
        channel_wq=self.softmax_channel(channel_wq)
        channel_wz=torch.matmul(channel_wv,channel_wq).unsqueeze(-1) #bs,c//2,1,1
        channel_weight=self.sigmoid(self.ln(self.ch_wz(channel_wz).reshape(b,c,1).permute(0,2,1))).permute(0,2,1).reshape(b,c,1,1) #bs,c,1,1
        channel_out=channel_weight*x

        #Spatial-only Self-Attention
        spatial_wv=self.sp_wv(x) #bs,c//2,h,w
        spatial_wq=self.sp_wq(x) #bs,c//2,h,w
        spatial_wq=self.agp(spatial_wq) #bs,c//2,1,1
        spatial_wv=spatial_wv.reshape(b,c//2,-1) #bs,c//2,h*w
        spatial_wq=spatial_wq.permute(0,2,3,1).reshape(b,1,c//2) #bs,1,c//2
        spatial_wq=self.softmax_spatial(spatial_wq) #bs,1,c//2
        spatial_wz=torch.matmul(spatial_wq,spatial_wv) #bs,1,h*w
        spatial_weight=self.sigmoid(spatial_wz.reshape(b,1,h,w)) #bs,1,h,w
        spatial_out=spatial_weight*x
        out=spatial_out+channel_out
        return out

class SequentialPolarizedSelfAttention(nn.Module):
    def __init__(self, channel=512):
        super().__init__()
        self.ch_wv=nn.Conv2d(channel,channel//2,kernel_size=(1,1))
        self.ch_wq=nn.Conv2d(channel,1,kernel_size=(1,1))
        self.softmax_channel=nn.Softmax(1)
        self.softmax_spatial=nn.Softmax(-1)
        self.ch_wz=nn.Conv2d(channel//2,channel,kernel_size=(1,1))
        self.ln=nn.LayerNorm(channel)
        self.sigmoid=nn.Sigmoid()
        self.sp_wv=nn.Conv2d(channel,channel//2,kernel_size=(1,1))
        self.sp_wq=nn.Conv2d(channel,channel//2,kernel_size=(1,1))
        self.agp=nn.AdaptiveAvgPool2d((1,1))

    def forward(self, x):
        b, c, h, w = x.size()

        #Channel-only Self-Attention
        channel_wv=self.ch_wv(x) #bs,c//2,h,w
        channel_wq=self.ch_wq(x) #bs,1,h,w
        channel_wv=channel_wv.reshape(b,c//2,-1) # bs,c//2,h*w
        channel_wq=channel_wq.reshape(b,-1,1) # bs,h*w,1
        channel_wq=self.softmax_channel(channel_wq) # bs,h*w,1
        channel_wz=torch.matmul(channel_wv,channel_wq).unsqueeze(-1) #bs,c//2,1,1
        channel_weight=self.sigmoid(self.ln(self.ch_wz(channel_wz).reshape(b,c,1).permute(0,2,1))).permute(0,2,1).reshape(b,c,1,1) #bs,c,1,1
        channel_out=channel_weight*x

        #Spatial-only Self-Attention
        spatial_wv=self.sp_wv(channel_out) #bs,c//2,h,w
        spatial_wq=self.sp_wq(channel_out) #bs,c//2,h,w
        spatial_wq=self.agp(spatial_wq) #bs,c//2,1,1
        spatial_wv=spatial_wv.reshape(b,c//2,-1) #bs,c//2,h*w
        spatial_wq=spatial_wq.permute(0,2,3,1).reshape(b,1,c//2) #bs,1,c//2
        spatial_wq=self.softmax_spatial(spatial_wq)
        spatial_wz=torch.matmul(spatial_wq,spatial_wv) #bs,1,h*w
        spatial_weight=self.sigmoid(spatial_wz.reshape(b,1,h,w)) #bs,1,h,w
        spatial_out=spatial_weight*channel_out
        return spatial_out

if __name__ == '__main__':
    input=torch.randn(1,512,7,7)
    psa = SequentialPolarizedSelfAttention(channel=512)
    output=psa(input)
    print(output.shape)

Fairly intuitive overall.

5 Experiments

we add PSAs after the first 3 × 3 convolution in every residual block.

[Figure 11]
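
A minimal sketch of this placement, assuming an HRNet/ResNet-style basic residual block; PSABasicBlock and conv3x3 are hypothetical names (not from the paper's code), and SequentialPolarizedSelfAttention is the class from the listing above:

import torch
from torch import nn

def conv3x3(cin, cout):
    # hypothetical helper: plain 3x3 conv followed by BN
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1, bias=False),
                         nn.BatchNorm2d(cout))

class PSABasicBlock(nn.Module):
    """Residual block with PSA inserted after the first 3x3 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = conv3x3(channels, channels)
        self.psa = SequentialPolarizedSelfAttention(channel=channels)  # class defined above
        self.conv2 = conv3x3(channels, channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.psa(out)           # PSA right after the first 3x3 conv
        out = self.conv2(out)
        return self.relu(out + x)     # residual connection

x = torch.randn(1, 64, 56, 56)
print(PSABasicBlock(64)(x).shape)     # torch.Size([1, 64, 56, 56])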

5.1 Datasets and Metrics

  • MS-COCO 2017 human pose estimation(AP)

  • Pascal VOC2012 semantic segmentation(mIoU)

5.2 PSA vs. Baselines

(1)Top-Down 2D Human Pose Estimation

[Figure 12]

The output heatmap size is 96 × 72 (1/4 of the input resolution).

(2)Semantic Segmentation
[Figure 13]
The gains are less pronounced than on keypoint estimation.

5.3 Semantic Segmentation

[Figure 14]

Parallel (p) and sequential (s) arrangements of the channel and spatial attention show no significant difference, only marginal metric differences.

[Figure 15]
Gains are observed across the board.

5.4 Ablation Study

[Figure 16]
Using both channel and spatial attention is better than using either alone; the sequential and parallel compositions perform comparably.

6 Conclusion(own)

For more paper-reading notes, see 【Paper Reading】

  • Channel-only attention blocks put the same weights on different spatial locations, such that the classification task still benefits since its spatial information eventually collapses by pooling, and the anchor displacement regression in object detection benefits since the channel-only attention unanimously highlights all foreground pixels

  • How PSA behaves in complex DCNN heads has not yet been evaluated by the authors

  • The optics-inspired backstory is well told and the gains are decent, but the analogy is not reinforced consistently throughout the paper; the argument is not concentrated in one place and lacks a precise punch
