Swin Transformer, SwinIR, SwinFIR

Swin Transformer

Pain point: scale variation – proposes a hierarchical structure

Change to attention: shifted window attention reduces the computational complexity of self-attention, while the window shifting adds cross-window connections, approximating global attention
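The savings can be made concrete with the complexity formulas from the Swin Transformer paper; a quick sketch (the 56×56 token grid, C=96 channels and 7×7 window below are illustrative choices, not values from this post):

```python
# Multiply-accumulate complexity of global multi-head self-attention (MSA)
# versus window-based MSA (W-MSA), following the Swin Transformer paper:
#   Omega(MSA)   = 4*h*w*C^2 + 2*(h*w)^2*C
#   Omega(W-MSA) = 4*h*w*C^2 + 2*M^2*h*w*C     (M = window size)
# Global MSA is quadratic in the number of tokens h*w; W-MSA is linear.

def msa_flops(h, w, C):
    return 4 * h * w * C**2 + 2 * (h * w) ** 2 * C

def wmsa_flops(h, w, C, M):
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C

# Example feature map: 56x56 tokens, C=96 channels, 7x7 windows
g = msa_flops(56, 56, 96)
l = wmsa_flops(56, 56, 96, 7)
print(f"global MSA: {g:,}  windowed: {l:,}  (ratio {g / l:.1f}x)")
```

At this resolution the windowed variant is already more than an order of magnitude cheaper, and the gap grows quadratically with image size.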

Difference from ViT: ViT uses a single 16× downsampling, so its features have only one scale; Swin Transformer produces multi-scale features

[Figure 1]

The local intuition: computing attention within a small window is basically sufficient; computing global self-attention everywhere is wasteful. (Based on the assumption that objects with similar attributes tend to be spatially close.)

Patch Partition: equivalent to splitting the image into blocks

Patch Merging: plays a pooling-like downsampling role, providing multi-scale features

[Figure 2]

Transforms the feature map from H × W × C to H/2 × W/2 × 2C
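The H × W × C → H/2 × W/2 × 2C step can be sketched in NumPy, assuming the standard Swin formulation (concatenate each 2×2 token neighbourhood to get 4C channels, then apply a learned 4C → 2C linear projection; the random matrix below is only a stand-in for the learned weights):

```python
import numpy as np

def patch_merging(x, rng=np.random.default_rng(0)):
    """Sketch of Swin's Patch Merging: (H, W, C) -> (H/2, W/2, 2C)."""
    H, W, C = x.shape
    # Gather the four interleaved sub-grids and concatenate on channels
    merged = np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]],
        axis=-1,
    )                                    # (H/2, W/2, 4C)
    W_proj = rng.standard_normal((4 * C, 2 * C))  # stand-in for linear layer
    return merged @ W_proj               # (H/2, W/2, 2C)

x = np.ones((8, 8, 96))
y = patch_merging(x)
print(y.shape)   # (4, 4, 192)
```

Spatial resolution halves in each dimension while channels double, which is what gives the backbone its pyramid of feature scales.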

Overall backbone:

[Figure 3]

A quick review of Attention Is All You Need: moving from CNNs and RNNs to the Transformer

CNN: the Transformer borrows CNN's multi-channel design to extract features with different characteristics; in addition, CNN's pyramid structure can aggregate information that is far apart.
RNN: RNNs parallelize poorly, and their sequential execution is inefficient.
Overall, the Transformer keeps the strengths and fixes the weaknesses.

SwinIR: Swin Transformer for Image Restoration

Related Work (IR methods)

  1. Traditional model-based

  2. CNN-based (SRCNN…):

    A flurry of CNN-based models has been proposed to improve model representation ability through more elaborate neural network architecture designs, such as residual blocks, dense blocks and others. Some of them exploit the attention mechanism inside the CNN framework, such as channel attention, non-local attention and adaptive patch aggregation.

  3. Vision Transformer

SwinIR Model

[Figure 4]

SwinIR consists of three modules:
shallow feature extraction, deep feature extraction and high-quality (HQ) image reconstruction modules.

1. Shallow feature extraction

Shallow features: we use a 3×3 convolutional layer $H_{SF}(\cdot)$ to extract the shallow feature $F_0 = H_{SF}(I_{LQ})$. The convolution layer is good at early visual processing, leading to more stable optimization and better results. It also provides a simple way to map the input image space to a higher-dimensional feature space.

2. Deep feature extraction

Deep features: then, we extract the deep feature $F_{DF}$ from $F_0$ as $F_{DF} = H_{DF}(F_0)$, where $H_{DF}(\cdot)$ is the deep feature extraction module; it contains K residual Swin Transformer blocks (RSTB) and a 3×3 convolutional layer.

3. Image Reconstruction

SR task: we reconstruct the high-quality image $I_{RHQ}$ by aggregating shallow and deep features as

$$I_{RHQ} = H_{REC}(F_0 + F_{DF})$$

where $H_{REC}(\cdot)$ is the function of the reconstruction module.

Non-upsampling tasks: for tasks such as image denoising and JPEG compression artifact reduction, a single convolution layer is used for reconstruction. Besides, we use residual learning to reconstruct the residual between the LQ and HQ images instead of the HQ image directly. This is formulated as

$$I_{RHQ} = H_{SwinIR}(I_{LQ}) + I_{LQ}$$
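A minimal dataflow sketch of the three modules with residual learning. All function bodies below are toy stand-ins (not SwinIR's actual convolution and RSTB layers); only the wiring of shallow features, deep features and the residual connection follows the description above:

```python
import numpy as np

rng = np.random.default_rng(0)

def H_SF(img):   # shallow feature extraction (stand-in for a 3x3 conv)
    return img * 0.5

def H_DF(F0):    # deep feature extraction (stand-in for K RSTBs + conv)
    return F0 + 0.1 * rng.standard_normal(F0.shape)

def H_REC(F):    # reconstruction (stand-in for a single conv layer)
    return F * 0.2

def swinir_residual(I_LQ):
    F0 = H_SF(I_LQ)               # shallow features
    FDF = H_DF(F0)                # deep features
    residual = H_REC(F0 + FDF)    # network predicts the HQ - LQ residual
    return residual + I_LQ        # add the input back (residual learning)

I_LQ = rng.random((16, 16, 3))
I_RHQ = swinir_residual(I_LQ)
print(I_RHQ.shape)   # (16, 16, 3)
```

For non-upsampling tasks the output keeps the input resolution, so the skip connection can simply add the LQ image back at the end.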

4. Loss Design

For SR, the loss is an L1 pixel loss on the reconstructed image:

$$\mathcal{L} = \lVert I_{RHQ} - I_{HQ} \rVert_1$$

i.e., the residual between $I_{RHQ}$ and $I_{HQ}$.
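A minimal sketch of an L1 pixel loss between the reconstructed image and the ground truth (here averaged over pixels; the reduction convention is an implementation choice):

```python
import numpy as np

def l1_pixel_loss(I_RHQ, I_HQ):
    """Mean absolute difference between reconstruction and ground truth."""
    return np.abs(I_RHQ - I_HQ).mean()

a = np.zeros((4, 4, 3))          # toy reconstruction
b = np.full((4, 4, 3), 0.5)      # toy ground truth
print(l1_pixel_loss(a, b))       # 0.5
```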

SwinFIR

[Figure 6]
