Paper reading: EDVR: Video Restoration with Enhanced Deformable Convolutional Networks

Abstract

**1. Benchmark:** REDS, released with the NTIRE19 Challenge, contains larger and more complex motions.
2. Challenges:
(1) aligning multiple frames given large motions
(2) effectively fusing different frames with diverse motion and blur.
3. Method: enhanced deformable convolutions, with
(1) a Pyramid, Cascading and Deformable (PCD) alignment module, in which frame alignment is done at the feature level using deformable convolutions in a coarse-to-fine manner
(2) a Temporal and Spatial Attention (TSA) fusion module, which emphasizes important features for subsequent restoration.
4. Code: https://github.com/xinntao/EDVR
**Paper:** https://arxiv.org/abs/1905.02716v1

Introduction

1. Earlier studies: treat video restoration as a simple extension of image restoration, so the temporal redundancy among neighboring frames is not fully exploited.
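**2. Recent studies:** more elaborate pipelines, mainly made of four parts: feature extraction, alignment, fusion, and reconstruction. When the video contains occlusion, large motion, or severe blur, the main challenges lie in the alignment and fusion modules. To obtain high-quality output, one must (1) align and establish accurate correspondences among multiple frames, and (2) effectively fuse the aligned features for reconstruction.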
**3. Alignment:** mostly flow-based methods.
Fusion: use convolutions to perform early fusion on all frames, or adopt recurrent networks to gradually fuse multiple frames. Ding Liu et al. (Robust video super-resolution with learned temporal dynamics) propose a temporal adaptive network that can dynamically fuse across different temporal scales.
4. Proposed solution: EDVR has two key components: (1) an alignment module known as Pyramid, Cascading and Deformable convolutions (PCD), and (2) a fusion module known as Temporal and Spatial Attention (TSA).
PCD: inspired by TDAN (Yapeng Tian et al., TDAN: Temporally deformable alignment network for video super-resolution), which uses deformable convolutions to align each neighboring frame to the reference frame at the feature level. Different from TDAN, EDVR (1) performs alignment in a coarse-to-fine manner to handle large and complex motions, (2) uses a pyramid structure, and (3) cascades an additional deformable convolution after the pyramidal alignment.
TSA: (1) temporal attention (2) spatial attention

Related Work

1. Video Restoration: SRCNN was the first to adopt deep learning. Other methods are flow-based, but accurate flow is difficult to obtain given occlusion and large motions. DUF [10] and TDAN [40] circumvent the problem by implicit motion compensation and surpass the flow-based methods.
2. Deformable Convolution: Deformable convolutional networks (Jifeng Dai et al.) obtain information away from the regular local neighborhood, improving the capability of regular convolutions. Deformable convolutions are widely used in various tasks such as video object detection [1], action recognition [53], semantic segmentation [3], and video super-resolution [40]. In particular, TDAN [40] uses deformable convolutions to align the input frames at the feature level without explicit motion estimation or image warping.
3. Attention Mechanism: Attention has proven its effectiveness in many tasks.
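
To make item 2 more concrete, here is a minimal NumPy sketch of the sampling step behind a deformable convolution: each kernel tap reads the feature map at its regular grid position plus an offset, with bilinear interpolation for fractional positions. The function names (`bilinear_sample`, `deform_conv3x3_at`), the single-channel input, and the hand-set offsets are assumptions made for illustration. In real networks the offsets are predicted by a learned layer, and this is not the EDVR or TDAN implementation.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample a 2-D map at fractional (y, x); positions outside the map read as 0."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    value = 0.0
    for yy, xx in ((y0, x0), (y0, x0 + 1), (y0 + 1, x0), (y0 + 1, x0 + 1)):
        if 0 <= yy < h and 0 <= xx < w:
            weight = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
            value += weight * feat[yy, xx]
    return value

def deform_conv3x3_at(feat, kernel, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    offsets has shape (3, 3, 2): a (dy, dx) displacement for every kernel tap.
    With all-zero offsets this reduces to a regular 3x3 convolution
    (cross-correlation, as in deep-learning frameworks).
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i, j]
            out += kernel[i, j] * bilinear_sample(feat, cy + i - 1 + dy, cx + j - 1 + dx)
    return out

# Hypothetical toy input: feat[y, x] = 6 * y + x.
feat = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.full((3, 3), 1.0 / 9.0)       # simple averaging kernel
zero = np.zeros((3, 3, 2))                # no offsets: regular convolution
shift = np.zeros((3, 3, 2))
shift[..., 1] = 1.5                       # every tap moved 1.5 px to the right

regular = deform_conv3x3_at(feat, kernel, zero, 2, 2)
shifted = deform_conv3x3_at(feat, kernel, shift, 2, 2)
```

With the offsets set to zero the result is the plain 3x3 average around the centre. With the 1.5-pixel offsets the same kernel reads values from outside its regular neighborhood, which is the property the notes attribute to deformable convolutions.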

Methodology

1. Structure
[Image 1: overall structure of EDVR (figure not available)]
2. PCD Alignment: aligns the features of each neighboring frame to those of the reference frame, so alignment operates on per-frame features rather than on images.
DConv denotes the deformable convolution; the pyramid has three levels.
[Image 2: PCD alignment module (figure not available)]
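
The coarse-to-fine idea behind the three-level pyramid can be illustrated with a toy sketch. Note the substitution: instead of learned deformable offsets, it estimates a single global integer shift by brute-force search, so it is not the PCD module itself. The function names, the 2x2 average pooling, the circular `np.roll` shifts, and the search radii are all assumptions chosen for illustration.

```python
import numpy as np

def downsample2(img):
    """Halve resolution by 2x2 average pooling (height and width must be even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def best_shift(ref, nbr, center, radius):
    """Search integer shifts around `center` for the one that best maps nbr onto ref."""
    best, best_err = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            err = np.mean((np.roll(nbr, (dy, dx), axis=(0, 1)) - ref) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def coarse_to_fine_shift(ref, nbr, levels=3, coarse_radius=3):
    # Build pyramids: index 0 is full resolution, the last index is the coarsest.
    ref_pyr, nbr_pyr = [ref], [nbr]
    for _ in range(levels - 1):
        ref_pyr.append(downsample2(ref_pyr[-1]))
        nbr_pyr.append(downsample2(nbr_pyr[-1]))
    # Wide search at the coarsest level, where a large motion spans few pixels.
    shift = best_shift(ref_pyr[-1], nbr_pyr[-1], (0, 0), coarse_radius)
    # Move to finer levels: double the estimate, then refine it by at most 1 px.
    for lvl in range(levels - 2, -1, -1):
        shift = best_shift(ref_pyr[lvl], nbr_pyr[lvl], (2 * shift[0], 2 * shift[1]), 1)
    return shift

rng = np.random.default_rng(0)
ref = rng.random((64, 64))                    # hypothetical reference "feature map"
nbr = np.roll(ref, (8, -4), axis=(0, 1))      # neighbor = reference moved by (8, -4)
shift = coarse_to_fine_shift(ref, nbr)        # shift that maps nbr back onto ref
aligned = np.roll(nbr, shift, axis=(0, 1))
```

The 8-pixel motion is found with a radius-3 search at the coarsest level plus two one-pixel refinements, whereas a single-level search would need a radius of at least 8. This is the reason a coarse-to-fine scheme helps with large motions.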
3. Fusion with TSA: Inter-frame temporal relation and intra-frame spatial relation are critical in fusion. Neighboring frames that are more similar to the reference frame are assigned more attention.
[Image 3: TSA fusion module (figure not available)]
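
The temporal-attention idea in item 3 can be sketched as follows: compute a per-pixel similarity between each aligned frame's features and the reference frame's features, squash it with a sigmoid, and use it to weight the frames. This is a simplified sketch under stated assumptions: the similarity is a raw dot product over channels with no learned embeddings, a normalized weighted average stands in for the fusion convolution, spatial attention is omitted, and `temporal_attention_fuse` is a hypothetical name.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def temporal_attention_fuse(aligned, ref_index):
    """aligned: (T, C, H, W) aligned frame features.

    Returns the fused features (C, H, W) and the attention maps (T, H, W).
    """
    ref = aligned[ref_index]
    # Per-pixel dot product over channels between each frame and the reference.
    similarity = np.einsum("tchw,chw->thw", aligned, ref)
    attention = sigmoid(similarity)               # values in (0, 1)
    weighted = aligned * attention[:, None, :, :]
    # Assumption: a normalized weighted average replaces the fusion convolution.
    fused = weighted.sum(axis=0) / attention.sum(axis=0)
    return fused, attention

rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 2, 2))              # hypothetical reference features
# A similar neighbor, the reference itself, and a dissimilar neighbor.
frames = np.stack([0.9 * ref, ref, -ref])
fused, attention = temporal_attention_fuse(frames, ref_index=1)
```

In this toy input the similar neighbor receives attention above 0.5 at every pixel and the dissimilar one below 0.5, so the fused result is dominated by frames that agree with the reference.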
4. Two-Stage Restoration:

Experiments and Results

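1. Datasets: REDS [26] is a newly proposed high-quality (720p) video dataset in the NTIRE19 Competition.
2. Results: very good. According to the paper's abstract, EDVR won all four tracks of the NTIRE19 video restoration and enhancement challenges.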
