ConvTransformer: A Convolutional Transformer Network for Video Frame

Excerpted useful words and sentences from the paper

  • ConvTransformer: A Convolutional Transformer Network for Video Frame Synthesis
    • Useful Words
    • Useful Sentences
      • Abstract
      • Introduction
      • Related Work
      • usage
      • Experiments and Analysis
      • Conclusion

ConvTransformer: A Convolutional Transformer Network for Video Frame Synthesis

Useful Words

1.To the best of our knowledge
2.In order to bridge this gap
3.It is worth mentioning that
4.It should be emphasized that
5.Comparisons with State-of-the-arts

Useful Sentences

Abstract

1.Deep Convolutional Neural Networks (CNNs) are powerful models that have achieved excellent performance on difficult computer vision tasks.

2.Although CNNs perform well whenever large labeled training samples are available, they work poorly on video frame synthesis due to objects deforming and moving, scene lighting changes, and camera motion in video sequences.

3.In this paper, we present a novel and general end-to-end architecture, called convolutional Transformer or ConvTransformer, for video frame sequence learning and video frame synthesis.

  1. The core ingredient of ConvTransformer is the proposed attention layer,
    i.e., multi-head convolutional self-attention, that learns the sequential dependence of the video sequence.
  2. Our method ConvTransformer uses an encoder, built upon multi-head convolutional self-attention layers, to map the input sequence to a
    feature map sequence, and then another deep network, incorporating multi-head convolutional self-attention layers, decodes the target synthesized frames from the feature map sequence.

4.Experiments on the video future frame extrapolation task show ConvTransformer to be superior in quality while being more parallelizable than recent approaches built upon convolutional LSTM (ConvLSTM).
5.To the best of our knowledge, this is the first time that a ConvTransformer architecture is proposed and applied to video frame synthesis.
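The sentences above describe the core mechanism: attention computed across the frames of a sequence, with convolutional (spatially local) projections instead of the fully-connected ones in a standard Transformer. Below is a minimal NumPy sketch of that idea, not the paper's actual implementation: the function name and weights are illustrative, and the Q/K/V projections are simplified to 1x1 convolutions (per-pixel linear maps), whereas the paper uses spatial convolutions.

```python
import numpy as np

def conv_self_attention(frames, w_q, w_k, w_v, num_heads=2):
    """Toy multi-head convolutional self-attention over a frame sequence.

    frames: (T, C, H, W) sequence of feature maps.
    w_q, w_k, w_v: (C, C) 1x1-conv projections (a simplification of the
    paper's spatial convolutions). Attention runs across the T time steps
    independently at every spatial location.
    """
    T, C, H, W = frames.shape
    d = C // num_heads  # channels per head
    # 1x1 convolution == channel-wise linear map at each pixel
    q = np.einsum('oc,tchw->tohw', w_q, frames)
    k = np.einsum('oc,tchw->tohw', w_k, frames)
    v = np.einsum('oc,tchw->tohw', w_v, frames)
    # split channels into heads: (T, heads, d, H, W)
    q = q.reshape(T, num_heads, d, H, W)
    k = k.reshape(T, num_heads, d, H, W)
    v = v.reshape(T, num_heads, d, H, W)
    # similarity between frames t and s per head and pixel: (T, T, heads, H, W)
    scores = np.einsum('tndhw,sndhw->tsnhw', q, k) / np.sqrt(d)
    # softmax over the "key" time axis s
    scores -= scores.max(axis=1, keepdims=True)
    att = np.exp(scores)
    att /= att.sum(axis=1, keepdims=True)
    # weighted sum of value maps across time, then merge heads back
    out = np.einsum('tsnhw,sndhw->tndhw', att, v)
    return out.reshape(T, C, H, W)

# demo with toy sizes: 4 frames, 8 channels, 5x5 spatial resolution
rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 8, 5, 5))
w_q, w_k, w_v = [rng.standard_normal((8, 8)) * 0.1 for _ in range(3)]
out = conv_self_attention(frames, w_q, w_k, w_v)
# output keeps the input shape: one attended feature map per frame
```

In an encoder-decoder arrangement like the one the abstract describes, blocks of this kind would be stacked, with the decoder additionally attending to the encoder's feature map sequence.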

Introduction

1.Video frame synthesis, aiming to synthesize spatially and temporally coherent intermediate frames between two consecutive real frames or to synthesize the future frames of a frame sequence, is a classical and fundamental problem in the video processing and computer vision community.
2.The abrupt motion artifacts and temporal aliasing in a video sequence can be suppressed with the help of video frame synthesis, and hence it can be applied to numerous applications ranging from motion deblurring [5] to video frame rate up-sampling [7, 3], video editing [23, 37], novel view synthesis [9] and autonomous vehicles [34].
3.Although these methods perform well when optical flow is accurately estimated, they generate motion blur and artifacts when the optical flow estimation is inaccurate.
4.In order to bridge this gap, we propose in this work a general end-to-end video frame synthesis network, i.e., the convolutional Transformer (ConvTransformer), which formulates video frame synthesis as an encoder-decoder problem.
5.The main contributions of this paper are therefore as follows.

Related Work

1.Although these methods achieve high-quality results, they suffer from heavy computation and are sensitive to motion.
2.In order to overcome this issue, a convolutional Transformer (ConvTransformer) is proposed in this work and has been successfully applied to video frame synthesis. The experimental results show that the simplified ConvTransformer architecture achieves competitive results compared to well-designed state-of-the-art networks, such as MCNet, DAIN and BMBC. To the best of our knowledge, this is the first time that a ConvTransformer architecture is proposed.

usage

Experiments and Analysis

1.For a fair comparison, we implemented and retrained these methods with the same training set used for training ConvTransformer.
2.In order to evaluate and justify the efficiency and superiority of each part of the proposed ConvTransformer architecture, several ablation experiments have been conducted in this work.

Conclusion

1.In this work, we propose a novel video frame synthesis architecture ConvTransformer, which not only works well on video frame extrapolation, but also interpolates photo-realistic middle frames.
2.Extensive quantitative and qualitative evaluations indicate that the proposed ConvTransformer performs favorably against existing frame extrapolation and interpolation methods.
3.The successful implementation of ConvTransformer sheds light on applying it to other video tasks that need to exploit long-term sequential dependence in video.
