(108): GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features

  • Abstract
  • 1. Introduction
  • 2. Related Work
    • 2.1 Visual Representations for Image Captioning
    • 2.2 Application of Transformer in Vision/Language Tasks
  • 3. Method: Grid- and Region-based Image captioning Transformer
    • 3.1 Extracting Visual Features from Images
    • 3.2 Caption Generation Using Dual Visual Features