Multimodal Paper Study: ALBEF (ALign BEfore Fuse)

ALBEF: A Quick Read

  • Title
  • Links
  • Motivation
  • How to solve it? (Contribution)
  • Model
  • Experiments
    • Pre-training Datasets
    • Downstream tasks
    • Ablation Experiment

Title

"Align before Fuse: Vision and Language Representation Learning with Momentum Distillation"

Links

Paper link

Motivation

Most multimodal models use a transformer encoder to jointly encode visual tokens (region-based image features) and text tokens. Once an object detector is used
