
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Abstract

This paper introduces BART (Bidirectional and Auto-Regressive Transformers), a denoising autoencoder for pre-training sequence-to-sequence models. BART is trained by:

  • corrupting text with an arbitrary noising function
  • learning a model to reconstruct the original text.

BART uses a standard Transformer-based neural machine translation architecture, and can be seen as generalizing BERT (a bidirectional encoder), GPT (a left-to-right decoder), and many other recent pre-training schemes.
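As a rough illustration of this denoising seq2seq setup, the sketch below feeds a corrupted sentence (one span collapsed into a single mask token) through a pre-trained BART checkpoint and lets the decoder reconstruct the text autoregressively. The use of the Hugging Face `transformers` API and the `facebook/bart-base` checkpoint are assumptions made here for illustration, not something prescribed by the paper.

```python
# Minimal sketch: denoising reconstruction with a pre-trained BART checkpoint.
# Assumes the Hugging Face `transformers` library and the `facebook/bart-base`
# checkpoint; neither is mandated by the paper itself.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# A corrupted input: a span of text has been replaced with a single <mask> token.
corrupted = "BART is a denoising <mask> for pre-training sequence-to-sequence models."

inputs = tokenizer(corrupted, return_tensors="pt")
# The decoder generates the reconstructed (uncorrupted) text autoregressively.
output_ids = model.generate(inputs["input_ids"], max_length=40, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```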

We evaluate a number of noising approaches and find that the two best-performing ones are the following (a short sketch of both appears after the list):

  • randomly shuffling the order of sentences
  • an in-filling scheme, where spans of text are replaced with a single mask token
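A plain-Python sketch of these two noising transformations is given below. The helper names (`permute_sentences`, `text_infill`) and the masking budget are illustrative assumptions rather than the authors' reference implementation; the paper samples span lengths for text infilling from a Poisson distribution with λ = 3, which the sketch follows.

```python
# Minimal sketch of the two best-performing noising functions (illustrative
# helper names; not the authors' reference implementation).
import random

import numpy as np


def permute_sentences(sentences):
    """Sentence permutation: shuffle the sentences of a document into random order."""
    shuffled = sentences[:]
    random.shuffle(shuffled)
    return shuffled


def text_infill(tokens, mask_ratio=0.3, poisson_lambda=3.0, mask_token="<mask>"):
    """Text infilling: replace spans of tokens with a single mask token each.

    Span lengths are sampled from a Poisson distribution (the paper uses lambda = 3);
    a zero-length span inserts a mask token without removing any token.
    """
    out, i, budget = [], 0, int(len(tokens) * mask_ratio)
    while i < len(tokens):
        if budget > 0 and random.random() < mask_ratio:
            span = np.random.poisson(poisson_lambda)
            out.append(mask_token)  # one mask token per span, regardless of span length
            i += span
            budget -= max(span, 1)
        else:
            out.append(tokens[i])
            i += 1
    return out


print(permute_sentences(["First sentence.", "Second sentence.", "Third sentence."]))
print(text_infill("BART reconstructs the original text from a corrupted input".split()))
```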
