
1. Improving the time and space efficiency of self-attention

Linformer: Self-Attention with Linear Complexity


  1. Mixed precision
    Mixed precision training. 2017
    fairseq: A fast, extensible toolkit for sequence modeling. 2019
    Quantization and training of neural networks for efficient integer-arithmetic-only inference. 2018
    Training with quantization noise for extreme fixed-point compression. 2020

  2. Knowledge distillation
    Distilling the knowledge in a neural network. 2015
    Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. 2019

  3. Sparse Attention
    Generating long sequences with sparse transformers. 2019
    Blockwise self-attention for long document understanding. 2019

  4. LSH Attention
    Reformer: The efficient transformer. 2020

  5. Improving Optimizer Efficiency
    Gpipe: Efficient training of giant neural networks using pipeline parallelism. 2019
    Training deep nets with sublinear memory cost. 2016

2. Data augmentation

Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution

3. MT book: inference acceleration

  1. Vocabulary selection at the output layer
    On Using Very Large Target Vocabulary for Neural Machine Translation. 2015

  2. Eliminating redundant computation
    Sharing Attention Weights for Fast Transformer. 2019
    Recurrent Stacking of Layers for Compact Neural Machine Translation Models. 2019 (code: tf)

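  3. Lightweight decoders and small models
    (1) Make the decoder network shallower and narrower
    (2) Simplify the Transformer decoder network
    ① Replace the original Transformer self-attention with an average attention mechanism
    Accelerating Neural Transformer via an Average Attention Network. 2018 (code: tf)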

    Pay Less Attention with Lightweight and Dynamic Convolutions. 2019 (paper walkthrough; code: fairseq!!!)

    Sharing Attention Weights for Fast Transformer. 2019

    The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. 2018 (code 1: fairseq!!!; code 2: tensor2tensor)
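
A note on the average attention idea listed under item 3: instead of computing attention weights over earlier decoder positions, each position takes the plain mean of all inputs up to and including itself. The snippet below is only a minimal sketch of that cumulative-average step in pure Python. The function name `cumulative_average` and the list-of-lists representation are assumptions made for illustration, and the extra layers that the published model places around this step are left out.

```python
from typing import List

Vector = List[float]


def cumulative_average(inputs: List[Vector]) -> List[Vector]:
    """Return, for each position j, the mean of inputs[0..j] (inclusive).

    Illustrative stand-in for the averaging step only; it has no learned
    parameters and is not a full average attention layer.
    """
    if not inputs:
        return []
    running_sum = [0.0] * len(inputs[0])
    outputs: List[Vector] = []
    for count, vec in enumerate(inputs, start=1):
        # Keep a running sum so each new position costs O(d), not O(j * d).
        running_sum = [s + x for s, x in zip(running_sum, vec)]
        outputs.append([s / count for s in running_sum])
    return outputs


if __name__ == "__main__":
    decoder_inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for position, averaged in enumerate(cumulative_average(decoder_inputs)):
        print(position, averaged)
```

Because only the running sum has to be carried between steps, step-by-step decoding needs constant work per new token for this part, which is why it appears here under inference acceleration.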

  4. Batched inference

  5. Low-precision arithmetic

  6. Non-autoregressive translation

  7. Other


Linear Transformers Are Secretly Fast Weight Memory Systems. 2021 (code: pytorch, fairseq!!!)
DEFINE: deep factorized input token embeddings for neural sequence modeling. ICLR 2020
DELIGHT: Deep and Light-Weight Transformer. ICLR 2021 (code: fairseq)
Performers: Rethinking attention with performers. ICLR 2021 (code: tf; it also contains a partial PyTorch implementation whose data is randomly initialized)
Efficient transformer for mobile applications. ICLR 2020
Learning Light-Weight Translation Models from Deep Transformer. 2020
Reformer: the efficient transformer. ICLR 2020 (code: trax)
Universal transformers. ICLR 2019 (code: trax, tensor2tensor)
Depth-adaptive transformer. ICLR 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. 2020

Are Pre-trained Convolutions Better than Pre-trained Transformers?. 2021
Measuring and Increasing Context Usage in Context-Aware Machine Translation. 2021


