WMT English-German Translation

1. The University of Cambridge’s Machine Translation Systems for WMT18

1. Basic Architecture

Combines the three most commonly used architectures: recurrent, convolutional, and self-attention-based models (i.e., the Transformer).

2. System Combination

If we want to combine q models, we first divide them into two groups by selecting a p with 1 ≤ p ≤ q.

The first group then contributes full-posterior scores, and the second group contributes MBR-based scores.

The full-posterior model scores are computed as follows:


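A minimal LaTeX sketch of the full-posterior score, assuming a standard log-linear interpolation with weights $\lambda_k$ (the notation is ours and may differ from the paper's):

$$\Lambda_{\text{full}}(y \mid x) = \sum_{k=1}^{p} \lambda_k \log P_k(y \mid x)$$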

The combined scores are computed as follows:


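A sketch of the combined score under the same assumptions, where $\Lambda^{(k)}_{\text{MBR}}(y \mid x)$ stands in for the MBR-based score of model $k$:

$$\Lambda(y \mid x) = \sum_{k=1}^{p} \lambda_k \log P_k(y \mid x) + \sum_{k=p+1}^{q} \lambda_k \, \Lambda^{(k)}_{\text{MBR}}(y \mid x)$$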

3. Data

1. Language detection (Nakatani, 2010) is applied to all available monolingual and parallel data.
2. ParaCrawl is additionally filtered with the following rules (a code sketch follows the list):
  • No words contain more than 40 characters.
  • Sentences must not contain HTML tags.
  • The minimum sentence length is 4 words.
  • The character ratio between source and target must not exceed 1:3 or 3:1.
  • Source and target sentences must be equal after stripping out non-numerical characters (i.e., their digit sequences must match).
  • Sentences must end with punctuation marks.
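A minimal Python sketch of these filtering rules, assuming whitespace tokenization and a simple punctuation set (function and threshold names are ours, not the paper's):

```python
import re

MAX_WORD_LEN = 40   # no word longer than 40 characters
MIN_SENT_LEN = 4    # minimum sentence length in words

def keep_pair(src: str, tgt: str) -> bool:
    """Return True if a ParaCrawl sentence pair passes all filter rules."""
    for sent in (src, tgt):
        words = sent.split()
        if len(words) < MIN_SENT_LEN:
            return False
        if any(len(w) > MAX_WORD_LEN for w in words):
            return False
        if re.search(r"<[^>]+>", sent):        # no HTML tags
            return False
        if sent[-1] not in ".!?":              # must end with punctuation
            return False
    ratio = len(src) / max(len(tgt), 1)        # character ratio within 1:3 .. 3:1
    if ratio > 3.0 or ratio < 1.0 / 3.0:
        return False
    # digit sequences must match after stripping non-numerical characters
    return re.sub(r"\D", "", src) == re.sub(r"\D", "", tgt)
```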

2. NTT’s Neural Machine Translation Systems for WMT 2018

1. Basic Architecture

Transformer Big
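For reference, a sketch of the standard "big" Transformer hyperparameters from Vaswani et al. (2017); NTT's exact training settings may differ:

```python
# Transformer Big configuration (Vaswani et al., 2017)
transformer_big = {
    "encoder_layers": 6,
    "decoder_layers": 6,
    "d_model": 1024,         # model / embedding dimension
    "d_ff": 4096,            # feed-forward inner dimension
    "attention_heads": 16,
    "dropout": 0.3,
}
```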

2. Data

  • Noisy Data Filtering
  1. Use a language model (such as KenLM) to evaluate a sentence's naturalness.
  2. Use a word alignment model (such as fast_align) to check whether the source and target sentences have the same meaning.
  • Synthetic Corpus
  1. Translate monolingual sentences with the Transformer -> pseudo-parallel corpora.
  2. Back-translate & evaluate -> select the high-scoring sentence pairs.
  • Right-to-Left Re-ranking
  1. The R2L model re-ranks the n-best hypotheses generated by the Left-to-Right (L2R) model (n = 10); a sketch of the combination follows below.
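A minimal sketch of the re-ranking step, assuming a simple linear interpolation of L2R and R2L model scores (the weight alpha is an assumption, not the paper's exact scheme):

```python
def rerank(hypotheses, l2r_scores, r2l_scores, alpha=0.5):
    """Pick the best of n hypotheses by interpolating each hypothesis's
    L2R model score with the R2L model's score for the same string."""
    combined = [(1 - alpha) * l + alpha * r
                for l, r in zip(l2r_scores, r2l_scores)]
    best = max(range(len(hypotheses)), key=lambda i: combined[i])
    return hypotheses[best]
```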

3. Microsoft's Submission to the WMT2018 News Translation Task: How I Learned to Stop Worrying and Love the Data

1. Basic Architecture

Transformer Big + Ensemble-decoding + R2L Reranking
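A minimal sketch of ensemble decoding, assuming uniform model weights: at each decoding step the member models' next-token distributions are averaged before the beam search expands hypotheses.

```python
import numpy as np

def ensemble_step(prob_dists):
    """Average the next-token probability distributions produced by the
    ensemble members at one decoding step (uniform weights assumed)."""
    return np.mean(np.stack(prob_dists), axis=0)
```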

2. Data

  • Dual conditional cross-entropy filtering

    For a sentence pair (x, y), the cross-entropy score is computed as follows:

    $$H_M(y \mid x) = -\frac{1}{|y|} \sum_{t=1}^{|y|} \log P_M\left(y_t \mid y_{<t}, x\right)$$

    $$\mathrm{score}(x, y) = \left| H_A(y \mid x) - H_B(x \mid y) \right| + \frac{1}{2}\left( H_A(y \mid x) + H_B(x \mid y) \right)$$

    where A and B are translation models trained on the same data but in inverse directions, and $P_M$ is the probability distribution for a model M (a code sketch follows this list).

  • Data weighting

    Sentence-instance weighting is a feature available in Marian (Junczys-Dowmunt et al., 2018).

    sentence score = data weight × cross-entropy score -> sort and select by sentence score (see the sketch after this list)
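A sketch tying the two steps together, assuming the per-sentence data weight is multiplied into an exp(-score) adequacy value before sorting (helper names and the exact combination are ours):

```python
import math

def dual_xent_score(h_fwd: float, h_bwd: float) -> float:
    """Dual conditional cross-entropy score for one sentence pair,
    where h_fwd = H_A(y|x) and h_bwd = H_B(x|y); lower is better."""
    return abs(h_fwd - h_bwd) + 0.5 * (h_fwd + h_bwd)

def select_top_k(pairs, weights, k):
    """Weight each pair's adequacy, then sort and keep the top-k.
    `pairs` holds (src, tgt, h_fwd, h_bwd) tuples; `weights` holds the
    per-sentence data weights."""
    scored = [(w * math.exp(-dual_xent_score(hf, hb)), src, tgt)
              for (src, tgt, hf, hb), w in zip(pairs, weights)]
    scored.sort(key=lambda t: t[0], reverse=True)   # highest adequacy first
    return [(src, tgt) for _, src, tgt in scored[:k]]
```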
