Understanding the UnsupervisedMT Architecture

Hi, Mr Kim. Recently, I re-read the two papers

and

I have some ideas, which may not be right, and some questions.

The improvements from the first paper to the second come from:
i) adding language-model training before and during the MT training process. Since the LM is built from the shared encoder and decoder layers, the better-trained encoder and decoder parameters help make the translation results smoother (a rough sketch of the whole schedule follows after this list).


ii) adding on-the-fly back-translation.



With on-the-fly back-translation, the MT models are improved iteratively.
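To make sure I understand the overall flow, here is a rough, hypothetical sketch of the training schedule as I read it from the second paper. All model and function names below are placeholders I made up, not the repo's API; I only assume the objects expose `train_step(...)` and `generate(...)`.

```python
import torch

# A rough, hypothetical sketch of the training schedule as I understand it.
# `lm`, `model_s2t` and `model_t2s` are placeholder objects (not the repo's
# classes); I only assume they expose `train_step(...)` and `generate(...)`.
def train_schedule(lm, model_s2t, model_t2s, mono_src, mono_tgt,
                   n_lm_steps=1000, n_mt_steps=10000):
    # 1) LM (pre)training: warms up the shared encoder/decoder parameters
    #    on monolingual data before any translation training starts.
    for _ in range(n_lm_steps):
        lm.train_step(next(mono_src))
        lm.train_step(next(mono_tgt))

    # 2) MT training with on-the-fly back-translation: the current models
    #    produce synthetic parallel data, and each direction is trained on
    #    the data generated by the other, so both improve iteratively.
    for _ in range(n_mt_steps):
        src_batch, tgt_batch = next(mono_src), next(mono_tgt)

        with torch.no_grad():                       # generation itself is not backpropagated through
            synth_tgt = model_s2t.generate(src_batch)
            synth_src = model_t2s.generate(tgt_batch)

        model_t2s.train_step(src=synth_tgt, ref=src_batch)  # train tgt->src on (synthetic, real)
        model_s2t.train_step(src=synth_src, ref=tgt_batch)  # train src->tgt on (synthetic, real)
```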


Question:
i) in lm.py, the language model uses the shared layers of the encoder or decoder (which are LSTM layers), so it seems there is no Transformer-LM implementation. Why? Is there any reference showing that an RNN-LM works better than a Transformer-LM?
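For reference, I don't see a technical obstacle to a Transformer-LM: a (possibly shared) Transformer encoder layer becomes a language model once a causal mask prevents attending to future positions. This is just my own minimal sketch, not code from the repo:

```python
import torch
import torch.nn as nn

# Not from the repo: a minimal Transformer-LM sketch, where a shared
# Transformer encoder layer is turned into an LM via a causal mask.
class TinyTransformerLM(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, n_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.layers = nn.TransformerEncoder(layer, num_layers=n_layers)  # could be shared layers
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        seq_len = tokens.size(1)
        # causal mask: -inf above the diagonal blocks attention to future tokens
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        hidden = self.layers(self.embed(tokens), mask=mask)
        return self.proj(hidden)                     # next-token logits
```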


ii) in transformer.py



one_hot=True has not been implemented for the Transformer. Why? I think the Transformer should also support one-hot targets for the loss and training.
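To be concrete about what I mean by a one-hot loss path, here is a generic illustration (not the repo's implementation): cross-entropy written against an explicit one-hot target distribution, which can also be softened by label smoothing.

```python
import torch.nn.functional as F

# Generic illustration, not the repo's code: cross-entropy computed from an
# explicit one-hot target distribution, optionally softened by label smoothing.
def one_hot_cross_entropy(logits, targets, smoothing=0.0):
    vocab_size = logits.size(-1)
    one_hot = F.one_hot(targets, vocab_size).float()          # (batch, vocab)
    if smoothing > 0:
        one_hot = one_hot * (1.0 - smoothing) + smoothing / vocab_size
    log_probs = F.log_softmax(logits, dim=-1)
    return -(one_hot * log_probs).sum(dim=-1).mean()

# With smoothing=0 this matches F.cross_entropy(logits, targets)
# up to numerical precision.
```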


iii) in trainer.py


there are three ways to train the encoder/decoder LM, and I do not see why we need to train lm_enc_rev. Also, add_noise is not applied for LM training here, which is different from the auto-encoding step.
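To state the difference I mean, here is a hypothetical contrast of the two objectives (placeholder models, not the repo's API): the LM step predicts the next token from clean text, while the denoising auto-encoder step reconstructs clean text from a noised input.

```python
import torch.nn.functional as F

# Hypothetical contrast of the two objectives; `lm` and `seq2seq` are placeholders.
def lm_loss(lm, clean_tokens):
    """Plain LM step: predict token t+1 from the clean tokens up to t."""
    logits = lm(clean_tokens[:, :-1])
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           clean_tokens[:, 1:].reshape(-1))

def dae_loss(seq2seq, clean_tokens, add_noise):
    """Denoising auto-encoder step: reconstruct the clean sentence from a noised input."""
    noisy_tokens = add_noise(clean_tokens)                  # noise only on the input side
    logits = seq2seq(noisy_tokens, clean_tokens[:, :-1])    # teacher forcing on the clean target
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           clean_tokens[:, 1:].reshape(-1))
```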


iv) According to main.py, add_noise is only called in auto-encoder training, not in LM training. Even though the two may work out the same, I think the authors should point this out in the paper.
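For reference, this is the kind of noise function the papers describe for the denoising auto-encoder (word dropout plus a small local shuffle); the code and parameter values below are my own illustrative sketch, not taken from the repo.

```python
import random

# Sketch of the noise model described in the papers: word dropout plus a
# bounded local shuffle. Parameter values are illustrative, not the repo's.
def add_noise(words, drop_prob=0.1, shuffle_strength=3.0):
    # 1) word dropout: randomly remove some tokens
    kept = [w for w in words if random.random() > drop_prob]
    if not kept:                                   # keep at least one token
        kept = [random.choice(words)]
    # 2) local shuffle: perturb positions by a small random offset and re-sort,
    #    so tokens only move a bounded distance from where they started
    keys = [i + random.uniform(0, shuffle_strength) for i in range(len(kept))]
    return [w for _, w in sorted(zip(keys, kept), key=lambda kw: kw[0])]

print(add_noise("the cat sat on the mat".split()))
```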

What are your thoughts on my ideas and questions? If you have any comments, please let me know~
