Attention-based encoder-decoder model for neural machine translation
This package is based on the dl4mt-tutorial by Kyunghyun Cho et al. ( https://github.com/nyu-dl/dl4mt-tutorial ). It was used to produce top-scoring systems at the WMT 16 shared translation task.
The changes to Nematus include:
- the model has been re-implemented in TensorFlow. See https://github.com/EdinburghNLP/nematus/tree/theano for the Theano-based version of Nematus.
- new architecture variants for better performance
- improvements to scoring and decoding
- usability improvements

See the changelog for more info.
For general support requests, there is a Google Groups mailing list at https://groups.google.com/d/forum/nematus-support . You can also send an e-mail to [email protected] .
Nematus requires the following packages:
To install TensorFlow, we recommend following the steps at https://www.tensorflow.org/install/
The following packages are optional, but highly recommended; if you run on a GPU, installing CUDA is recommended.
For an introduction to Docker, see https://baike.baidu.com/item/Docker/13344470?fr=aladdin and http://www.runoob.com/docker/docker-tutorial.html
You can also create a Docker image by running the following command, where you change the suffix to either cpu or gpu:
docker build -t nematus-docker -f Dockerfile.suffix .
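For example, to build the CPU image, the suffix is simply substituted:

docker build -t nematus-docker -f Dockerfile.cpu .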
To run a CPU docker instance with the current working directory shared with the Docker container, execute:
docker run -v `pwd`:/playground -it nematus-docker
For GPU you need to have nvidia-docker installed and run:
nvidia-docker run -v `pwd`:/playground -it nematus-docker
Training speed depends heavily on having appropriate hardware (ideally a recent NVIDIA GPU), and having installed the appropriate software packages.
To test your setup, we provide some speed benchmarks with `test/test_train.sh`, on an Intel Xeon CPU E5-2620 v4, with an Nvidia GeForce GTX Titan X (Pascal) and CUDA 9.0:
GPU, CuDNN 5.1, tensorflow 1.0.1:
CUDA_VISIBLE_DEVICES=0 ./test_train.sh
225.25 sentences/s
All of the scripts below can be run with the --help flag to get usage information.
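For example, to list all options of the training script:

python nematus/nmt.py --help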
Sample commands with toy examples are available in the test directory; for training a full-scale system, consider the training scripts at http://data.statmt.org/wmt17_systems/training/
The individual scripts and their parameters are described below:
nematus/nmt.py: used to train a new model

data sets; model loading and saving
parameter | description |
---|---|
--source_dataset PATH | parallel training corpus (source side) |
--target_dataset PATH | parallel training corpus (target side) |
--dictionaries PATH [PATH ...] | network vocabularies (one per source factor, plus target vocabulary) |
--model PATH | model file name (default: model.npz) |
--saveFreq INT | save frequency (default: 30000) |
--reload | load existing model (if '--model' points to existing model) |
--no_reload_training_progress | don't reload training progress (only used if --reload is enabled) |
--summary_dir | directory for saving summaries (default: same directory as the --model file) |
--summaryFreq INT | Save summaries after INT updates, if 0 do not save summaries (default: 0) |
network parameters
parameter | description |
---|---|
--embedding_size INT | embedding layer size (default: 512) |
--state_size INT | hidden layer size (default: 1000) |
--source_vocab_sizes INT | source vocabulary sizes (one per input factor) (default: None) |
--target_vocab_size INT | target vocabulary size (default: None) |
--factors INT | number of input factors (default: 1) |
--dim_per_factor INT [INT ...] | list of word vector dimensionalities (one per factor): '--dim_per_factor 250 200 50' for total dimensionality of 500 (default: None) |
--use_dropout | use dropout layer (default: False) |
--dropout_embedding FLOAT | dropout for input embeddings (0: no dropout) (default: 0.2) |
--dropout_hidden FLOAT | dropout for hidden layer (0: no dropout) (default: 0.2) |
--dropout_source FLOAT | dropout source words (0: no dropout) (default: 0) |
--dropout_target FLOAT | dropout target words (0: no dropout) (default: 0) |
--layer_normalisation | use layer normalisation (default: False) |
--tie_decoder_embeddings | tie the input embeddings of the decoder with the softmax output embeddings |
--enc_depth INT | number of encoder layers (default: 1) |
--enc_recurrence_transition_depth | number of GRU transition operations applied in an encoder layer (default: 1) |
--dec_depth INT | number of decoder layers (default: 1) |
--dec_base_recurrence_transition_depth | number of GRU transition operations applied in first decoder layer (default: 2) |
--dec_high_recurrence_transition_depth | number of GRU transition operations applied in decoder layers after the first (default: 1) |
--dec_deep_context | pass context vector (from first layer) to deep decoder layers |
--output_hidden_activation | activation function in hidden layer of the output network (default: tanh) |
training parameters
parameter | description |
---|---|
--maxlen INT | maximum sequence length (default: 100) |
--batch_size INT | minibatch size (default: 80) |
--token_batch_size INT | minibatch size (expressed in number of source or target tokens). Sentence-level minibatch size will be dynamic. If this is enabled, batch_size only affects sorting by length. |
--max_epochs INT | maximum number of epochs (default: 5000) |
--finish_after INT | maximum number of updates (minibatches) (default: 10000000) |
--decay_c FLOAT | L2 regularization penalty (default: 0) |
--map_decay_c FLOAT | MAP-L2 regularization penalty towards original weights (default: 0) |
--prior_model STR | Prior model for MAP-L2 regularization. Unless using "--reload", this will also be used for initialization. |
--clip_c FLOAT | gradient clipping threshold (default: 1) |
--learning_rate FLOAT | learning rate (default: 0.0001) |
--label_smoothing FLOAT | label smoothing (default: 0) |
--no_shuffle | disable shuffling of training data (for each epoch) |
--no_sort_by_length | do not sort sentences in maxibatch by length |
--maxibatch_size INT | size of maxibatch (number of minibatches that are sorted by length) (default: 20) |
--optimizer | optimizer (default: adam) |
--keep_train_set_in_memory | Keep training dataset lines stored in RAM during training |
validation parameters
parameter | description |
---|---|
--valid_source_dataset PATH | parallel validation corpus (source side) |
--valid_target_dataset PATH | parallel validation corpus (target side) |
--valid_batch_size INT | validation minibatch size (default: 80) |
--valid_token_batch_size INT | validation minibatch size (expressed in number of source or target tokens). Sentence-level minibatch size will be dynamic. If this is enabled, valid_batch_size only affects sorting by length. |
--validFreq INT | validation frequency (default: 10000) |
--patience INT | early stopping patience (default: 10) |
--run_validation | Compute validation score on validation dataset |
display parameters
parameter | description |
---|---|
--dispFreq INT | display loss after INT updates (default: 1000) |
--sampleFreq INT | display some samples after INT updates (default: 10000) |
--beamFreq INT | display some beam_search samples after INT updates (default: 10000) |
--beam_size INT | size of the beam (default: 12) |
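To illustrate how these options fit together, here is a minimal sketch of a training call; all file paths are placeholders for your own preprocessed corpus, vocabulary and model files:

python nematus/nmt.py \
  --source_dataset data/corpus.bpe.de \
  --target_dataset data/corpus.bpe.en \
  --dictionaries data/vocab.de.json data/vocab.en.json \
  --model model/model.npz \
  --embedding_size 512 \
  --state_size 1000 \
  --maxlen 100 \
  --batch_size 80 \
  --use_dropout \
  --layer_normalisation \
  --valid_source_dataset data/valid.bpe.de \
  --valid_target_dataset data/valid.bpe.en \
  --validFreq 10000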
nematus/translate.py: use an existing model to translate a source text
parameter | description |
---|---|
-k K | Beam size (default: 5) |
-p P | Number of processes (default: 5) |
-n | Normalize scores by sentence length |
-v | verbose mode. |
--models MODELS [MODELS ...], -m MODELS [MODELS ...] | model to use. Provide multiple models (with same vocabulary) for ensemble decoding |
--input PATH, -i PATH | Input file (default: standard input) |
--output PATH, -o PATH | Output file (default: standard output) |
--n-best | Write n-best list (of size k) |
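For instance, translating a preprocessed source file with a single model, with scores normalised by sentence length (file names are placeholders):

python nematus/translate.py -m model/model.npz -i newstest.bpe.de -o newstest.output.en -k 12 -n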
nematus/score.py: use an existing model to score a parallel corpus
parameter | description |
---|---|
-b B | Minibatch size (default: 80) |
-n | Normalize scores by sentence length |
-v | verbose mode. |
--models MODELS [MODELS ...], -m MODELS [MODELS ...] | model to use. Provide multiple models (with same vocabulary) for ensemble decoding |
--source PATH, -s PATH | Source text file |
--target PATH, -t PATH | Target text file |
--output PATH, -o PATH | Output file (default: standard output) |
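Analogously, a sketch of scoring a parallel test set with an existing model (file names are placeholders):

python nematus/score.py -m model/model.npz -s newstest.bpe.de -t newstest.bpe.en -o newstest.scores -b 80 -n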
nematus/rescore.py: use an existing model to rescore an n-best list.
The n-best list is assumed to have the same format as Moses:
sentence-ID (starting from 0) ||| translation ||| scores
New scores will be appended to the end. rescore.py has the same arguments as score.py, with the exception of this additional parameter:
parameter | description |
---|---|
--input PATH, -i PATH | Input n-best list file (default: standard input) |
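A sketch of a rescoring call, assuming the source text is passed with -s as for score.py and nbest.txt is a Moses-format n-best list as described above (file names are placeholders):

python nematus/rescore.py -m model/model.npz -s newstest.bpe.de -i nbest.txt -o nbest.rescored.txt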
nematus/theano_tf_convert.py: convert an existing Theano model to a TensorFlow model.
If you have a Theano model (model.npz) with network architecture features that are currently supported, then you can convert it into a TensorFlow model using nematus/theano_tf_convert.py.
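A conversion call might look like the following sketch; the exact option names (--from_theano, --in, --out) are an assumption here, so check python nematus/theano_tf_convert.py --help for the authoritative list:

python nematus/theano_tf_convert.py --from_theano --in theano_model.npz --out tensorflow_model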
If you use Nematus, please cite the following paper:
Rico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry and Maria Nadejde (2017): Nematus: a Toolkit for Neural Machine Translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, pp. 65-68.
@InProceedings{sennrich-EtAl:2017:EACLDemo,
author = {Sennrich, Rico and Firat, Orhan and Cho, Kyunghyun and Birch, Alexandra and Haddow, Barry and Hitschler, Julian and Junczys-Dowmunt, Marcin and L\"{a}ubli, Samuel and Miceli Barone, Antonio Valerio and Mokry, Jozef and Nadejde, Maria},
title = {Nematus: a Toolkit for Neural Machine Translation},
booktitle = {Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics},
month = {April},
year = {2017},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {65--68},
url = {http://aclweb.org/anthology/E17-3017}
}
The code is based on the following model:
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio (2015): Neural Machine Translation by Jointly Learning to Align and Translate, Proceedings of the International Conference on Learning Representations (ICLR).
Please refer to the Nematus paper for a description of implementation differences.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements 645452 (QT21), 644333 (TraMOOC), 644402 (HimL) and 688139 (SUMMA).
Directory contents: data/, downloads/, model/, scripts/, scripts.tensorflow/, vars
We used various approaches for preprocessing and data augmentation for monolingual data for different languages. Check the system description for more detail.
In this directory, we provide a sample configuration for preprocessing and training for English->German.
Please note that this script will not reproduce our WMT17 results, which also rely on the use of back-translated monolingual data, and combination of multiple models.
Please also have a look at last year’s accompanying scripts and sample configurations; among others, there is documentation for right-to-left reranking: https://github.com/rsennrich/wmt16-scripts
Note: since the WMT17 models were developed, Nematus has switched from using a Theano back-end to using TensorFlow. The scripts provided in the scripts directory are for use with the Theano version; updated scripts for use with the current TensorFlow version can be found in scripts.tensorflow.
download sample files (WMT17 parallel training data, dev and test sets):
scripts/download_files.sh
preprocess the training, development and test corpora:
scripts/preprocess.sh
train a Nematus model:
scripts/train.sh
evaluate your model:
scripts/evaluate.sh
This directory contains some of the University of Edinburgh’s submissions to the WMT17 shared translation task, and a ‘training’ directory with scripts to preprocess and train your own model.
If you are accessing this through a git repository, it will contain all scripts and documentation, but no model files - the models are accessible at http://data.statmt.org/wmt17_systems
Use the git repository to keep track of changes to this directory: https://github.com/EdinburghNLP/wmt17-scripts
The models use the following software:
Please set the appropriate paths in the ‘vars’ file.
You can download all files in this directory with this command:
wget -r -e robots=off -nH -np -R index.html* http://data.statmt.org/wmt17_systems/
To download just one language pair (such as en-de), execute:
wget -r -e robots=off -nH -np -R index.html* http://data.statmt.org/wmt17_systems/en-de/
To download just a single model (approx 2GB) and the corresponding translation scripts, ignoring ensembles, execute:
wget -r -e robots=off -nH -np -R *ens2* -R *ens3* -R *ens4* -R *r2l* -R translate-ensemble.sh -R translate-reranked.sh -R index.html* http://data.statmt.org/wmt17_systems/en-de/
If you only download selected language pairs or models, you should also download these files, which are shared:
wget -r -e robots=off -nH -np -R index.html* http://data.statmt.org/wmt17_systems/scripts/ http://data.statmt.org/wmt17_systems/vars
First, ensure that all requirements are present, and that the path names in the ‘vars’ file are up-to-date. If you want to decode on a GPU, you can also update the ‘device’ variable in that file.
Each subdirectory comes with several scripts translate-*.sh.
For translation with a single model, execute:
./translate-single.sh < your_input_file > your_output_file
The input should be UTF-8 plain text in the source language, one sentence per line.
We also provide ensembles of left-to-right models:
./translate-ensemble.sh < your_input_file > your_output_file
For some language pairs, we built models that use right-to-left models for reranking:
./translate-reranked.sh < your_input_file > your_output_file
We used systems that include ensembles and right-to-left reranking for our official submissions; results may vary slightly from the official submissions due to post-submission improvements - see the shared task description for more details.
For training your own models, follow the instructions in training/README.md
All scripts in this directory are distributed under MIT license.
The use of the models provided in this directory is permitted under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
Attribution - You must give appropriate credit [please use the citation below], provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial - You may not use the material for commercial purposes.
ShareAlike - If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
The models are described in the following publication:
Rico Sennrich, Alexandra Birch, Anna Currey, Ulrich Germann, Barry Haddow, Kenneth Heafield, Antonio Valerio Miceli Barone, and Philip Williams (2017). “The University of Edinburgh’s Neural MT Systems for WMT17”. In: Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers. Copenhagen, Denmark.
@inproceedings{uedin-nmt:2017,
address = "Copenhagen, Denmark",
author = "Sennrich, Rico and Birch, Alexandra and Currey, Anna and
Germann, Ulrich and Haddow, Barry and Heafield, Kenneth and
{Miceli Barone}, Antonio Valerio and Williams, Philip",
booktitle = "{Proceedings of the Second Conference on Machine Translation,
Volume 2: Shared Task Papers}",
title = "{The University of Edinburgh's Neural MT Systems for WMT17}",
year = "2017"
}