A Roundup of NMT Datasets

1. Multimodal NMT datasets:

These come mainly from the WMT16, WMT17, and WMT18 shared tasks (Multi30k EN-DE, EN-FR, EN-CS):

http://www.statmt.org/wmt16/multimodal-task.html

http://www.statmt.org/wmt17/multimodal-task.html

http://www.statmt.org/wmt18/multimodal-task.html

2. IWSLT (International Workshop on Spoken Language Translation) datasets:

IWSLT2011 through IWSLT2020: https://wit3.fbk.eu/home

For example, IWSLT2015:

    train and dev: https://wit3.fbk.eu/2015-01

    test.en: https://wit3.fbk.eu/2015-01-b

    test.de: https://wit3.fbk.eu/2015-01-c

    Full dataset download: https://github.com/pengr/iwslt15/blob/master/en-de.tgz
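Once an archive such as en-de.tgz has been downloaded, unpacking it and pairing the source and target sentences might look like the minimal sketch below. The helper names `extract_archive` and `read_parallel` are hypothetical, and the file names in the usage note (`train.en`, `train.de`) are assumptions; the actual layout of the archive is not documented here and should be checked after extraction.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, out_dir):
    """Unpack a gzip-compressed tar archive and return the extracted file paths.

    Hypothetical helper; only use it on archives from a source you trust.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(out_dir)
    return sorted(p for p in out_dir.rglob("*") if p.is_file())


def read_parallel(src_path, tgt_path):
    """Pair source and target sentences line by line.

    Assumes one sentence per line and the same number of lines in both files.
    """
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        src_lines = [line.rstrip("\n") for line in src]
        tgt_lines = [line.rstrip("\n") for line in tgt]
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"line count mismatch: {len(src_lines)} source vs {len(tgt_lines)} target"
        )
    return list(zip(src_lines, tgt_lines))
```

A typical call, with assumed file names, would be `extract_archive("en-de.tgz", "data")` followed by `read_parallel("data/train.en", "data/train.de")`. Raising on a line-count mismatch is a deliberate choice: silently truncating a misaligned corpus tends to produce subtly wrong training pairs.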

3. WMT (Workshop on Machine Translation) datasets, e.g. the wmt15_translate entry in the TensorFlow Datasets catalog: https://www.tensorflow.org/datasets/catalog/wmt15_translate#wmt15_translateru-ensubwords8k

4. OPUS: https://opus.nlpl.eu/

5. Chinese machine translation datasets: https://www.jianshu.com/p/df85ddf56eef

6. Large-scale Chinese NLP corpora: https://github.com/brightmart/nlp_chinese_corpus

7. Machine translation corpora for Chinese NLP: https://github.com/didi/ChineseNLP/blob/master/docs/machine_translation.md

 

 

 
