Machine Translation Useful Links: Techniques, Toolkits, Videos
TianLiang
2011-12-2
Here are some useful links for the machine translation research field, which you can refer to in your research work or study. For each toolkit we give a short description of its function together with its URL. For more details, you can follow the links.
Rule-based Machine Translation Toolkits:
(1) Apertium:
Short Description: A free/open-source rule-based machine translation platform.
URL: http://www.apertium.org/
(2) OpenLogos:
Short Description: It’s an open-source port of the Logos machine translation system for Linux.
URL: http://en.wikipedia.org/wiki/OpenLogos
(3) Matxin:
Short Description: It’s an open-source transfer-based machine translation engine.
URL: http://matxin.sourceforge.net/
Example-based Machine Translation Toolkits:
(1) Marclator:
Short Description: Marclator is a free example-based machine translation system based on the marker hypothesis, comprising a marker-driven chunker, a collection of chunk aligners, and a simple proof-of-concept monotonic recombinator or "decoder".
URL: http://www.openmatrex.org/marclator/
(2) PBMBMT:
Short Description: A Phrase-based Memory-based Machine Translation system, based on memory-based classifiers.
URL: http://ilk.uvt.nl/mbmt/pbmbmt/
(3) OpenMaTrEx:
Short Description: It’s a free/open-source marker-driven example-based machine translation system.
URL: http://openmatrex.org/
Statistical Machine Translation Toolkits:
(1) cdec:
Short Description: It’s software used for decoding and alignment.
URL: http://cdec-decoder.org/
(2) Ncode:
Short Description: It’s an open-source statistical machine translation system based on bilingual n-grams.
URL: http://www.limsi.fr/Individu/jmcrego/bincoder/
(3) DoMY™ CE:
Short Description: DoMY™ CE combines the best open-source SMT software into one easy-to-install package.
URL: http://www.precisiontranslationtools.com/index.php?option=com_content&view=article&id=1&Itemid=22
(4) Phramer:
Short Description: An open-source statistical phrase-based MT decoder.
URL: http://www.hlt.utdallas.edu/~marian/phramer/
(5) Joshua:
Short Description: Joshua is an open-source statistical machine translation decoder for hierarchical and syntax-based machine translation, written in Java.
URL: http://joshua.sourceforge.net/Joshua/Welcome.html
(6) Moses:
Short Description: The most widely used SMT system, including phrase-based and tree-based models.
URL: http://www.statmt.org/moses/
(7) Thot:
Short Description: It is a toolkit to train phrase-based models for statistical machine translation. Thot can estimate phrase-based models and obtain the best phrase-level alignments for a given set of sentence pairs.
URL: http://sourceforge.net/projects/thot/
Combined Machine Translation Systems:
(1) Anusaaraka:
Short Description: Anusaaraka is English-Hindi language-accessing software. It is a machine translation tool with insights from Panini's Ashtadhyayi (grammar rules), and it aims at the fusion of traditional Indian shastras and advanced modern technologies.
URL: http://anusaaraka.iiit.ac.in/
(2) MANY:
Short Description: It’s a system for MT system combination.
URL: http://code.google.com/p/many/
(3) MEMT:
Short Description: System combination; won 6 of 8 language pairs in WMT11.
URL: http://kheafield.com/code/
(4) Cunei:
Short Description: Cunei is a data-driven platform for machine translation.
URL: http://www.cunei.org/
Alignment Tools:
(1) ABBYY Aligner:
Short Description: ABBYY Aligner is a professional tool for aligning parallel texts and creating Translation Memory databases. This easy-to-use and convenient software accurately finds matching segments in parallel texts and allows saving them into TMX files for further use in CAT tools or into RTF files. Based on ABBYY's advanced linguistic technology, ABBYY Aligner ensures excellent quality of parallel text alignment. The software has an intuitive interface and a wide range of functions for quick and efficient work.
URL: http://www.abbyy.com/aligner/
(2) GIZA++:
Short Description: GIZA++ is a statistical machine translation toolkit that is used to train IBM Models 1-5 and an HMM word alignment model.
URL: http://code.google.com/p/giza-pp/
(3) Anymalign:
Short Description: Anymalign is a multilingual sub-sentential aligner. It can extract lexical equivalences from sentence-aligned parallel corpora. Its main advantage over other similar tools is that it can align any number of languages simultaneously.
URL: http://www.limsi.fr/Individu/alardill/anymalign/
(4) hunalign:
Short Description: hunalign aligns bilingual text on the sentence level. Its input is tokenized and sentence-segmented text in two languages. In the simplest case, its output is a sequence of bilingual sentence pairs.
URL: http://mokk.bme.hu/en/resources/hunalign/
(5) Araya:
Short Description: Bilingual alignment and alignment editor for creating TMX files.
URL: http://www.heartsome.de/en/araya.php#TMX
(6) MGIZA++:
Short Description: A word alignment tool based on the well-known GIZA++, extended to support multi-threading, resumed training, and incremental training.
URL: http://sourceforge.net/projects/mgizapp/
(7) Berkeley Word Aligner:
Short Description: The Berkeley Word Aligner is a statistical machine translation tool that automatically aligns words in a sentence-aligned parallel corpus in supervised and unsupervised ways.
URL: http://code.google.com/p/berkeleyaligner/
(8) PostCAT:
Short Description: This package contains code to perform word alignment using IBM Models 1 and 2 and the HMM model, trained either with EM or with constrained EM using agreement constraints and substochastic constraints.
URL: http://www.seas.upenn.edu/~strctlrn/CAT/CAT.html
(9) BIA:
Short Description: A suite consisting of a discriminative phrase-based alignment decoder based on linear alignment models, along with training and tuning tools. In the training phase, relative link probabilities are calculated based on an initial alignment. The tuning of the model weights may be performed directly according to MT metrics.
URL: http://code.google.com/p/bia-aligner/
(10) RegAligner:
Short Description: It can serve as a replacement for GIZA++, since IBM Models 1, 2, 3 and 4 and the HMM model are implemented.
URL: https://github.com/Thomas1205/RegAligner
(11) Tree Aligner:
Short Description: A statistical tree-to-tree aligner, which can be used for the automatic generation of parallel treebanks.
URL: http://www.ventsislavzhechev.eu/Home/Software/Software.html
(12) CorpusFiltergraph:
Short Description: Statistical machine translation support toolbox to extract, filter, align and transform text data from multilingual documents into parallel training corpora.
URL: http://sourceforge.net/projects/corpfiltergraph/
(13) tree-alignment-visualizer:
Short Description: The majority of existing tools have been created for the visualization of basic word alignments as well as for manual annotation of sentence pairs. At the same time, the growing interest of the research community in syntax-augmented machine translation makes the simultaneous visualization of alignment links and parse trees increasingly important. This project provides a visualization tool for this purpose, which, by giving insight into the data, facilitates research toward better translation systems.
URL: http://code.google.com/p/tree-alignment-visualizer/
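Several of the aligners above (GIZA++, MGIZA++, PostCAT, RegAligner) train IBM Model 1 with EM. To give a rough idea of what that training involves, here is a minimal Python sketch. It is not the code of any of the listed toolkits: the function name train_ibm1 and the toy corpus are made up for illustration, and the NULL word and any smoothing are left out for brevity.

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=10):
    """Estimate t(f | e) with EM from (source_tokens, target_tokens) pairs.

    Assumed simplifications: no NULL source word, no smoothing.
    The returned table is keyed by (target_word, source_word).
    """
    target_vocab = {f for _, fs in corpus for f in fs}
    # Start from a uniform translation table.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (f, e) link
        total = defaultdict(float)  # expected count of links leaving e
        for es, fs in corpus:
            for f in fs:
                # E-step: spread one unit of f over the source words of the pair.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts for each source word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]
table = train_ibm1(corpus)
for f in ("das", "haus", "buch"):
    print(f, "given 'the':", round(table[(f, "the")], 3))
```

On this toy corpus the probability mass for "the" moves toward "das" as the iterations proceed, because "das" is the only word that co-occurs with "the" in both of its sentences.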
Evaluation Tools:
(1) NIST/BLEU Confidence Estimator:
Short Description: Confidence interval estimation for MT evaluations.
URL: http://projectile.sv.cmu.edu/research/public/tools/bootStrap/tutorial.htm
(2) EvalTrans:
Short Description: A tool for the automatic and manual evaluation of translations.
URL: http://www-i6.informatik.rwth-aachen.de/web/Software/EvalTrans/index.html
(3) ROUGE:
Short Description: Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against one or more human-produced reference summaries or translations.
URL: http://berouge.com/default.aspx
(4) Hjerson:
Short Description: It’s a tool for automatic error classification based on Levenshtein distance, precision and recall.
URL: http://www.dfki.de/~mapo02/hjerson/
(5) SymEval:
Short Description: SymEval is a translation evaluation toolkit that allows you to compare and score translations. All you need is the source text and the translated texts that you would like to evaluate.
URL: http://sourceforge.net/apps/mediawiki/symeval/index.php?title=Main_Page
(6) METEOR:
Short Description: It’s an automatic metric and toolkit for MT evaluation.
URL: http://www.cs.cmu.edu/~alavie/METEOR/
(7) TERCOM:
Short Description: TERCOM is an implementation of the Translation Error Rate, which is an error metric for machine translation that measures the number of edits required to change a system output into one of the references.
URL: http://nlp.cs.qc.cuny.edu/snover/
(8) TERcpp:
Short Description: This tool scores machine translation performance with the TER metric. The code is based on Snover's algorithm.
URL: http://sourceforge.net/projects/tercpp/
(9) Mteval:
Short Description: Implementation of the BLEU and NIST MT evaluation metrics.
URL: http://jaguar.ncsl.nist.gov/mt/resources/mteval-v13a-20091001.tar.gz
(10) MultEval:
Short Description: Machine translation evaluation toolkit (BLEU, METEOR, TER).
URL: https://github.com/jhclark/multeval
(11) NIST:
Short Description: An open MT evaluation.
URL: http://www.itl.nist.gov/iad/mig//tests/mt/
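Hjerson and the TER tools above build on word-level edit distance. The following Python sketch shows that core computation only. It is a simplification under stated assumptions: real TER, as implemented in TERCOM, also counts block shifts and can use several references, while this version uses one reference and no shifts. The function names are illustrative and do not come from any of the listed tools.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over tokens: insertions, deletions, substitutions."""
    prev = list(range(len(ref) + 1))  # distance from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance to an empty reference
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,           # delete h from the hypothesis
                            curr[j - 1] + 1,       # insert r into the hypothesis
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def shift_free_ter(hypothesis, reference):
    """Edits divided by reference length (single reference, no shifts)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return word_edit_distance(hyp, ref) / len(ref)


print(shift_free_ter("the cat sat on mat", "the cat sat on the mat"))
```

Here the hypothesis is one inserted word away from the six-word reference, so the printed rate is one sixth.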
Language Models:
(1) IRSTLM:
Short Description: A collection of implemented algorithms and data structures suitable to estimate, store, and access very large LMs.
URL: http://hlt.fbk.eu/en/irstlm
(2) KenLM:
Short Description: KenLM is a library that loads language model files and returns probabilities.
URL: http://kheafield.com/code/kenlm/
(3) RandLM:
Short Description: This project deals with space-efficient n-gram-based language models built using randomized representations.
URL: http://randlm.sourceforge.net/
(4) SRILM:
Short Description: A toolkit for building and applying statistical language models.
URL: http://www.speech.sri.com/projects/srilm/
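The toolkits above estimate and query n-gram language models at large scale. As a toy illustration of the underlying idea only, here is a bigram model with add-one smoothing in Python. It does not use the API of any listed toolkit; the class name BigramLM and the training sentences are made up, out-of-vocabulary words are not handled, and the listed toolkits offer better smoothing methods such as Kneser-Ney.

```python
import math
from collections import Counter


class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.history = Counter()  # how often each token occurs as a history
        self.bigrams = Counter()  # how often each (previous, word) pair occurs
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Every token that can be predicted, including the end marker.
        self.vocab = {word for _, word in self.bigrams}

    def prob(self, prev, word):
        """P(word | prev) with one pseudo-count per vocabulary entry."""
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + len(self.vocab))

    def logprob(self, sentence):
        """Natural-log probability of a whole sentence."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


lm = BigramLM(["the cat sat", "the dog sat"])
print(lm.logprob("the cat sat"), lm.logprob("sat cat the"))
```

The sentence in training word order gets a higher log-probability than the scrambled one, which is the property a decoder relies on when it uses a language model to prefer fluent output.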
Part-of-Speech Taggers:
(1) MXPOST:
Short Description: MXPOST was developed by Adwait Ratnaparkhi as part of his PhD thesis. It is a Java implementation of a maximum entropy model. It can be trained for any language for which annotated POS data exists.
URL: ftp://ftp.cis.upenn.edu/pub/adwait/jmx/jmx.tar.gz
(2) TreeTagger:
Short Description: TreeTagger is a tool for annotating text with part-of-speech and lemma information.
URL: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
Syntactic Parsers:
(1) Berkeley Parser:
Short Description: The parser focuses on learning probabilistic context-free grammars (PCFGs), which assign a sequence of words the most likely parse tree. The parser supports a variety of languages and achieves state-of-the-art performance on most of them. The languages are English, Bulgarian, Arabic, Chinese, French, and German.
URL: http://code.google.com/p/berkeleyparser/
(2) BitPar:
Short Description: BitPar is a parser for highly ambiguous probabilistic context-free grammars (such as treebank grammars). BitPar uses bit-vector operations to speed up the basic parsing operations by parallelization.
URL: http://www.ims.uni-stuttgart.de/tcl/SOFTWARE/BitPar.html
(3) Collins:
Short Description: It’s one of the first statistical parsers, developed as part of Michael Collins’s PhD thesis. Running it also requires the installation of MXPOST.
URL: http://www.cs.columbia.edu/~mcollins/
(4) GenPar:
Short Description: It provides an architecture, a design, and an implementation of an integrated system for statistical machine translation by parsing.
URL: http://nlp.cs.nyu.edu/GenPar/
(5) LoPar:
Short Description: LoPar is an implementation of a parser for head-lexicalized probabilistic context-free grammars, which can also be used for morphological analysis.
URL: http://www.ims.uni-stuttgart.de/tcl/SOFTWARE/LoPar.html
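The parsers above search for the most likely parse tree under a probabilistic context-free grammar. A common textbook way to do this is Viterbi CKY over a grammar in Chomsky normal form, sketched below in Python. This is only an illustration of the general technique, not the algorithm of any specific parser listed here; the function name viterbi_cky, the dictionary-based grammar format, and the tiny grammar are assumptions made for the example.

```python
import math
from collections import defaultdict


def viterbi_cky(words, lexical, binary, start="S"):
    """Best parse under a PCFG in Chomsky normal form.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    Returns (log_prob, tree), or None if the sentence cannot be parsed.
    """
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][A] = (log_prob, tree)
    # Width-one spans: apply the lexical rules.
    for i, word in enumerate(words):
        for (a, w), p in lexical.items():
            if w == word:
                best[(i, i + 1)][a] = (math.log(p), (a, word))
    # Wider spans: combine two adjacent sub-spans with a binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    if b in best[(i, k)] and c in best[(k, j)]:
                        left_score, left_tree = best[(i, k)][b]
                        right_score, right_tree = best[(k, j)][c]
                        score = math.log(p) + left_score + right_score
                        if a not in best[(i, j)] or score > best[(i, j)][a][0]:
                            best[(i, j)][a] = (score, (a, left_tree, right_tree))
    return best[(0, n)].get(start)


lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
result = viterbi_cky("she eats fish".split(), lexical, binary)
print(result)
```

The loop over all binary rules for every split point keeps the sketch short; a practical parser would index rules by their right-hand sides and prune unlikely entries.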
Study Videos:
(1) Phrase-based and factored statistical machine translation videos:
Short Description: It’s a lecture given by Philipp Koehn.
URL: http://videolectures.net/aerfaiss08_koehn_pbfs/
(2) Video and Lectures (视频教程大全):
Short Description: It’s a collection of videos on programming languages.
URL: http://www.spjc8.com/
(3) Boobooke (播布客):
Short Description: There are many language study lectures on this website.
URL: http://www.boobooke.com/index.html