Statistical Phrase-based Machine Translation: Experiments with Brazilian Portuguese

  • Wilker F. Aziz (USP)
  • Thiago A. S. Pardo (USP)
  • Ivandré Paraboni (USP)

Abstract


Statistical approaches have recently emerged as the main paradigm in Machine Translation (MT) research. In previous work we showed that, for closely related languages such as Brazilian Portuguese and European Spanish, the results of a simple word-based statistical MT system may be highly comparable to those produced by a rule-based approach. In this work we take the discussion one step further: we present evidence that a more sophisticated (namely, phrase-based) translation model may outperform rule-based translation for this language pair, and we report additional results from a first experiment in Portuguese/English phrase-based statistical MT.

Published
2009-07-20
AZIZ, Wilker F.; PARDO, Thiago A. S.; PARABONI, Ivandré. Statistical Phrase-based Machine Translation: Experiments with Brazilian Portuguese. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 7., 2009, Bento Gonçalves/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2009. p. 142-151. ISSN 2763-9061.
