Post-processing of machine translation texts based on graph theory
Abstract
Machine translation is the study and development of computational methods for producing idiomatic translations. The most common approaches are statistical methods and methods based on neural networks. One deficiency pointed out in these methods is a possible lack of coherence between the translated sentences. In this project, we propose using techniques based on graph theory to preserve coherence in the translation of texts from English to Portuguese. The studied method shows large performance variability; however, some translated sentences are evaluated 90% better than those of the statistical translator Moses and 10% better than those of Google Translate.
