Abstract
Automatic Text Simplification (ATS) has played a significant role in the field of Natural Language Processing (NLP). ATS is a sequence-to-sequence problem that aims to produce a new version of an original text with complex and domain-specific words removed. It can improve communication and the understanding of documents from specific domains, and it can support second-language learning. This paper presents an empirical study of state-of-the-art ATS methods applied to texts in Portuguese. The literature reports that analyzing Portuguese texts is challenging because fewer resources are available than for other languages (e.g., English). More specifically, this work evaluated different Neural Machine Translation (NMT) techniques for ATS in Portuguese. The experiments showed that NMT achieved promising results on Portuguese texts, obtaining a BLEU score of 40.89 using multiple parallel corpora and raising the overall readability score by more than 5 points.
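To make the reported BLEU figure easier to interpret, the following is a minimal sketch of corpus-level BLEU on a 0 to 100 scale. It is not the EASSE implementation used in this work. It assumes whitespace tokenization, a single reference per system output, and no smoothing, and the function names `ngrams` and `corpus_bleu` are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), assuming one reference per hypothesis."""
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # n-grams produced by the system, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        # Assumption: plain whitespace tokenization, no normalization.
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection keeps the minimum count of each n-gram,
            # which is the "clipping" step of modified n-gram precision.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    # Without smoothing, a single zero precision makes the score zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions with uniform weights.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty discounts outputs shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the corpora used in the paper.
    system_output = ["o gato dorme no sofá"]
    reference = ["o gato dorme no tapete"]
    print(round(corpus_bleu(system_output, reference), 2))
```

Scores from this sketch are not directly comparable to the published 40.89, because tokenization, smoothing, and the number of references all affect BLEU.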
Notes
1. versions are available at: http://altamiro.comunidades.net/biblias.
2. SARI and BLEU score implementation: https://github.com/feralvam/easse.
3. BLEU score scale: https://cloud.google.com/translate/automl/docs/evaluate.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
de Lima, T.B., Nascimento, A.C.A., Valença, G., Miranda, P., Mello, R.F., Si, T. (2021). Portuguese Neural Text Simplification Using Machine Translation. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_37
DOI: https://doi.org/10.1007/978-3-030-91699-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91698-5
Online ISBN: 978-3-030-91699-2
eBook Packages: Computer Science (R0)