Portuguese Neural Text Simplification Using Machine Translation

de Lima, Tiago B.; Nascimento, André C. A.; Valença, George; Miranda, Pericles; Mello, Rafael Ferreira; Si, Tapas

doi:10.1007/978-3-030-91699-2_37

Tiago B. de Lima (ORCID: orcid.org/0000-0002-0707-522X),
André C. A. Nascimento,
George Valença,
Pericles Miranda,
Rafael Ferreira Mello &
Tapas Si

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13074)

Included in the following conference series:

Brazilian Conference on Intelligent Systems

Abstract

Automatic Text Simplification (ATS) plays a significant role in Natural Language Processing (NLP). ATS is a sequence-to-sequence problem that aims to produce a new version of the original text with complex and domain-specific words removed. It can improve communication and the understanding of documents from specific domains, and it can support second-language learning. This paper presents an empirical study of state-of-the-art ATS methods for simplifying texts in Portuguese. The literature reports that analyzing Portuguese texts is challenging because fewer resources are available than for other languages (e.g., English). More specifically, this work evaluates different Neural Machine Translation (NMT) techniques for ATS in Portuguese. The experiments show that NMT achieves promising results on Portuguese texts, obtaining a BLEU score of 40.89 using multiple parallel corpora and raising the overall readability score by more than 5 points.
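
To make the reported BLEU figure more concrete, the following is a minimal sketch of corpus-level BLEU on a 0-100 scale: clipped n-gram precision up to 4-grams combined with a brevity penalty. It is an illustration only, not the evaluation code used in the paper (the Notes point to the EASSE implementation). The function names `ngrams` and `corpus_bleu`, the whitespace tokenization, the single reference per output, the absence of smoothing, and the Portuguese example sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Assumptions: inputs are whitespace-tokenized strings and no
    smoothing is applied (hypothetical simplifications for this sketch).
    """
    matches = [0] * max_n   # clipped n-gram matches, per n-gram order
    totals = [0] * max_n    # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize system outputs that are shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Hypothetical system output and reference simplification.
system_output = ["o gato está no tapete hoje"]
reference = ["o gato está no tapete"]
score = corpus_bleu(system_output, reference)
print(f"BLEU: {score:.2f}")
```

Scores from a sketch like this are not directly comparable with the 40.89 reported above, since tokenization, smoothing, and the number of references all affect the result.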

Notes

1. versions are available at: http://altamiro.comunidades.net/biblias.
2. SARI and BLEU score implementation: https://github.com/feralvam/easse.
3. BLEU score scale: https://cloud.google.com/translate/automl/docs/evaluate.

Author information

Authors and Affiliations

Universidade Federal Rural de Pernambuco, Rua Dom Manuel de Medeiros, Recife, Pernambuco, 52171-900, Brazil
Tiago B. de Lima, André C. A. Nascimento, George Valença, Pericles Miranda & Rafael Ferreira Mello
Bankura Unnayani Institute of Engineering, Subhankar Nagar, Bankura, Pohabagan, 722146, West Bengal, India
Tapas Si

Corresponding authors

Correspondence to Tiago B. de Lima, André C. A. Nascimento or Rafael Ferreira Mello.

Editor information

Editors and Affiliations

Universidade Federal de Sergipe, São Cristóvão, Brazil
André Britto
Universidade de São Paulo, São Paulo, Brazil
Karina Valdivia Delgado

About this paper

Cite this paper

de Lima, T.B., Nascimento, A.C.A., Valença, G., Miranda, P., Mello, R.F., Si, T. (2021). Portuguese Neural Text Simplification Using Machine Translation. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science (LNAI), vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_37

DOI: https://doi.org/10.1007/978-3-030-91699-2_37
Published: 28 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91698-5
Online ISBN: 978-3-030-91699-2
eBook Packages: Computer Science, Computer Science (R0)