Towards Fully Automated News Reporting in Brazilian Portuguese

  • João Campos, Universidade de São Paulo
  • André Teixeira, Federal University of Minas Gerais
  • Thiago Ferreira, Federal University of Minas Gerais
  • Fábio Cozman, Universidade de São Paulo
  • Adriana Pagano, Universidade Federal de Minas Gerais

Abstract

We introduce robot journalists that cover two pressing topics in Brazilian society: the spread of COVID-19 and deforestation in the Legal Amazon. Our approach automatically analyzes structured domain data, selects relevant content, generates news texts, and publishes them on the Web. We describe the system architecture in detail, report the results of an automatic evaluation, discuss some of the advantages of robot journalism for society, and point out further steps in our work. Corpus and code are publicly available.

Keywords: Robot-Journalism, Natural Language Generation, Natural Language Processing, COVID-19, Amazon Deforestation
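
As a rough illustration of the data-to-text flow the abstract outlines (analyze structured data, select relevant content, generate text), the following is a minimal sketch and not the authors' system. The record fields, the 10% selection threshold, the English sentence template, and all data values are assumptions made up for this example; the publishing step is omitted.

```python
from dataclasses import dataclass


@dataclass
class DailyRecord:
    """One row of structured input. Fields and values are hypothetical."""
    region: str
    new_cases: int
    previous_new_cases: int


def select_content(records, min_change=0.10):
    """Keep records whose relative day-over-day change meets the threshold.

    The 10% default is an arbitrary assumption for this sketch.
    """
    selected = []
    for r in records:
        if r.previous_new_cases == 0:
            continue  # relative change is undefined; skipped in this sketch
        change = (r.new_cases - r.previous_new_cases) / r.previous_new_cases
        if abs(change) >= min_change:
            selected.append((r, change))
    return selected


def realize(record, change):
    """Turn one selected record into a sentence using a fixed template."""
    direction = "rose" if change > 0 else "fell"
    return (f"New cases in {record.region} {direction} {abs(change):.0%} "
            f"to {record.new_cases}.")


def generate_report(records):
    """Run content selection, then realization, and join the sentences."""
    return " ".join(realize(r, c) for r, c in select_content(records))


# Made-up input data, for illustration only.
records = [
    DailyRecord("Region A", 120, 100),  # +20%: reported
    DailyRecord("Region B", 101, 100),  # +1%: filtered out
    DailyRecord("Region C", 45, 60),    # -25%: reported
]
report = generate_report(records)
print(report)
```

Keeping selection and realization as separate functions mirrors the staged description in the abstract; a real system would likely use richer selection rules and Portuguese templates or a learned generator.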

Published
20/10/2020
How to Cite

CAMPOS, João; TEIXEIRA, André; FERREIRA, Thiago; COZMAN, Fábio; PAGANO, Adriana. Towards Fully Automated News Reporting in Brazilian Portuguese. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 17., 2020, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 543-554. DOI: https://doi.org/10.5753/eniac.2020.12158.