Towards Fully Automated News Reporting in Brazilian Portuguese
Abstract
We introduce robot journalists that cover two pressing topics in Brazilian society: the spread of COVID-19 and deforestation in the Legal Amazon. Our approach automatically analyzes structured domain data, selects relevant content, generates news texts, and publishes them on the Web. We describe the system architecture in detail, report the results of an automatic evaluation, discuss some of the advantages of robot journalism for society, and outline further steps in our work. The corpus and code are publicly available.