An article-oriented framework for automatic semantic analysis of COVID-19 research

Antonio Alves; Antônio Pereira; Pablo Cecilio; Nayara Pena; Felipe Viegas; Elisa Tuler; Diego Dias; Leonardo Rocha

doi:10.5753/webmedia_estendido.2021.17616

Antonio Alves UFSJ
Antônio Pereira UFSJ
Pablo Cecilio UFSJ
Nayara Pena UFSJ
Felipe Viegas UFMG
Elisa Tuler UFSJ
Diego Dias UFSJ
Leonardo Rocha UFSJ

Abstract

In this work, we propose a framework that automatically extracts semantic topics from scientific publications on COVID-19. The framework has four main building blocks: (i) preprocessing, (ii) topic modeling, (iii) correlation of topics with authors and institutions, and (iv) a summarization interface. The first block applies preprocessing strategies to the text of the articles under consideration and defines their semantic representation. The topic modeling block finds semantic topics in those publications. The third block correlates these topics with the articles themselves and with the authors, institutions, and countries associated with each article. The summarization interface provides an intuitive view of all these analyses. Our evaluation shows that the framework can automatically extract relevant characteristics from the articles and identify their main themes, as well as the correlation of researchers, institutions, and countries with different COVID-19 research topics.
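
The third block can be pictured as a simple aggregation. The sketch below is a minimal illustration, not the authors' implementation: it assumes each article already carries a vector of non-negative topic weights (as a topic model might produce) together with author and institution metadata, and it sums those weights per entity. The `Article` record, the functions `correlate_topics` and `dominant_topic`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Article:
    """Hypothetical record: metadata plus topic weights from the topic modeling block."""
    title: str
    authors: List[str]
    institutions: List[str]
    topic_weights: List[float]  # one weight per topic, assumed non-negative


def correlate_topics(articles: List[Article], field: str) -> Dict[str, List[float]]:
    """Sum topic weights per entity named in `field` ("authors" or "institutions")."""
    totals: Dict[str, List[float]] = {}
    for article in articles:
        for entity in getattr(article, field):
            # Start each entity with a zero vector of the right length.
            current = totals.setdefault(entity, [0.0] * len(article.topic_weights))
            for k, weight in enumerate(article.topic_weights):
                current[k] += weight
    return totals


def dominant_topic(weights: List[float]) -> int:
    """Return the index of the topic with the largest accumulated weight."""
    return max(range(len(weights)), key=weights.__getitem__)


# Made-up sample data with two topics (index 0 and index 1).
articles = [
    Article("Paper A", ["Author A", "Author B"], ["Inst X"], [0.9, 0.1]),
    Article("Paper B", ["Author B"], ["Inst Y"], [0.2, 0.8]),
    Article("Paper C", ["Author B", "Author C"], ["Inst Y"], [0.1, 0.9]),
]

by_author = correlate_topics(articles, "authors")
by_institution = correlate_topics(articles, "institutions")

for name, weights in sorted(by_author.items()):
    print(name, "-> dominant topic", dominant_topic(weights))
```

Summing raw weights favors prolific entities. Dividing each total by the number of articles per entity would give an average profile instead; which of the two better matches the framework's analysis is not stated in the abstract.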

Keywords: Word Embeddings, Topic Modeling, COVID-19
