FakeNewsSetGen - a Process to Build Datasets that Support Comparison Among Fake News Detection Methods

  • Flávio Roberto Matias da Silva (IME)
  • Paulo Márcio Souza Freire (IME)
  • Marcelo Pereira de Souza (IME)
  • Gustavo de A. B. Plenamente (IME)
  • Ronaldo Ribeiro Goldschmidt (IME)


Owing to easy access and low cost, online news consumption through social media has increased significantly over the last decade. Despite their benefits, some social media platforms allow anyone to post news with intense spreading power, which amplifies an old problem: the dissemination of Fake News. In response, several machine learning-based methods to automatically detect Fake News (MLFN) have been proposed. All of them require datasets to train and evaluate their detection models. Although recent MLFN were designed to consider data about news propagation on social media, most of the few available datasets do not contain this kind of data. Hence, comparing the performance of those recent MLFN against the others is restricted to a very limited number of datasets. Moreover, none of the existing datasets with propagation data contains news in Portuguese, which impairs the evaluation of MLFN in this language. Thus, this work proposes FakeNewsSetGen, a process that builds Fake News datasets that contain news propagation data and support comparison among state-of-the-art MLFN. The software engineering process behind FakeNewsSetGen was guided by the goal of including all kinds of data required by the existing MLFN. To illustrate FakeNewsSetGen's viability and adequacy, a case study was carried out. It encompassed the implementation of a FakeNewsSetGen prototype and the application of this prototype to create a dataset called FakeNewsSet, with news in Portuguese. Five MLFN with different kinds of data requirements (two of them demanding news propagation data) were applied to FakeNewsSet and compared, demonstrating the potential use of both the proposed process and the created dataset.
Keywords: Dataset building process, Fake News detection, social media
How to Cite

SILVA, Flávio Roberto Matias da; FREIRE, Paulo Márcio Souza; SOUZA, Marcelo Pereira de; PLENAMENTE, Gustavo de A. B.; GOLDSCHMIDT, Ronaldo Ribeiro. FakeNewsSetGen - a Process to Build Datasets that Support Comparison Among Fake News Detection Methods. In: SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 1., 2020, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 188-195.