A Heterogeneous Network-Based Positive and Unlabeled Learning Approach to Detect Fake News
Abstract
The dynamism of fake news evolution and dissemination plays a crucial role in influencing and confirming personal beliefs. To minimize the spread of disinformation, approaches proposed in the literature for automatic fake news detection generally learn models through binary supervised algorithms that consider textual and contextual information. However, labeling significant amounts of real news to build accurate classifiers is difficult and time-consuming due to its broad spectrum. Positive and unlabeled learning (PUL) can be a good alternative in this scenario. PUL algorithms learn models from a small amount of labeled data of the class of interest and use unlabeled data to increase classification performance. This paper proposes a heterogeneous network variant of the PU-LP algorithm, a PUL algorithm based on similarity networks. Our network incorporates different linguistic features to characterize fake news, such as representative terms, emotiveness, pausality, and average sentence size. We also considered two representations of the news to compute similarity: term frequency-inverse document frequency (TF-IDF), and Doc2Vec, which creates fixed-size document representations regardless of document length. We evaluated our approach on six datasets written in Portuguese or English, comparing its performance with a binary semi-supervised baseline algorithm, using two well-established label propagation algorithms: LPHN and GNetMine. The results indicate that PU-LP with heterogeneous networks can be competitive with binary semi-supervised learning. In addition, linguistic features such as representative terms and pausality improved the classification performance, especially when only a small amount of labeled news is available.
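As a rough illustration of two of the linguistic features named above, the following minimal sketch computes pausality and average sentence size for a news text. The definitions used here are assumptions, not taken from the abstract: pausality is taken as punctuation marks per sentence, and average sentence size as words per sentence. The helper names (`split_sentences`, `pausality`, `average_sentence_size`) and the naive regex-based sentence splitter are hypothetical and may differ from the authors' preprocessing.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): a sentence ends at '.', '!' or '?'
    # followed by whitespace. Real pipelines would use a proper tokenizer.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def pausality(text):
    # Assumed definition: number of punctuation marks per sentence.
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    marks = re.findall(r"[.,;:!?]", text)
    return len(marks) / len(sentences)


def average_sentence_size(text):
    # Assumed definition: mean number of word tokens per sentence.
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    word_counts = [len(re.findall(r"\w+", s)) for s in sentences]
    return sum(word_counts) / len(sentences)


if __name__ == "__main__":
    sample = "Breaking news! The mayor, they say, resigned. Is it true?"
    print(pausality(sample))
    print(average_sentence_size(sample))
```

In a heterogeneous network, values like these could be discretized and attached to news nodes as additional node types; how the paper does this is not stated in the abstract.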
Keywords:
Fake news, One-class learning, Positive and unlabeled learning, Transductive semi-supervised learning, Graph-based learning
Published
29/11/2021
How to Cite
SOUZA, Mariana C. de; NOGUEIRA, Bruno M.; ROSSI, Rafael G.; MARCACINI, Ricardo M.; REZENDE, Solange O. A Heterogeneous Network-Based Positive and Unlabeled Learning Approach to Detect Fake News. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 10., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. ISSN 2643-6264.