Positive and Unlabeled Graph Learning with Large Language Models for Fake News Detection
Abstract
Several automated approaches have been proposed to mitigate the spread of fake news. However, their performance depends heavily on the quality and representation of the training data, and collecting fake news is particularly costly. Fact-checking websites are a viable alternative source, but their texts often do not preserve the original format of the fake news: they incorporate journalistic commentary, irrelevant information, and editorial biases, and are generally much longer than typical fake news. These inconsistencies hinder learning and compromise the generalization ability of the algorithms. In this work, we propose a framework to enhance the quality and representation of training data through the use of graphs. First, we propose using the Gemma3 language model to automatically extract the core content of fake news, removing irrelevant information and editorial biases. The generated fake content can be useful for composing new datasets and developing more robust content discrimination tools. Then, we propose the Yake-Graph algorithm, which constructs graphs by linking news pieces through keywords extracted from the fake news. We conduct an empirical analysis evaluating the performance of the curated dataset in positive and unlabeled learning (PUL) and positive and unlabeled graph learning (PUGL) scenarios, comparing different graph construction methods. The results show that the combination of graph-based learning models and Yake-Graph outperforms established approaches, highlighting its potential to improve fake news detection.
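As a rough illustration of linking news pieces through shared keywords, the sketch below builds a small document graph in Python. It is not the authors' Yake-Graph algorithm: the abstract does not give its details, so a simple frequency-based extractor (the hypothetical `top_keywords`) stands in for YAKE, and the rule "connect two documents that share at least one keyword" is an assumption. The stopword list, the sample texts, and the function names are likewise illustrative.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Tiny illustrative stopword list (assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on", "by"}


def top_keywords(text, k=3):
    """Stand-in for a keyword extractor such as YAKE:
    return the k most frequent non-stopword tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return {word for word, _ in Counter(tokens).most_common(k)}


def build_keyword_graph(docs, k=3):
    """Return an adjacency dict that links documents sharing at least one keyword."""
    keyword_to_docs = defaultdict(set)
    for doc_id, text in docs.items():
        for keyword in top_keywords(text, k):
            keyword_to_docs[keyword].add(doc_id)

    graph = {doc_id: set() for doc_id in docs}
    for doc_ids in keyword_to_docs.values():
        # Every pair of documents under the same keyword gets an undirected edge.
        for u, v in combinations(sorted(doc_ids), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Hypothetical sample texts.
docs = {
    "n1": "Vaccine causes illness, vaccine banned",
    "n2": "Officials hide vaccine data, vaccine recalled",
    "n3": "Football final ends in draw",
}
graph = build_keyword_graph(docs)
```

In this toy input, `n1` and `n2` share the keyword "vaccine" and become neighbors, while `n3` stays isolated. A graph of this kind is the sort of structure a positive and unlabeled graph learning method could then propagate labels over, starting from the known fake (positive) nodes.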
Published
29/09/2025
How to Cite
MESSIAS, Guilherme Henrique; SOUZA, Mariana Caravanti de; GAMEIRO, Yanni Marcela; IASULAITIS, Sylvia; SENO, Eloize Rossi Marques; REZENDE, Solange Oliveira; VALEJO, Alan Demétrius Baria. Positive and Unlabeled Graph Learning with Large Language Models for Fake News Detection. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 331-346. ISSN 2643-6264.
