Positive and Unlabeled Graph Learning with Large Language Models for Fake News Detection

Guilherme Henrique Messias; Mariana Caravanti de Souza; Yanni Marcela Gameiro; Sylvia Iasulaitis; Eloize Rossi Marques Seno; Solange Oliveira Rezende; Alan Demétrius Baria Valejo

Guilherme Henrique Messias UFSCar
Mariana Caravanti de Souza UFMS
Yanni Marcela Gameiro UFSCar
Sylvia Iasulaitis UFSCar
Eloize Rossi Marques Seno IFSP
Solange Oliveira Rezende USP
Alan Demétrius Baria Valejo UFSCar

Abstract

Several automated approaches have been proposed to mitigate the spread of fake news. However, their performance depends heavily on the quality and representation of the training data, and collecting fake news is particularly costly. Fact-checking websites are a viable alternative source, but their texts often do not preserve the original format of the fake news: they incorporate journalistic commentary, irrelevant information, and editorial biases, and are generally much longer than typical fake news. These inconsistencies hinder learning and compromise the generalization ability of the algorithms. In this work, we propose a framework to enhance the quality and representation of training data through the use of graphs. First, we propose using the Gemma3 language model to automatically extract the core content of fake news, removing irrelevant information and editorial biases. The resulting fake content can be useful for composing new datasets and developing more robust content discrimination tools. Then, we propose the Yake-Graph algorithm, which constructs graphs by linking news pieces through keywords extracted from the fake news. We conduct an empirical analysis evaluating the performance of the curated dataset in positive and unlabeled learning (PUL) and positive and unlabeled graph learning (PUGL) scenarios, comparing different graph construction methods. The results show that combining graph-based learning models with Yake-Graph outperforms established approaches, highlighting its potential to improve fake news detection.
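To make the idea of linking news pieces through keywords concrete, the following is a minimal sketch of one possible reading, not the Yake-Graph algorithm itself. It assumes that two news pieces are connected whenever they share at least one extracted keyword. The functions `top_keywords` and `build_keyword_graph`, the stopword list, and the example texts are hypothetical, and a simple frequency count stands in for a real keyword extractor such as YAKE.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "for", "on", "with"}


def top_keywords(text, k=2):
    """Stand-in for a keyword extractor such as YAKE.

    Returns the k most frequent non-stopword tokens of the text.
    """
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return {word for word, _ in Counter(tokens).most_common(k)}


def build_keyword_graph(docs, k=2):
    """Link every pair of documents that share at least one keyword.

    docs maps a document id to its text. The result maps each edge
    (id_a, id_b), with id_a < id_b, to the set of shared keywords.
    """
    keywords = {doc_id: top_keywords(text, k) for doc_id, text in docs.items()}
    edges = {}
    for a, b in combinations(sorted(keywords), 2):
        shared = keywords[a] & keywords[b]
        if shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical toy corpus: n1 and n2 share the keyword "vaccine".
docs = {
    "n1": "vaccine vaccine causes illness illness",
    "n2": "vaccine vaccine banned banned",
    "n3": "election election fraud fraud",
}
edges = build_keyword_graph(docs, k=2)
```

In this toy corpus only `n1` and `n2` end up connected, since `n3` shares no keyword with the others. A graph of this kind could then be passed to a graph-based learner; an alternative design, also an assumption, would add the keywords themselves as nodes and connect each news piece to its keywords.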