Fact Verification with Transformers: A Study with DistilBERT on the FEVER Benchmark

Beneilton Martins Leite (UFMA); Fael Faray de Paiva (UFMA); Anselmo Cardoso de Paiva (UFMA)

DOI: https://doi.org/10.5753/ercemapi.2025.17617

Abstract

Automated fact verification is an essential task for combating misinformation in the digital era. This work investigates the classification of claim veracity on the FEVER dataset using a supervised, transformer-based model. The model reaches 91.98% accuracy, showing that lightweight architectures can achieve high performance. The results indicate that compact models such as DistilBERT can perform comparably to larger models, reinforcing the potential of efficient solutions for large-scale fact verification.

Keywords: Data Science, Artificial Intelligence, Transformer networks
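
The accuracy figure reported in the abstract is the fraction of claims whose predicted label matches the gold label. The sketch below shows that computation in plain Python. It is an illustration only, not the authors' evaluation code: the label names are an assumption based on the three classes of the original FEVER release (the abstract does not say which label set was used), the helper names `accuracy` and `per_label_accuracy` are hypothetical, and the toy predictions are made up.

```python
from collections import Counter

# Assumed label set: the three classes of the original FEVER release.
# The abstract does not state which labels the authors actually used.
LABELS = ("SUPPORTS", "REFUTES", "NOT ENOUGH INFO")


def accuracy(gold, predicted):
    """Fraction of claims whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def per_label_accuracy(gold, predicted):
    """Accuracy restricted to each gold label (that is, per-class recall)."""
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)
    # Skip labels that never occur in the gold data to avoid dividing by zero.
    return {label: hits[label] / totals[label] for label in LABELS if totals[label]}


# Toy, made-up labels purely for illustration (not the paper's data).
gold = ["SUPPORTS", "REFUTES", "NOT ENOUGH INFO", "SUPPORTS"]
pred = ["SUPPORTS", "REFUTES", "SUPPORTS", "SUPPORTS"]

print(f"accuracy = {accuracy(gold, pred):.2%}")
print(per_label_accuracy(gold, pred))
```

Reporting the per-label breakdown alongside the overall figure can help show whether a high accuracy is spread evenly across classes or driven mostly by one of them.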
