Using Retrieval-Augmented Generation to improve Performance of Large Language Models on the Brazilian University Admission Exam

Leonardo de Campos Taschetto; Renato Fileto

doi:10.5753/sbbd.2024.243137

Leonardo de Campos Taschetto, Universidade Federal de Santa Catarina (UFSC)
Renato Fileto, Universidade Federal de Santa Catarina (UFSC)

Abstract

The Brazilian University Admission Exam (ENEM) presents a unique challenge for artificial intelligence: it requires deep mastery of knowledge from diverse fields. Recently, Language Models (LMs) with growing numbers of parameters have established state-of-the-art performance on ENEM. However, techniques such as Retrieval-Augmented Generation (RAG) can yield further improvements by exploiting trustworthy knowledge bases to enrich contexts and reduce non-factual responses. This study investigates how RAG can improve LMs’ performance on ENEM. The experiments reported in this article use up-to-date versions of four popular LMs, with and without RAG, on text-only and multi-modal data. The results reveal consistent gains from RAG with both kinds of data and across diverse fields, demonstrating the potential of RAG to improve LMs’ performance on tasks requiring multidisciplinary knowledge.

Keywords: ENEM, Language Models, Retrieval-Augmented Generation
