Enhancing Large Language Model Performance on ENEM Math Questions Using Retrieval-Augmented Generation

João Superbi; Heitor Pinto; Emanoel Santos; Lucas Lattari; Bianca Castro

doi:10.5753/bresci.2024.243977

Affiliation (all authors): Instituto Federal do Sudeste de Minas Gerais


Abstract

In this study, we explore the use of Retrieval-Augmented Generation (RAG) to improve the performance of large language models (LLMs), such as GPT-3.5 Turbo and GPT-4o, in solving ENEM (Exame Nacional do Ensino Médio, Brazil's national high school exam) mathematics questions. Our experiments indicate that RAG can substantially improve accuracy by supplying relevant contextual information to the model. With RAG, GPT-4o consistently outperforms GPT-3.5 Turbo, underscoring the value of this technique for educational AI tools. These results illustrate the promise of RAG-enhanced LLMs for educational applications and encourage further exploration in this field.
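The abstract describes RAG as a way of supplying relevant context to the model before it answers. The following is a minimal sketch of that idea, not the authors' pipeline. It uses a toy term-frequency cosine similarity in place of a real embedding model. The corpus, the helper names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), and the prompt template are all hypothetical illustrations. The sketch retrieves the passages most similar to a question and prepends them as context:

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the question.

    Assumption: bag-of-words vectors stand in for the dense embeddings
    a real RAG system would typically use.
    """
    q = Counter(tokenize(question))
    ranked = sorted(corpus, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, corpus: list[str], k: int = 2) -> str:
    """Prepend the retrieved passages to the question (hypothetical template)."""
    context = "\n".join(f"- {p}" for p in retrieve(question, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical knowledge base of short math notes.
corpus = [
    "The area of a circle is pi times the radius squared.",
    "A percentage is a ratio expressed as a fraction of 100.",
    "The Pythagorean theorem relates the sides of a right triangle.",
]

prompt = build_prompt("What is the area of a circle with radius 3?", corpus, k=1)
print(prompt)
```

In a full system, the resulting prompt would then be sent to a model such as GPT-3.5 Turbo or GPT-4o. The retriever would usually rank passages with a learned sentence-embedding model rather than raw word overlap.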

Keywords: artificial intelligence, machine learning, large language models, LLMs, RAG, GPT, ChatGPT, NLP, AI educational tools
