Enhancing Large Language Model Performance on ENEM Math Questions Using Retrieval-Augmented Generation
Abstract
In this study, we explore the use of Retrieval-Augmented Generation (RAG) to improve the performance of large language models (LLMs), such as GPT-3.5 Turbo and GPT-4o, on ENEM mathematics questions. Our experiments indicate that RAG can significantly improve accuracy by supplying relevant contextual information alongside each question. With RAG, GPT-4o consistently outperforms GPT-3.5 Turbo. These results illustrate how RAG-enhanced LLMs could strengthen educational AI tools and encourage further exploration in this field.
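As an illustration of the general idea only, the following minimal sketch shows how retrieved context can be prepended to a question before it is sent to a model. It is not the pipeline used in this study: the passages, the question, the bag-of-words cosine retriever, and the names `retrieve` and `build_prompt` are all assumptions made for the example, and the call to the language model itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=1):
    """Prepend the retrieved passages to the question as context."""
    context = "\n".join(retrieve(question, passages, k))
    return f"Context:\n{context}\n\nQuestion:\n{question}\n\nAnswer:"


# Hypothetical reference passages (assumption: not taken from the study).
PASSAGES = [
    "The area of a circle with radius r is pi times r squared.",
    "Simple interest equals principal times rate times time.",
    "The mean of a data set is the sum of the values divided by their count.",
]

# Hypothetical question (assumption: not an actual ENEM item).
QUESTION = "What is the area of a circle with radius 3?"

prompt = build_prompt(QUESTION, PASSAGES, k=1)
print(prompt)
```

In practice, the word-count retriever would typically be replaced by a dense sentence-embedding model, and the resulting prompt would be passed to the LLM being evaluated.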