Modelos Quantizados para Question Answering em Português Brasileiro: Um Estudo Experimental com Retrieval-Augmented Generation
Abstract
This work investigates the performance of the Sabiá-7B model on question answering in Brazilian Portuguese, applying quantization, fine-tuning, and retrieval-augmented generation. Results based on ROUGE metrics indicate that, despite the model's potential, fine-tuning combined with this technique reduced performance, highlighting difficulties in effectively integrating the retrieved contexts, particularly in quantized models.
References
Ali Mohamed Nabil Allam and Mohamed Hassan Haggag. The question answering systems: A survey. International Journal of Research and Reviews in Information Sciences (IJRRIS), 2(3), 2012.
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: Efficient finetuning of quantized llms. Advances in Neural Information Processing Systems, 36, 2024.
Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535–547, 2019.
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain, July 2004. Association for Computational Linguistics. URL [link].
Liangming Pan, Wenqiang Lei, Tat-Seng Chua, and Min-Yen Kan. Recent advances in neural question generation. arXiv preprint arXiv:1905.08949, 2019.
Ramon Pires, Hugo Abonizio, Thales Sales Almeida, and Rodrigo Nogueira. Sabiá: Portuguese large language models. In Murilo C. Naldi and Reinaldo A. C. Bianchi, editors, Intelligent Systems, pages 226–240, Cham, 2023. Springer Nature Switzerland. ISBN 978-3-031-45392-2.
Paulo Pirozelli, Marcos M José, Igor Silveira, Flávio Nakasato, Sarajane M Peres, Anarosa AF Brandão, Anna HR Costa, and Fabio G Cozman. Benchmarks for Pirá 2.0, a reading comprehension dataset about the ocean, the Brazilian coast, and climate change. Data Intelligence, 6(1):29–63, 2024.
Hugo Touvron, Louis Martin, Kevin Stone, et al. Llama 2: Open foundation and fine-tuned chat models, 2023. URL [link].
Mert Yazan, Suzan Verberne, and Frederik Situmeang. The impact of quantization on retrieval-augmented generation: An analysis of small llms. 2024. URL [link].
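The abstract above reports results in terms of ROUGE. As an illustrative aside, the core of ROUGE-L can be sketched as a longest-common-subsequence F1 over whitespace tokens; this is a simplified stand-in (no stemming, no multi-reference handling) for the standard toolkit described by Lin (2004), not the evaluation code used in the paper:

```python
def lcs_len(a, b):
    # Classic dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(reference, candidate):
    # ROUGE-L F1: harmonic mean of LCS-based precision and recall.
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("the cat sat on the mat", "the cat is on the mat")` yields 5/6, since the LCS spans five of the six tokens on each side.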
Published
12/11/2025
How to Cite
JUNQUEIRA, Júlia da Rocha; FREITAS, Larissa A. de; CORRÊA, Ulisses Brisolara; MOREIRA, Viviane. Modelos Quantizados para Question Answering em Português Brasileiro: Um Estudo Experimental com Retrieval-Augmented Generation. In: ESCOLA REGIONAL DE APRENDIZADO DE MÁQUINA E INTELIGÊNCIA ARTIFICIAL DA REGIÃO SUL (ERAMIA-RS), 1., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 172-175. DOI: https://doi.org/10.5753/eramiars.2025.16776.