SARA Project: Automated Response System for Appeals on Access to Information Requests
Abstract
The Controladoria-Geral da União (CGU) faces a growing volume of appeals related to access-to-information requests, which makes them increasingly difficult to manage and answer. To address this, this paper presents the SARA project (Automated Response System for Appeals), a Natural Language Processing solution that uses Retrieval-Augmented Generation to identify similar appeals and requests, predict decisions, and generate automated responses to appeals. Preliminary experiments indicate that SARA has the potential to improve the efficiency and speed of responses, suggesting that it could serve as a robust and scalable mechanism for handling appeals at the CGU.
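The pipeline summarized above (retrieve similar appeals, predict a decision, assemble context for generation) can be illustrated with a minimal sketch. This is not the SARA implementation: the bag-of-words vectors stand in for learned sentence embeddings, the majority vote stands in for whatever decision model the system uses, and the corpus, field names, and function names are all hypothetical. The generation step is reduced to building a prompt, since no specific language model is assumed here.

```python
import math
from collections import Counter

# Hypothetical toy corpus of past appeals with their known decisions.
CORPUS = [
    {"text": "appeal requesting contract documents denied as classified", "decision": "denied"},
    {"text": "appeal requesting salary data of public servants", "decision": "granted"},
    {"text": "appeal requesting contract documents for public works", "decision": "granted"},
    {"text": "appeal about personal medical records of third party", "decision": "denied"},
]


def embed(text):
    """Toy bag-of-words vector (assumption: a stand-in for a sentence embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=3):
    """Return the k past appeals most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda item: cosine(q, embed(item["text"])), reverse=True)
    return ranked[:k]


def predict_decision(neighbors):
    """Majority vote over the decisions of the retrieved appeals."""
    return Counter(n["decision"] for n in neighbors).most_common(1)[0][0]


def build_prompt(query, neighbors):
    """Assemble the retrieved cases as context for a generator (model call omitted)."""
    context = "\n".join(f"- {n['text']} (decision: {n['decision']})" for n in neighbors)
    return f"Similar past appeals:\n{context}\n\nNew appeal:\n{query}\n\nDraft a response."


if __name__ == "__main__":
    query = "appeal requesting contract documents for road works"
    neighbors = retrieve(query, CORPUS, k=3)
    print(predict_decision(neighbors))
    print(build_prompt(query, neighbors))
```

In a realistic setting the `embed` function would be replaced by a sentence-embedding model and the sorted scan by an approximate nearest-neighbor index, but the control flow would stay the same: retrieve, vote or classify, then condition the generator on the retrieved cases.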
