SARA Project: Automated Resource Response System for Access to Information Requests

  • Douglas Rolins Santana, Federal University of Goiás
  • Livia Mancine Coelho Campos, Federal University of Goiás
  • Kairo Antônio Lopes Silva, Federal University of Goiás
  • Danilo Silva Ramos, Federal University of Goiás
  • Valdemar Vicente Graciano Neto, Federal University of Goiás
  • Leonardo Andrade Ribeiro, Federal University of Goiás

Abstract


The Controladoria Geral da União (CGU) faces a growing volume of appeals related to access-to-information requests, which makes managing and answering them increasingly difficult. To address this problem, this paper presents the SARA project (Automated Response System for Appeals), a Natural Language Processing solution that uses Retrieval-Augmented Generation to identify similar appeals and requests, predict decisions, and generate automated responses to appeals. Preliminary experiments indicate that SARA has the potential to improve efficiency and response speed, which suggests it could serve as a robust and scalable mechanism for handling appeals at the CGU.
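The three steps named in the abstract (retrieving similar appeals, predicting a decision, and preparing a generated response) can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real sentence-embedding model, the majority vote in `predict_decision` is an assumed way of predicting a decision, and `CORPUS`, its texts, and all function names are hypothetical. The call to a language model is left out; `build_prompt` only assembles the retrieved context.

```python
import math
from collections import Counter

# Hypothetical past appeals with their outcomes (illustrative data only).
CORPUS = [
    {"id": 1, "text": "request for public contract spending data denied by agency",
     "decision": "granted"},
    {"id": 2, "text": "request for personal medical records of a public servant",
     "decision": "denied"},
    {"id": 3, "text": "request for contract spending data of the ministry",
     "decision": "granted"},
]

def embed(text):
    """Toy bag-of-words vector; a stand-in for a sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, corpus, k=2):
    """Return the k past appeals most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda item: cosine(q, embed(item["text"])),
                    reverse=True)
    return ranked[:k]

def predict_decision(neighbors):
    """Assumed heuristic: majority vote over the retrieved decisions."""
    votes = Counter(n["decision"] for n in neighbors)
    return votes.most_common(1)[0][0]

def build_prompt(query, neighbors):
    """Assemble the retrieved context that a language model would receive."""
    context = "\n".join(f"- {n['text']} (decision: {n['decision']})"
                        for n in neighbors)
    return f"Similar past appeals:\n{context}\n\nNew appeal:\n{query}\n\nDraft a response."

if __name__ == "__main__":
    appeal = "appeal for access to contract spending data"
    similar = retrieve(appeal, CORPUS)
    print(predict_decision(similar))
    print(build_prompt(appeal, similar))
```

In a realistic setting the toy `embed` would be replaced by a trained embedding model and the linear scan in `retrieve` by a vector index, but the control flow would stay the same: embed the new appeal, fetch its nearest neighbors, and pass them as context to the generator.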

Keywords: retrieval augmented generation, natural language processing, machine learning, recommender systems, embeddings

Published
2024-10-14
SANTANA, Douglas Rolins; CAMPOS, Livia Mancine Coelho; SILVA, Kairo Antônio Lopes; RAMOS, Danilo Silva; GRACIANO NETO, Valdemar Vicente; RIBEIRO, Leonardo Andrade. SARA Project: Automated Resource Response System for Access to Information Requests. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 39., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 862-868. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2024.242899.