Automation of Public Prosecutor's Office Document Classification in Accordance with Sustainable Development Goals Using Natural Language Processing

  • Pedro P. Berger, Public Prosecutor's Office of the State of Espírito Santo (MPES)
  • Matheus W. Souza, Public Prosecutor's Office of the State of Espírito Santo (MPES)
  • Heitor Quartezani, Public Prosecutor's Office of the State of Espírito Santo (MPES)
  • Guilherme Merisio, Public Prosecutor's Office of the State of Espírito Santo (MPES)
  • Iara A. Fazolo, Public Prosecutor's Office of the State of Espírito Santo (MPES)
  • Luciana G. F. Andrade, Public Prosecutor's Office of the State of Espírito Santo (MPES)
  • Sandro T. Silva, Public Prosecutor's Office of the State of Espírito Santo (MPES)

Abstract


This paper addresses the classification of procedures of the Public Prosecutor's Office of the State of Espírito Santo (MPES) according to the United Nations Sustainable Development Goals (SDGs), with the aim of promoting transparency and efficiency. The proposed methodology combines Natural Language Processing (NLP) techniques with data engineering and comprises four stages: initial classification, preprocessing, feature extraction, and classification. Preliminary results indicate good classification accuracy even with techniques that are simple to apply.

Keywords: Document Classification, NLP
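
The four stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features or the model, so this example assumes that the initial classification stage yields a small set of labeled procedure summaries, and it swaps in TF-IDF features with a nearest-centroid cosine classifier as a stand-in. The toy corpus, the stopword list, and all function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Stage 1 (assumed): an initial classification step has already produced
# labeled examples. This toy corpus is hypothetical, not data from the paper.
LABELED = [
    ("Investigation of illegal dumping of waste into the river", "SDG 6"),
    ("Contamination of the municipal water supply by sewage", "SDG 6"),
    ("Lack of teachers in municipal public schools", "SDG 4"),
    ("Irregular school transport for rural students", "SDG 4"),
]

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"of", "the", "in", "into", "by", "for", "a", "an"}


def preprocess(text):
    """Stage 2: lowercase, keep runs of letters, drop stopwords."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def fit_idf(token_lists):
    """Stage 3a: smoothed inverse document frequency for each term."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf(tokens, idf):
    """Stage 3b: L2-normalized sparse TF-IDF vector; unseen terms are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def train(labeled):
    """Stage 4 (training): sum the TF-IDF vectors of each label into a centroid."""
    token_lists = [preprocess(text) for text, _ in labeled]
    idf = fit_idf(token_lists)
    centroids = defaultdict(Counter)
    for tokens, (_, label) in zip(token_lists, labeled):
        centroids[label].update(tfidf(tokens, idf))
    return idf, dict(centroids)


def classify(text, idf, centroids):
    """Stage 4 (prediction): return the label with the highest cosine similarity."""
    vec = tfidf(preprocess(text), idf)

    def cosine(label):
        centroid = centroids[label]
        norm = math.sqrt(sum(v * v for v in centroid.values()))
        dot = sum(w * centroid[t] for t, w in vec.items())
        return dot / norm if norm else 0.0

    return max(centroids, key=cosine)


if __name__ == "__main__":
    idf, centroids = train(LABELED)
    print(classify("Sewage discharged into the river water", idf, centroids))
    print(classify("Schools without teachers", idf, centroids))
```

A nearest-centroid model is used here only because it fits in a few lines of standard-library Python; any classifier could be placed behind the same `train` and `classify` interface.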

Published
2024-10-14
BERGER, Pedro P.; SOUZA, Matheus W.; QUARTEZANI, Heitor; MERISIO, Guilherme; FAZOLO, Iara A.; ANDRADE, Luciana G. F.; SILVA, Sandro T. Automation of Public Prosecutor's Office Document Classification in Accordance with Sustainable Development Goals Using Natural Language Processing. In: WORKSHOP ON DATA SCIENCE AGAINST CORRUPTION IN THE PUBLIC SECTOR (DS-COPS) - BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 39., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 302-307. DOI: https://doi.org/10.5753/sbbd_estendido.2024.244017.