Automation of Public Prosecutor's Office Document Classification in Accordance with Sustainable Development Goals Using Natural Language Processing
Abstract
This paper addresses the classification of procedures of the Public Prosecutor’s Office of the State of Espírito Santo (MPES) according to the United Nations Sustainable Development Goals (SDGs), with the aim of promoting transparency and efficiency. The proposed methodology combines Natural Language Processing (NLP) techniques with data engineering and comprises four stages: initial classification, preprocessing, feature extraction, and classification. Preliminary results indicate that good classification accuracy can be achieved with techniques that are simple to apply.
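As an illustration only, the four stages named above can be sketched in a few lines of standard-library Python. The abstract does not state which techniques are used at each stage, so everything here is an assumption: the tiny hand-labeled seed set (invented texts and SDG labels) stands in for the initial classification, TF-IDF weighting stands in for feature extraction, and a cosine nearest-centroid rule stands in for the classifier. The names `LABELED`, `STOPWORDS`, `preprocess`, `fit_idf`, `tfidf`, `train`, and `classify` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Stage 1 (initial classification): a hypothetical seed set of labeled texts.
# Both the texts and the SDG labels are invented for illustration.
LABELED = [
    ("Investigation of illegal deforestation in a protected forest area", "SDG 15"),
    ("Irregular logging and wildlife trafficking in the forest reserve", "SDG 15"),
    ("Lack of teachers and school meals in municipal public schools", "SDG 4"),
    ("Inquiry into school transport and enrollment of students", "SDG 4"),
]

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"of", "in", "a", "an", "the", "and", "into", "to"}


def preprocess(text):
    """Stage 2: lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def fit_idf(token_lists):
    """Compute a smoothed inverse document frequency for each training term."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(tokens, idf):
    """Stage 3: L2-normalized TF-IDF vector; terms unseen in training are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def train(labeled):
    """Build the IDF table and one summed TF-IDF centroid per label."""
    token_lists = [preprocess(text) for text, _ in labeled]
    idf = fit_idf(token_lists)
    centroids = defaultdict(lambda: defaultdict(float))
    for tokens, (_, label) in zip(token_lists, labeled):
        for term, weight in tfidf(tokens, idf).items():
            centroids[label][term] += weight
    return idf, centroids


def classify(text, idf, centroids):
    """Stage 4: return the label whose centroid has the highest cosine similarity."""
    vec = tfidf(preprocess(text), idf)

    def score(label):
        centroid = centroids[label]
        norm = math.sqrt(sum(v * v for v in centroid.values()))
        dot = sum(w * centroid.get(t, 0.0) for t, w in vec.items())
        return dot / norm if norm else 0.0

    return max(centroids, key=score)


IDF, CENTROIDS = train(LABELED)
print(classify("Complaint about deforestation near the forest reserve", IDF, CENTROIDS))
```

With this toy seed set, a text mentioning deforestation and a forest reserve is assigned to the label whose examples share those terms. A real system would need a far larger labeled corpus, Portuguese-language preprocessing, and a held-out evaluation before any accuracy claim could be made.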
