Classification of Irregularity Communications in Public Ombudsmen Using Supervised Learning Algorithms

  • Fábio Cordeiro UFPI / TCE Piauí
  • Ricardo de Andrade Lira Rabelo UFPI
  • Raimundo Santos Moura UFPI

Abstract


This work evaluates Supervised Learning algorithms on the task of classifying irregularity communications received by Public Ombudsman Offices of Courts of Auditors. We aim to improve the analysis of these communications and thereby enable a faster response to the citizen. Because the original data is imbalanced, we apply data resampling techniques before training the models. Classical ML algorithms (Naive Bayes, Decision Tree, Random Forest, K Nearest Neighbor, and Support Vector Machine) were compared with the Deep Learning model Bidirectional Encoder Representations from Transformers (BERT) and with variations of text representation based on Word Embeddings. The best results were obtained by the BERT model on the resampled dataset, reaching an F1-Score of 96%.
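
The abstract does not say which resampling technique or which F1 averaging was used, so the following is only a minimal sketch under stated assumptions: random oversampling of minority classes and a macro-averaged F1-Score, written with the Python standard library. The function names and the toy labels `class_a` and `class_b` are hypothetical and do not come from the paper.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples (sampling with replacement)
    until every class has as many examples as the largest class.
    Assumption: the paper may have used a different resampling method."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 = 2TP / (2TP + FP + FN).
    Assumption: the abstract does not state the averaging scheme."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: four examples of one class, one of the other.
texts = ["t1", "t2", "t3", "t4", "t5"]
labels = ["class_a", "class_a", "class_a", "class_a", "class_b"]
balanced_texts, balanced_labels = random_oversample(texts, labels)
class_counts = Counter(balanced_labels)
```

In a setup like this, resampling would normally be applied to the training split only, so that the F1-Score is measured on data that still reflects the original class distribution.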

Published
2022-11-28
CORDEIRO, Fábio; RABELO, Ricardo de Andrade Lira; MOURA, Raimundo Santos. Classification of Irregularity Communications in Public Ombudsmen Using Supervised Learning Algorithms. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 19., 2022, Campinas/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 704-715. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2022.227178.