Automatic Classification of Public Expenses in the Fight against COVID-19: A Case Study of TCE/PI
Abstract
Context: Social control is an exercise of citizenship that strengthens popular sovereignty. The Courts of Auditors analyze procurement data published in official gazettes. For public administration to be transparent and for social control to be exercised, this data needs to be classified and presented in an accessible way. Problem: The large number of published contracts makes this data difficult to process and makes manual classification of contract objects almost impossible, which harms social control and, consequently, the effectiveness of public service. Solution: This paper explores models for automatically classifying public procurement objects related to the response to the COVID-19 pandemic, using both classical and deep Machine Learning approaches. The models were trained on a dataset extracted from the manual classification of procurements published in the official gazettes. IS Theory: The work was conceived following Information Processing Theory, in particular the concept that compares information processing with the human learning model. Method: The research is predictive in character and was evaluated through a proof of concept. The results were analyzed using a quantitative approach. Summary of Results: The BERTimbau model, a pre-trained BERT model for the Portuguese language, achieved an accuracy of 96%. Additionally, this deep learning model outperformed the model based on document embeddings by 5% and the models using classical approaches by more than 10%. Contributions and Impact in the IS Area: The main contribution of the article is a model for the automatic classification of public expenses, intended to increase transparency and improve monitoring by Courts of Auditors and by society in general.