Applications of Artificial Intelligence for Auditing and Classification of Incongruent Descriptions in Public Procurement


Context: Despite advances in technology, many services and information systems, especially in the public sector, still rely on unstructured natural-language descriptions of products, services, or events, which makes them difficult to classify and analyze. Efficient audits require automatically classifying and totaling the invoices issued for product purchases on the basis of each product's unique identification code. Problem: Suppliers do not always register these codes correctly. Moreover, the product description, which could serve as an alternative to the code, is not a uniform field: it is free text with variable wording. Solution: This work aimed to identify and characterize the approaches, techniques, and intelligent algorithms used to classify incongruous textual descriptions in issued invoices. IS theory: General systems theory; Competitive strategy (Porter); Knowledge-based theory of the firm. Method: A systematic mapping was conducted to find primary studies in the literature and to collect evidence to guide future research. Summary of Results: 225 articles were identified, with Scopus and Web of Science being the databases that returned the most articles. Only 15 articles passed the inclusion and exclusion criteria. Among the approaches used, supervised machine learning stands out, appearing in 60% of the works. The most widely used techniques were Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), present in 40% of the articles. Contributions and Impacts in the IS area: The research showed that artificial intelligence techniques helped mitigate the problem of classifying and analyzing invoices with incongruous codes and descriptions, which can support auditing, investigation, and the fight against corruption. Finally, trends and gaps to be explored are also presented.
Keywords: Invoices, investigation, audit, incongruous textual descriptions, artificial intelligence


How to Cite

GOMES, Wesckley Faria; COLAÇO, Methanias. Applications of Artificial Intelligence for Auditing and Classification of Incongruent Descriptions in Public Procurement. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 18., 2022, Curitiba. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022.