DOI: 10.1145/3592813.3592908
research-article

Machine Learning Applied to Open Government Data for the Detection of Improprieties in the Application of Public Resources

Published: 26 June 2023

ABSTRACT

Context: Making government data publicly available is an important mechanism of transparency and social control. Accordingly, numerous laws have made the disclosure of government procurement data mandatory. Problem: The large volume of unstructured textual information available on government portals is an obstacle to effective social control, because it makes in-depth analysis of public spending difficult. Solution: Machine Learning algorithms are used to perform text mining and to group the items acquired by the public administration. Public purchases are labeled and similar items are grouped in order to facilitate the detection of improprieties in government purchases. IS Theory: This work is associated with Computational Learning Theory, which aims to understand the fundamental principles of learning and to design better automated methods. Method: The article is a case study, and its evaluation was carried out with the support of specialists in the field. The results were analyzed using a quantitative approach. Summary of Results: The results observed in the evaluated cases were promising: the clusters produced by the solution were semantically coherent enough to allow more complex analyses of government purchases. Contributions and Impact in the IS area: The results show that applying text mining and machine learning techniques can extract useful information from government purchase data and enable better analyses of public spending.
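
The abstract does not specify which text-mining or clustering algorithms were used, so the following is only a minimal sketch of the general idea of grouping similar purchase descriptions, not the authors' pipeline. It assumes TF-IDF weighting, cosine similarity, and a simple single-link grouping rule; the function names, the sample item descriptions, and the 0.3 threshold are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(tok for toks in tokenized for tok in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF keeps every weight strictly positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def group_items(docs, threshold=0.3):
    """Single-link grouping: join items whose similarity >= threshold."""
    vecs = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive group ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(docs))]


# Hypothetical purchase descriptions, invented for this example.
items = [
    "A4 printer paper 500 sheets",
    "printer paper A4 ream",
    "diesel fuel S10",
    "S10 diesel fuel litre",
]
labels = group_items(items)
```

Items that receive the same label fall into the same group, which is the kind of structure that would let unit prices for comparable items be compared. A real procurement dataset would likely need language-specific preprocessing and a more scalable clustering method than this pairwise comparison.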

      Published in

      ACM Other conferences
      SBSI '23: Proceedings of the XIX Brazilian Symposium on Information Systems
      May 2023
      490 pages

      Copyright © 2023 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 181 of 557 submissions, 32%