ICPSet: A Structured Dataset of Public Procurement Items

  • Gabriel P. Oliveira Federal University of Minas Gerais (UFMG) http://orcid.org/0000-0002-7210-6408
  • Mariana O. Silva Federal University of Minas Gerais (UFMG)
  • Lucas G. L. Costa Federal University of Minas Gerais (UFMG)
  • Marco Túlio Dutra Federal University of Minas Gerais (UFMG) / Federal University of Ouro Preto (UFOP) https://orcid.org/0009-0000-3865-5799
  • Gisele L. Pappa Federal University of Minas Gerais (UFMG)

Abstract


Transparency and efficiency in public procurement management are essential to ensure the proper use of public resources. However, the complexity and diversity of procured items make these purchases difficult to analyze and monitor. This paper presents ICPSet, a structured dataset designed to facilitate the analysis of public procurement data. With over 30 million standardized and structured items, ICPSet provides a robust basis for various analyses and for tool development.
Keywords: government data, public procurement, e-government

Published
2024-10-14
OLIVEIRA, Gabriel P.; SILVA, Mariana O.; COSTA, Lucas G. L.; DUTRA, Marco Túlio; PAPPA, Gisele L. ICPSet: A Structured Dataset of Public Procurement Items. In: DATASET SHOWCASE WORKSHOP (DSW), 6., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 103-113. DOI: https://doi.org/10.5753/dsw.2024.243826.