Open-source Data Quality Tools for Public Bids: A Comparative Analysis
Abstract
Data are increasingly used to support decisions in many contexts. For those decisions to be reliable, the quality of the underlying data must be ensured. This work briefly compares eight open-source data quality tools and then applies one of them to a real data warehouse of public bids. Our analyses show that the Great Expectations tool has relevant characteristics for generating good data quality indicators, which helps public bidding data support the decision-making process.
Keywords:
data quality, public bids, data warehouses, big data
References
Altendeitering, M. and Tomczyk, M. (2022). A functional taxonomy of data quality tools: Insights from science and practice. In Wirtschaftsinformatik.
Ballou, D. P. and Pazer, H. L. (1985). Modeling data and process quality in multi-input, multi-output information systems. Management Science, 31(2):150-162.
Chrisman, N. R. (1983). The role of quality information in the long-term functioning of a geographic information system. In Auto-Carto, pages 303-312.
Cichy, C. and Rass, S. (2019). An overview of data quality frameworks. IEEE Access, 7:24634-24648.
Ehrlinger, L. and Wöß, W. (2018). A novel data quality metric for minimality. QUAT, 1:1-15.
Ehrlinger, L. and Wöß, W. (2022). A survey of data quality measurement and monitoring tools. Front. Big Data, 5.
Elmagarmid, A. K., Ipeirotis, P. G., and Verykios, V. S. (2007). Duplicate record detection: A survey. IEEE Trans. Knowl. Data Eng., 19(1):1-16.
Foidl, H., Felderer, M., and Ramler, R. (2022). Data smells: Categories, causes and consequences, and detection of suspicious data in AI-based systems. In arXiv.
Gao, J. Z., Xie, C., and Tao, C. (2016). Big data validation and quality assurance-issuses, challenges, and needs. In SOSE, pages 433-441. IEEE Computer Society.
Goudar, S. et al. (2015). Data quality monitoring and performance metrics of a prospective, population-based observational study of maternal and newborn health in low resource settings. Reproductive Health, 12(2):1-10.
Junior, C. S. and Dorneles, C. F. (2021). Avaliação de dimensões de qualidade de dados para o agronegócio. In SBBD, pages 283-288. SBC.
Laranjeiro, N., Soydemir, S. N., and Bernardino, J. (2015). A survey on data quality: Classifying poor data. PRDC, pages 179-188.
Lee, Y. W. et al. (2002). AIMQ: a methodology for information quality assessment. Information & Management, 40(2):133-146.
Medeiros, G. F. d., Degrossi, L. C., and Holanda, M. (2020). QualiOSM: Melhorando a qualidade dos dados na ferramenta de mapeamento colaborativo OpenStreetMap. In SBBD, pages 77-82. SBC.
Pipino, L. L. et al. (2002). Data quality assessment. Commun. ACM, 45(4):211-218.
Pushkarev, V. et al. (2010). An overview of open source data quality tools. In IKE, pages 370-376. CSREA Press.
Scannapieco, M. and Catarci, T. (2002). Data quality under a computer science perspective. Journal of The ACM-JACM, 2:1-12.
Sessions, V. and Valtorta, M. (2006). The effects of data quality on machine learning algorithms. In ICIQ, pages 485-498. MIT.
Zöllner, F. et al. (2016). An open source software for analysis of dynamic contrast enhanced magnetic resonance images: UMMPerfusion revisited. BMC Med Imaging, 16(7):1-13.
Published
2022-09-19
How to Cite
OLIVEIRA, Gabriel P. et al. Open-source Data Quality Tools for Public Bids: A Comparative Analysis. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 37., 2022, Búzios. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 116-127. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2022.224351.
