Analyzing Query Execution for Integrity Constraint Violation Detection

  • Alessandro Neves dos Santos, Universidade Tecnológica Federal do Paraná (UTFPR)
  • Eduardo H. M. Pena, Universidade Tecnológica Federal do Paraná (UTFPR)

Abstract


Data consistency ensures the validity and integrity of data representing real-world entities. Denial constraints (DCs) generalize various integrity constraints, providing a powerful way to define rules that ensure data consistency. This work analyzes how well relational database management systems (RDBMSs) detect DC violations under different metrics. We explore several SQL patterns for measuring DC violations and evaluate the performance of multiple RDBMSs through extensive experiments, highlighting potential performance improvements, choke points, and limitations of using RDBMSs for this task.
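
To make the idea of "SQL patterns for measuring DC violations" concrete, the following is a minimal sketch and not the queries evaluated in the paper. It assumes a hypothetical table `emp(id, state, salary, tax)` and a textbook-style DC stating that no two employees in the same state may exist where one earns more but pays a lower tax rate. It uses Python's built-in `sqlite3` module only for convenience; the table, the data, and the three measures shown (violating pairs, a yes/no check, and tuples involved) are illustrative assumptions.

```python
import sqlite3

# Hypothetical schema and data, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (id INTEGER PRIMARY KEY, state TEXT, salary REAL, tax REAL)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        (1, "PR", 3000, 0.10),
        (2, "PR", 4000, 0.08),  # earns more than 1 and 3 but pays less tax
        (3, "PR", 3500, 0.12),
        (4, "SC", 5000, 0.05),  # different state, so never compared with the others
    ],
)

# Assumed DC: NOT (t1.state = t2.state AND t1.salary > t2.salary AND t1.tax < t2.tax)
# The join condition is the conjunction of the DC's predicates.
dc_join = """
    FROM emp t1 JOIN emp t2
      ON t1.state = t2.state
     AND t1.salary > t2.salary
     AND t1.tax < t2.tax
"""

# Pattern 1: number of violating tuple pairs (self-join with inequality predicates).
pair_count = conn.execute(f"SELECT COUNT(*) {dc_join}").fetchone()[0]

# Pattern 2: boolean check; EXISTS lets the engine stop at the first violation.
has_violation = bool(
    conn.execute(f"SELECT EXISTS (SELECT 1 {dc_join})").fetchone()[0]
)

# Pattern 3: number of distinct tuples taking part in at least one violation.
tuples_involved = conn.execute(
    f"""
    SELECT COUNT(*) FROM (
        SELECT t1.id {dc_join}
        UNION
        SELECT t2.id {dc_join}
    )
    """
).fetchone()[0]

print(pair_count, has_violation, tuples_involved)
```

Each pattern answers a different question about the same constraint, and each may lead an optimizer to a different plan: the pair count typically requires evaluating the full inequality join, while the existence check can terminate early. How a given RDBMS actually executes these queries is system-dependent.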

Keywords: data quality, data cleaning, data consistency, denial constraints, query execution, query optimization

Published
14/10/2024
DOS SANTOS, Alessandro Neves; PENA, Eduardo H. M. Analyzing Query Execution for Integrity Constraint Violation Detection. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 39., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 666-672. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2024.242785.