Efficient Projection of Denial Constraints Violations
Abstract
The visualization of data quality rule violations is highly useful in data cleaning. A widely used operation for this visualization is the projection of tuple combinations that violate the rules. However, this operation is costly when considering the state-of-the-art formalisms in data cleaning, such as denial constraints. In the worst case, all combinations of tuple pairs in the table violate the rule, resulting in quadratic complexity regarding the number of records. This paper presents and experimentally evaluates various techniques for efficiently implementing projection for denial constraint violations.
References
Grefen, P. and de By, R. (1994). A multi-set extended relational algebra: a formal approach to a practical issue. In Proceedings of 1994 IEEE 10th International Conference on Data Engineering, pages 80–88.
Pena, E. H. M., de Almeida, E. C., and Naumann, F. (2022). Fast detection of denial constraint violations. Proc. VLDB Endow., 15(4):859–871.
Pena, E. H. M., Lucas Filho, E. R., de Almeida, E. C., and Naumann, F. (2020). Efficient detection of data dependency violations. In 29th ACM CIKM, page 1235–1244.
Rekatsinas, T., Chu, X., Ilyas, I. F., and Ré, C. (2017). Holoclean: Holistic data repairs with probabilistic inference. Proc. VLDB Endow., 10(11):1190–1201.
