Efficient Projection of Denial Constraints Violations

  • Leonardo F. Luciano Federal University of Paraná
  • Wendel C. Moro Federal University of Paraná
  • Eduardo C. de Almeida Federal University of Paraná https://orcid.org/0000-0002-6644-956X
  • Eduardo H. M. Pena Federal University of Technology – Paraná

Abstract


The visualization of data quality rule violations is highly useful in data cleaning. A widely used operation for this visualization is the projection of tuple combinations that violate the rules. However, this operation is costly when considering the state-of-the-art formalisms in data cleaning, such as denial constraints. In the worst case, all combinations of tuple pairs in the table violate the rule, resulting in quadratic complexity regarding the number of records. This paper presents and experimentally evaluates various techniques for efficiently implementing projection for denial constraint violations.

Keywords: Data Cleaning, Data Rules, Projection

References

Chu, X., Ilyas, I. F., and Papotti, P. (2013). Discovering denial constraints. Proc. VLDB Endow., 6(13):1498–1509.

Grefen, P. and de By, R. (1994). A multi-set extended relational algebra: a formal approach to a practical issue. In Proceedings of 1994 IEEE 10th International Conference on Data Engineering, pages 80–88.

Pena, E. H. M., de Almeida, E. C., and Naumann, F. (2022). Fast detection of denial constraint violations. Proc. VLDB Endow., 15(4):859–871.

Pena, E. H. M., Lucas Filho, E. R., de Almeida, E. C., and Naumann, F. (2020). Efficient detection of data dependency violations. In 29th ACM CIKM, page 1235–1244.

Rekatsinas, T., Chu, X., Ilyas, I. F., and Ré, C. (2017). Holoclean: Holistic data repairs with probabilistic inference. Proc. VLDB Endow., 10(11):1190–1201.
Published
2023-09-25
LUCIANO, Leonardo F.; MORO, Wendel C.; C. DE ALMEIDA, Eduardo; PENA, Eduardo H. M.. Efficient Projection of Denial Constraints Violations. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 38. , 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023 . p. 390-395. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2023.233388.