Using Data Instances and Workload for Integrity Constraint Mining

  • Eduardo Henrique Monteiro Pena, Federal University of Technology – Paraná (UTFPR) / Federal University of Paraná (UFPR)
  • Eduardo Cunha de Almeida, Federal University of Paraná (UFPR)

Abstract


Functional dependencies (FDs) are integrity constraints widely studied in the context of data profiling. In this work, we explore the automatic discovery of FDs from data instances and describe a method for selecting the discovered FDs that are relevant to the semantics of the workload. The experimental evaluation shows that the selected dependencies exhibit expressive properties compared to the search space, which demonstrates the effectiveness of the presented approach.
Keywords: Functional Dependencies, Workload Semantics
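
For readers unfamiliar with the central notion: an FD X → Y holds in a data instance when any two tuples that agree on the attributes in X also agree on the attributes in Y. The following is a minimal sketch of that check only, not the discovery or selection method of the paper; the function name `fd_holds`, the sample relation, and its attribute names are hypothetical and chosen for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in rows (a list of dicts).

    The FD holds when every distinct combination of lhs values is
    associated with exactly one combination of rhs values.
    """
    seen = {}  # maps lhs values to the first rhs values observed with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val on first sight and returns the stored value
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Hypothetical sample instance (illustrative values only).
rows = [
    {"zip": "80000", "city": "Curitiba", "name": "Ana"},
    {"zip": "80000", "city": "Curitiba", "name": "Bruno"},
    {"zip": "38400", "city": "Uberlandia", "name": "Ana"},
]

zip_determines_city = fd_holds(rows, ["zip"], ["city"])
name_determines_city = fd_holds(rows, ["name"], ["city"])
```

In this sample, `zip` → `city` holds, while `name` → `city` does not, because the two tuples with the same name disagree on the city. Discovery algorithms search the space of candidate left-hand and right-hand sides for FDs that pass this kind of check.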

Published
2017-10-02
PENA, Eduardo Henrique Monteiro; ALMEIDA, Eduardo Cunha de. Using Data Instances and Workload for Integrity Constraint Mining. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 32., 2017, Uberlândia/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 312-317. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2017.174079.