Managing semantic evolution in databases: from theory to implementation
Abstract
Semantic heterogeneity arises in long-term datasets as categories, groupings, and units change over time, forcing users to adapt queries and results by hand. This master’s thesis proposes a formal framework that addresses the problem through two strategies: query rewriting and data preprocessing. It introduces storage models and algorithms that manage semantic evolution via discrete, time-stamped operations (translation, merging, and splitting), so that queries can be written as if the data were homogeneous. A prototype, MellowDB, was evaluated using Brazilian mortality data (1979–2021). Results show both approaches are production-ready, with data preprocessing generally outperforming query rewriting except in write-heavy scenarios.
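As a rough illustration of the two strategies, the sketch below models semantic evolution as a list of time-stamped operations and answers the same count query both ways. It is only a toy under stated assumptions: the `Evolution` record, the `HISTORY` list, the category codes, and the functions `to_latest` and `rewrite` are all hypothetical and are not MellowDB's storage model or API. Only translation and merging are covered; splitting is left out because the abstract does not say how it is handled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Evolution:
    """One time-stamped semantic change (hypothetical structure)."""
    year: int    # first year in which the new codes are in use
    old: tuple   # codes used before `year`
    new: tuple   # codes used from `year` onward


# Hypothetical history: a translation (rename) followed by a merge.
HISTORY = [
    Evolution(1996, ("A1",), ("B1",)),        # translation: A1 becomes B1
    Evolution(2010, ("B1", "B2"), ("C1",)),   # merge: B1 and B2 become C1
]


def to_latest(code, year):
    """Data preprocessing: map a stored code to its latest equivalent."""
    for ev in sorted(HISTORY, key=lambda e: e.year):
        # Apply every change that happened after the record was stored.
        if year < ev.year and code in ev.old and len(ev.new) == 1:
            code = ev.new[0]
    return code


def rewrite(code, year):
    """Query rewriting: expand a latest-version code into the codes used in `year`."""
    codes = {code}
    for ev in sorted(HISTORY, key=lambda e: e.year, reverse=True):
        # Undo every change that had not yet happened in `year`.
        if year < ev.year and codes & set(ev.new):
            codes = (codes - set(ev.new)) | set(ev.old)
    return codes


# Toy records stored with the code that was valid in their year.
RECORDS = [(1980, "A1"), (1990, "B2"), (2000, "B1"), (2015, "C1")]

# Strategy 1: convert the data once, then query it as if it were homogeneous.
preprocessed = Counter(to_latest(code, year) for year, code in RECORDS)

# Strategy 2: leave the data as stored and rewrite the query for each year.
rewritten = sum(1 for year, code in RECORDS if code in rewrite("C1", year))

print(preprocessed["C1"], rewritten)
```

In this toy setting both strategies return the same count for `C1`. The sketch also hints at the trade-off: preprocessing pays the conversion cost when data is written, while rewriting pays it on every query.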
References
Brahmia, Z., Grandi, F., and Oliboni, B. (2024). A literature review on schema evolution in databases. Computing Open, 2:2430001:1–54.
Brazilian Health Ministry. Tabnet public health information. [link]. Accessed: 2025-01-23.
Chillón, A. H., Klettke, M., Ruiz, D. S., and Molina, J. G. (2024). A generic schema evolution approach for NoSQL and relational databases. IEEE Transactions on Knowledge and Data Engineering, 36(7):2774–2789.
Curino, C., Moon, H. J., Deutsch, A., and Zaniolo, C. (2013). Automating the database schema evolution process. The VLDB Journal, 22(1):73–98.
Golfarelli, M., Lechtenbörger, J., Rizzi, S., and Vossen, G. (2006). Schema versioning in data warehouses: Enabling cross-version querying via schema augmentation. Data & Knowledge Engineering, 59(2):435–459.
Hakimpour, F. and Geppert, A. (2005). Resolution of semantic heterogeneity in database schema integration using formal ontologies. Information Technology and Management, 6:97–122.
Herrmann, K., Voigt, H., Behrend, A., Rausch, J., and Lehner, W. (2017). Living in parallel realities: Co-existing schema versions with a bidirectional database evolution language. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD/PODS, pages 1101–1116.
Instituto Brasileiro de Geografia e Estatística - IBGE. Alterações toponímicas - 2022.
Jagadish, H. V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J. M., Ramakrishnan, R., and Shahabi, C. (2014). Big data and its technical challenges. Commun. ACM, 57(7):86–94.
Klettke, M., Störl, U., Shenavai, M., and Scherzinger, S. (2016). NoSQL schema evolution and big data migration at scale. In 2016 IEEE International Conference on Big Data, Big Data, pages 2764–2774.
Li, X., Madnick, S. E., and Zhu, H. (2013). A context-based approach to reconciling data interpretation conflicts in web services composition. ACM Transactions on Internet Technology (TOIT), 13(1):1–27.
Mergen, S. L. S. and Heuser, C. A. (2006). Data translation between taxonomies. In International Conference on Advanced Information Systems Engineering, pages 111–124. Springer.
Möller, M. L., Klettke, M., Hillenbrand, A., and Störl, U. (2019). Query rewriting for continuously evolving NoSQL databases. In International Conference on Conceptual Modeling, ER, pages 213–221.
MongoDB, Inc. (2025). MongoDB.
MongoDB, Inc. (2025). db.collection.stats(). [link]. Accessed: 2025-04-27.
Moon, H. J., Curino, C. A., Deutsch, A., Hou, C.-Y., and Zaniolo, C. (2008). Managing and querying transaction-time databases under schema evolution. Proceedings of the VLDB Endowment, 1(1):882–895.
Nepomuceno, P. I. S. and Braghetto, K. R. (2023). Managing semantic evolutions in semi-structured data. In International Conference on Database and Expert Systems Applications, pages 179–185. Springer.
Nepomuceno, P. I. S. and Braghetto, K. R. (2026). Managing semantic evolution in databases: From theory to implementation. Future Generation Computer Systems, 177:108257.
Roddick, J. F. (1995). A survey of schema versioning issues for database systems. Information and Software Technology, 37(7):383–393.
Secretaria de Estado de Saúde de Minas Gerais (2014). Mortalidade CID-10 – lista de tabulação CID-BR. [link]. Accessed: 2025-07-24.
Störl, U., Klettke, M., and Scherzinger, S. (2020). NoSQL schema evolution and data migration: State-of-the-art and opportunities. In EDBT, volume 20, pages 655–658.
Ventrone, V. (1991). Semantic heterogeneity as a result of domain evolution. ACM SIGMOD Record, 20(4):16–20.
World Health Organization (2019). International Classification of Diseases, 11th Revision (ICD-11). World Health Organization, Geneva.
