An Approach for Schema Extraction of NoSQL Columnar Databases: the HBase Case Study

Authors

  • Angelo Augusto Frozza Instituto Federal Catarinense - Universidade Federal de Santa Catarina
  • Eduardo Dias Defreyn Universidade Federal de Santa Catarina
  • Ronaldo dos Santos Mello Universidade Federal de Santa Catarina https://orcid.org/0000-0003-4262-5474

DOI:

https://doi.org/10.5753/jidm.2021.1966

Keywords:

Columnar, HBase, JSON Schema, NoSQL, Schema extraction

Abstract

Although NoSQL databases do not require a schema a priori, knowing the database schema is essential for activities such as data integration, data validation, and data interoperability. This paper presents a process for extracting the schemas of columnar NoSQL databases. We adopt JSON as a canonical format for data representation, and we validate the proposed process through a prototype tool that extracts schemas from the HBase columnar NoSQL database system. HBase was chosen as a case study because it is one of the most popular columnar NoSQL solutions. Compared to related work, our contributions are a simple solution for inferring column data types in columnar NoSQL databases that store column values only as byte arrays, and a resulting schema that follows the JSON Schema format.
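The abstract names two technical steps: inferring a data type for each column whose value HBase stores as a raw byte array, and emitting the result in JSON Schema format. The following is a minimal sketch of that idea under stated assumptions, not the authors' process or prototype tool. It assumes each row arrives as a dict mapping `b"family:qualifier"` to a `bytes` value (loosely the shape returned by HBase Python clients such as happybase), that values were written as UTF-8 text, and it uses hypothetical helpers `infer_type`, `merge_types`, and `extract_schema`. Each column family becomes a nested object, each qualifier becomes a property, and types that differ across rows are merged.

```python
import json
import math
from collections import defaultdict


def infer_type(value: bytes) -> str:
    """Guess a JSON Schema type for one raw cell value (hypothetical heuristic).

    Assumes the value was written as UTF-8 text; anything else is treated
    as an opaque string.
    """
    try:
        text = value.decode("utf-8")
    except UnicodeDecodeError:
        return "string"  # not valid UTF-8: keep it as an opaque string
    if text.strip().lower() in ("true", "false"):
        return "boolean"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        number = float(text)
    except ValueError:
        return "string"
    # Reject "nan"/"inf", which float() accepts but JSON numbers cannot express.
    return "number" if math.isfinite(number) else "string"


def merge_types(types: set) -> object:
    """Combine the types observed for one column across all scanned rows."""
    if types == {"integer", "number"}:
        return "number"  # integers are a subset of JSON Schema numbers
    if len(types) == 1:
        return next(iter(types))
    return sorted(types)  # JSON Schema allows a list of alternative types


def extract_schema(table_name: str, rows) -> dict:
    """Build a JSON Schema document for an HBase-like table (hypothetical helper).

    `rows` is an iterable of dicts mapping b"family:qualifier" to bytes.
    """
    observed = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for column, value in row.items():
            family, qualifier = column.decode("utf-8").split(":", 1)
            observed[family][qualifier].add(infer_type(value))
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": table_name,
        "type": "object",
        "properties": {
            family: {
                "type": "object",
                "properties": {
                    qualifier: {"type": merge_types(types)}
                    for qualifier, types in sorted(qualifiers.items())
                },
            }
            for family, qualifiers in sorted(observed.items())
        },
    }


if __name__ == "__main__":
    sample_rows = [
        {b"info:name": b"Alice", b"info:age": b"30", b"stats:score": b"9.5"},
        {b"info:name": b"Bob", b"info:age": b"41", b"stats:score": b"7"},
    ]
    print(json.dumps(extract_schema("users", sample_rows), indent=2))
```

With the sample rows, `info.age` is inferred as an integer, `info.name` as a string, and `stats.score` as a number because one row holds `9.5` and another `7`. Values written with binary encoders (for example, a Java `long` serialized as 8 raw bytes) will typically not be recognized as numbers by this text-based heuristic and fall back to `"string"`; a real extractor would need further heuristics or metadata for such columns.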


References

Atzeni, P., Bugiotti, F., and Rossi, L. Uniform access to NoSQL systems. Information Systems vol. 43, pp. 117–133, 2014.

Elmasri, R. and Navathe, S. B. Fundamentals of Database Systems. Pearson, Boston, 2016.

Frozza, A. A., Defreyn, E. D., and Mello, R. d. S. A Process for Inference of Columnar NoSQL Database Schemas. In Anais Principais do Simpósio Brasileiro de Banco de Dados (SBBD). SBC, Porto Alegre, pp. 175–180, 2020.

Frozza, A. A., Jacinto, S. R., and Mello, R. d. S. An Approach for Schema Extraction of NoSQL Graph Databases. In Proceedings - 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science, IRI 2020. IEEE, Las Vegas, NV (USA), 2020.

Frozza, A. A., Mello, R. d. S., and da Costa, F. d. S. An Approach for Schema Extraction of JSON and Extended JSON Document Collections. In XIX Int. Conf. on Information Reuse and Integration. IEEE, Salt Lake City, Utah (USA), pp. 356–363, 2018.

Frozza, A. A., Schreiner, G. A., Machado, B. R. L., and Mello, R. d. S. REx - NoSQL Redis Schema Extraction Module. In Anais da Escola Regional de Banco de Dados (ERBD). Sociedade Brasileira de Computação (SBC), Chapecó (SC), pp. 81–90, 2019.

Han, J., Haihong, E., Le, G., and Du, J. Survey on NoSQL database. In VI International Conference on Pervasive Computing and Applications. IEEE, Port Elizabeth, South Africa, pp. 363–366, 2011.

Hewitt, E. Cassandra: The Definitive Guide. O’Reilly Media, Sebastopol, CA, 2010.

Kiran, V. K. and Vijayakumar, R. Ontology-based data integration of NoSQL datastores. In IX Int. Conf. on Industrial and Information Systems. IEEE, Gwalior, India, pp. 1–6, 2014.

Lee, C.-H. and Zheng, Y.-L. SQL-To-NoSQL Schema Denormalization and Migration: A Study on Content Management Systems. In Proceedings - 2015 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2015. IEEE, Hong Kong, pp. 2022–2026, 2015.

Ruiz, D. S., Morales, S. F., and Molina, J. G. Inferring Versioned Schemas from NoSQL Databases and its Applications. LNCS vol. 9381, pp. 467–480, 2015.

Sadalage, P. J. and Fowler, M. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley, New Jersey, USA, 2013.

Schreiner, G., Duarte, D., and Dos Santos Mello, R. SQLtoKeyNoSQL: A layer for relational to key-based NoSQL database mapping. In 17th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2015 - Proceedings. ACM, Brussels, Belgium, 2015.

Shriparv, S. Learning HBase. Packt Publishing, Birmingham, UK, 2010.

Tudorica, B. G. and Bucur, C. A. A Comparison between Several NoSQL Databases with Comments and Notes. In Proc. RoEduNet IEEE Intern. Conference. IEEE, Iasi, Romania, 2011.

Zhao, G., Lin, Q., Li, L., and Li, Z. Schema conversion model of SQL database to NoSQL. In Proc. 9th Intern. Conference 3PGCIC. IEEE, Washington, DC (USA), pp. 355–362, 2014.


Published

2021-11-19

How to Cite

Frozza, A. A., Dias Defreyn, E., & dos Santos Mello, R. (2021). An Approach for Schema Extraction of NoSQL Columnar Databases: the HBase Case Study. Journal of Information and Data Management, 12(5). https://doi.org/10.5753/jidm.2021.1966

Issue

Vol. 12 No. 5 (2021)

Section

SBBD 2020 Short papers - Extended Papers