A Process for Inference of Columnar NoSQL Database Schemas


Although NoSQL databases do not require a schema a priori, knowing the database schema is essential for activities such as data integration, data validation, and data interoperability. This paper presents a process for inferring columnar NoSQL database schemas. We validate the proposed process through a prototype tool that extracts schemas from the HBase columnar NoSQL database system. HBase was chosen as a case study because it is one of the most popular columnar NoSQL solutions. Compared to related work, our contribution is novel in two respects: it proposes a simple solution for inferring column data types in columnar NoSQL databases that store only byte arrays as column values, and it generates a schema that follows the JSON Schema format.

Keywords: NoSQL, Schema inference, Columnar, Database, JSON Schema
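
To make the idea of typing byte-array values concrete, the following is a minimal illustrative sketch in Python. It is not the prototype tool described in the paper. It assumes that rows have already been read into a nested dictionary of the form row key, column family, column qualifier, raw bytes. The names `infer_type` and `infer_schema` and the heuristics themselves are hypothetical: printable UTF-8 text is parsed as a boolean, integer, number, or string, and non-printable values are guessed from their length, assuming fixed-width binary encodings. An 8-byte binary value could equally be a floating-point number, so that guess is inherently ambiguous.

```python
import json
import re

INT_RE = re.compile(r"[+-]?[0-9]+")
FLOAT_RE = re.compile(r"[+-]?[0-9]+\.[0-9]+")


def infer_type(raw: bytes) -> str:
    """Guess a JSON Schema type name for one raw cell value (heuristic)."""
    if not raw:
        return "string"  # empty value: nothing to inspect
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    if text is not None and text.isprintable():
        if text in ("true", "false"):
            return "boolean"
        if INT_RE.fullmatch(text):
            return "integer"
        if FLOAT_RE.fullmatch(text):
            return "number"
        return "string"
    # Not printable text: assume a fixed-width binary encoding (hypothetical rule).
    if raw in (b"\x00", b"\xff"):
        return "boolean"
    if len(raw) in (4, 8):
        return "integer"  # ambiguous: 8 bytes could also encode a double
    return "string"


def infer_schema(table_name: str, rows: dict) -> dict:
    """Merge the types observed per column into a JSON Schema document."""
    observed = {}  # family -> qualifier -> set of type names
    for families in rows.values():
        for family, columns in families.items():
            for qualifier, raw in columns.items():
                types = observed.setdefault(family, {}).setdefault(qualifier, set())
                types.add(infer_type(raw))

    def type_of(types):
        # A single type is emitted as a string, several as a sorted list.
        return next(iter(types)) if len(types) == 1 else sorted(types)

    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": table_name,
        "type": "object",
        "properties": {
            family: {
                "type": "object",
                "properties": {q: {"type": type_of(t)} for q, t in columns.items()},
            }
            for family, columns in observed.items()
        },
    }


# Hypothetical sample data: two rows of a "users" table.
rows = {
    "r1": {"info": {"name": b"Ana", "age": b"31", "code": b"7"}},
    "r2": {"info": {"name": b"Bob", "age": (7).to_bytes(4, "big"), "code": b"x7"}},
}
schema = infer_schema("users", rows)
print(json.dumps(schema, indent=2))
```

In this sketch a column whose values yield different guesses, such as `code` above, is given a list of types rather than a single one, which JSON Schema permits for the `type` keyword.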


How to Cite

FROZZA, Angelo Augusto; DEFREYN, Eduardo Dias; MELLO, Ronaldo dos Santos. A Process for Inference of Columnar NoSQL Database Schemas. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 35., 2020, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 175-180. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2020.13637.