A Process for Inference of Columnar NoSQL Database Schemas
Although NoSQL databases do not require a schema a priori, knowing the database schema is essential for activities such as data integration, data validation, and data interoperability. This paper presents a process for the inference of columnar NoSQL database schemas. We validate the proposed process through a prototype tool that extracts schemas from the HBase columnar NoSQL database system. HBase was chosen as a case study because it is one of the most popular columnar NoSQL solutions. Compared to related work, our contribution is novel in two respects: it proposes a simple solution for inferring column data types in columnar NoSQL databases that store only byte arrays as column values, and it generates a schema that follows the JSON Schema format.
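To make the two ideas above concrete, the following is a minimal sketch of how a data type might be guessed from a raw byte array and how the result might be emitted as JSON Schema. It is not the process proposed in this paper: the heuristics in `infer_type`, the in-memory row layout (row key, then column family, then qualifier), the assumed byte encodings, and the choice of the draft-07 `$schema` URI are all assumptions made for illustration.

```python
import json
import struct


def infer_type(value: bytes) -> str:
    """Guess a JSON Schema type for one raw cell value (illustrative heuristic only)."""
    # Assumption: a value that decodes to printable UTF-8 text is a string.
    try:
        text = value.decode("utf-8")
        if text and text.isprintable():
            return "string"
    except UnicodeDecodeError:
        pass
    # Assumption: booleans are stored as a single 0x00 or 0xFF byte.
    if value in (b"\x00", b"\xff"):
        return "boolean"
    # Assumption: non-printable 4- or 8-byte values are fixed-width integers.
    # (An 8-byte value could equally be a double; this sketch does not try to tell.)
    if len(value) in (4, 8):
        return "integer"
    return "string"  # fallback for opaque bytes


def infer_schema(table_name: str, rows: dict) -> dict:
    """Build a JSON Schema document from rows shaped as {row_key: {family: {qualifier: bytes}}}."""
    families = {}
    for cells in rows.values():
        for family, columns in cells.items():
            seen = families.setdefault(family, {})
            for qualifier, value in columns.items():
                seen.setdefault(qualifier, set()).add(infer_type(value))

    def type_of(types):
        # A single observed type is emitted as a string, several as a sorted list.
        ordered = sorted(types)
        return ordered[0] if len(ordered) == 1 else ordered

    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": table_name,
        "type": "object",
        "properties": {
            family: {
                "type": "object",
                "properties": {q: {"type": type_of(t)} for q, t in sorted(cols.items())},
            }
            for family, cols in sorted(families.items())
        },
    }


if __name__ == "__main__":
    # Hypothetical sample data; the integer is packed as 4 big-endian bytes.
    sample = {
        "row1": {"info": {"name": b"Ana", "age": struct.pack(">i", 30), "active": b"\xff"}},
        "row2": {"info": {"name": b"Bruno", "age": struct.pack(">i", 41)}},
    }
    print(json.dumps(infer_schema("users", sample), indent=2))
```

Such byte-level guessing is inherently ambiguous: a 4-byte integer whose bytes all happen to be printable characters would be classified as a string by this sketch, which is why any practical approach has to settle on explicit rules for resolving such cases.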