An Approach for Schema Extraction of NoSQL Columnar Databases: the HBase Case Study


  • Angelo Augusto Frozza, Instituto Federal Catarinense - Universidade Federal de Santa Catarina
  • Eduardo Dias Defreyn, Universidade Federal de Santa Catarina
  • Ronaldo dos Santos Mello, Universidade Federal de Santa Catarina



Keywords: Columnar, HBase, JSON Schema, NoSQL, Schema extraction


Although NoSQL databases do not require a schema to be defined a priori, knowing the database schema is essential for activities such as data integration, data validation, and data interoperability. This paper presents a process for extracting schemas from columnar NoSQL databases. We adopt JSON as the canonical format for data representation, and we validate the proposed process through a prototype tool that extracts schemas from the HBase columnar NoSQL database system. HBase was chosen as the case study because it is one of the most popular columnar NoSQL solutions. Compared to related work, we innovate in two ways: we propose a simple solution for inferring column data types in columnar NoSQL databases that store column values only as byte arrays, and we produce a resulting schema that follows the JSON Schema format.
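To make the idea concrete, the following is a minimal sketch of type inference over byte-array values followed by emission of a JSON Schema document. It is not the authors' algorithm or tool: the function names (infer_type, extract_schema), the in-memory row layout (row key, then column family, then qualifier, then raw bytes), and the inference heuristic are all assumptions made for illustration. In particular, the heuristic assumes values were written as UTF-8 text; numerics stored in a binary encoding (for example through HBase's Bytes utility) would need different handling.

```python
import json


def infer_type(value: bytes) -> str:
    """Guess a JSON Schema type for one raw cell value (hypothetical heuristic)."""
    try:
        text = value.decode("utf-8")
    except UnicodeDecodeError:
        return "string"  # assumption: opaque binary is reported as a string
    if text in ("true", "false"):
        return "boolean"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "number"
    except ValueError:
        return "string"


def extract_schema(table_name: str, rows: dict) -> dict:
    """Build a JSON Schema document from rows shaped as
    {row_key: {family: {qualifier: bytes}}} (assumed layout)."""
    observed = {}  # family -> qualifier -> set of inferred types
    for families in rows.values():
        for family, columns in families.items():
            for qualifier, value in columns.items():
                types = observed.setdefault(family, {}).setdefault(qualifier, set())
                types.add(infer_type(value))

    def type_of(types):
        # A single type is emitted as a string, several as a sorted list.
        ordered = sorted(types)
        return ordered[0] if len(ordered) == 1 else ordered

    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": table_name,
        "type": "object",
        "properties": {
            family: {
                "type": "object",
                "properties": {
                    qualifier: {"type": type_of(types)}
                    for qualifier, types in sorted(columns.items())
                },
            }
            for family, columns in sorted(observed.items())
        },
    }


# Hypothetical sample data standing in for a table scan.
sample_rows = {
    b"row1": {"info": {"name": b"Ana", "age": b"31", "score": b"9.5"}},
    b"row2": {"info": {"name": b"Bruno", "age": b"unknown", "active": b"true"}},
}
schema = extract_schema("users", sample_rows)
print(json.dumps(schema, indent=2))
```

Because a column may hold values of different apparent types across rows, the sketch records every type observed for a qualifier and emits a list when there is more than one, which JSON Schema permits for the type keyword.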



SBBD 2020 Short papers - Extended Papers