TY  - JOUR
AU  - Frozza, Angelo Augusto
AU  - Dias Defreyn, Eduardo
AU  - dos Santos Mello, Ronaldo
PY  - 2021/11/19
Y2  - 2024/03/29
TI  - An Approach for Schema Extraction of NoSQL Columnar Databases: the HBase Case Study
JF  - Journal of Information and Data Management
JA  - JIDM
VL  - 12
IS  - 5
SE  - SBBD 2020 Short papers - Extended Papers
DO  - 10.5753/jidm.2021.1966
UR  - https://sol.sbc.org.br/journals/index.php/jidm/article/view/1966
SP  - 
AB  - Although NoSQL databases do not require a schema a priori, being aware of the database schema is essential for activities like data integration, data validation, or data interoperability. This paper presents a process for the extraction of columnar NoSQL database schemas. We adopt JSON as a canonical format for data representation, and we validate the proposed process through a prototype tool that is able to extract schemas from the HBase columnar NoSQL database system. HBase was chosen as a case study because it is one of the most popular columnar NoSQL solutions. When compared to related work, we innovate by proposing a simple solution for the inference of column data types for columnar NoSQL databases that store only byte arrays as column values, and a resulting schema that follows the JSON Schema format.
ER  - 