An Approach for Improving DBpedia as a Research Data Hub

  • Jean Gabriel Nguema Ngomo UFRJ
  • Giseli Rabello Lopes UFRJ
  • Maria Luiza Machado Campos UFRJ
  • Maria Cláudia Reis Cavalcanti IME

Abstract


Extracted from Wikipedia content, DBpedia is considered one of the most important knowledge bases of the Semantic Web. It has editions in several languages, including English (DBpedia EN) and Portuguese (DBpedia PT). All DBpedia editions are subject to quality issues; DBpedia PT in particular suffers from inconsistencies and missing data in several domains. This paper describes a semi-automatic, incremental process for publishing data from reliable external sources on DBpedia while improving aspects of its quality. In an open science context, the proposal aims to consolidate DBpedia as a reference hub for research data, so that research in any area supported by Semantic Web data can use it reliably. Although the approach is independent of any specific DBpedia edition, the supporting prototype tool, named ETL4DBpedia, was built for DBpedia PT and is based on ETL (Extract, Transform, Load) workflows. This paper also describes an assessment of the approach that applies the tool in a real-usage scenario involving data from the field of botany. This application increased the completeness of medicinal plant species in DBpedia PT by 127%, and the ETL4DBpedia components showed satisfactory performance.
Keywords: Semantic Web, DBpedia, ETL, Open Science, Data Quality
Published
30/11/2020
NGOMO, Jean Gabriel Nguema; LOPES, Giseli Rabello; CAMPOS, Maria Luiza Machado; CAVALCANTI, Maria Cláudia Reis. An Approach for Improving DBpedia as a Research Data Hub. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 1., 2020, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 41-49.