Semantic Enrichment and Exploration of Open Dataset Tags

  • Bruno P. C. de Castro, UFRJ
  • Henrique F. Rodrigues, UFRJ
  • Giseli R. Lopes, UFRJ
  • Maria Luiza M. Campos, UFRJ


This paper proposes an approach for the semantic enrichment of dataset tags. Terms extracted from the dataset content are assigned as tags and associated with meaningful external resources, complementing the tags originally assigned to the dataset. In this approach, an RDF summary graph is generated to support dataset retrieval through exploration of the tag graph. The motivation of this study is the need to improve dataset findability on Open Data Portals by generating a richer set of interlinked tags. The semantic enrichment approach is divided into four main steps: cleaning; term extraction and ranking; linking to terms from associated ontologies or vocabularies; and summarization in graph form, which enables users to explore tags and find other relevant datasets through tag connections. To support this process, we developed the Relevant Tag Extractor (RTagE), a semi-automatic software tool that extracts terms from a dataset, ranks them, and associates them with external resources. We illustrate the approach with datasets from a Web portal about the use of agrochemicals in agriculture, assigning enriched terms from the AGROVOC thesaurus as dataset tags.
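The four steps can be illustrated with a minimal Python sketch. It is not RTagE's implementation, and it leaves out the manual curation that makes RTagE semi-automatic. Several parts are assumptions made for illustration. Ranking uses raw term frequency as a stand-in for RTagE's ranking. A small hand-written dictionary (the hypothetical `VOCABULARY`, with placeholder `example.org` URIs) stands in for an AGROVOC lookup. The summary graph is written as N-Triples using the DCAT `dcat:keyword` and `dcat:theme` properties, which may differ from the predicates the paper actually uses. All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list for the cleaning step (an assumption, not RTagE's list).
STOP_WORDS = {"the", "of", "and", "in", "a", "an", "to", "for", "by", "on"}

# DCAT namespace; using dcat:keyword / dcat:theme here is an illustrative choice.
DCAT = "http://www.w3.org/ns/dcat#"


def clean(values):
    """Step 1 (cleaning): lowercase cell values, keep alphabetic tokens, drop stop words."""
    tokens = []
    for value in values:
        for token in re.findall(r"[a-z]+", value.lower()):
            if token not in STOP_WORDS and len(token) > 2:
                tokens.append(token)
    return tokens


def rank_terms(tokens, top_k=3):
    """Step 2 (extraction and ranking): rank terms by raw frequency (a stand-in ranking)."""
    return [term for term, _ in Counter(tokens).most_common(top_k)]


def link_terms(terms, vocabulary):
    """Step 3 (linking): map each ranked term to an external concept URI when one is known.

    Terms with no match in the vocabulary are dropped; the dict keeps rank order.
    """
    return {term: vocabulary[term] for term in terms if term in vocabulary}


def summarize(dataset_uri, links):
    """Step 4 (summarization): emit N-Triples linking the dataset to its enriched tags.

    Terms contain only [a-z] characters after cleaning, so the literals need no escaping.
    """
    triples = []
    for term, concept in links.items():
        triples.append(f'<{dataset_uri}> <{DCAT}keyword> "{term}" .')
        triples.append(f"<{dataset_uri}> <{DCAT}theme> <{concept}> .")
    return triples


def enrich(dataset_uri, values, vocabulary, top_k=3):
    """Run the four steps end to end on a list of textual cell values."""
    terms = rank_terms(clean(values), top_k)
    return summarize(dataset_uri, link_terms(terms, vocabulary))


# Hypothetical sample data: cell values from an agrochemicals dataset and
# placeholder concept URIs standing in for AGROVOC concepts.
ROWS = [
    "Pesticide residues in maize",
    "Pesticide use by crop",
    "Maize yield and pesticide sales",
]
VOCABULARY = {
    "pesticide": "http://example.org/agrovoc/pesticide",
    "maize": "http://example.org/agrovoc/maize",
    "soil": "http://example.org/agrovoc/soil",
}

if __name__ == "__main__":
    print("\n".join(enrich("http://example.org/dataset/agro-1", ROWS, VOCABULARY)))
```

In this toy example "pesticide" and "maize" are the two most frequent terms and both resolve to vocabulary concepts. The third-ranked term has no match and is therefore not linked, which mirrors the point at which a curator would step in within a semi-automatic tool.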
