A Proposal for Profiling Datasets on the Web with Semantic Enrichment

  • Natacha Targino, Federal University of Pernambuco
  • Damires Souza, Federal Institute of Education, Science and Technology of Paraíba
  • Ana Carolina Salgado, Federal University of Pernambuco

Abstract


Datasets published on the Web often lack descriptive metadata, which makes them harder for search engines and applications to locate and access. A dataset profile facilitates communication between publishers and consumers and supports the integrated use of datasets. This paper proposes an approach that describes datasets on the Web by generating a semantically enriched profile of descriptive and structural metadata. The enrichment consists of identifying the knowledge domain of the dataset at hand and recommending vocabularies with which to semantically reference its data. The paper also reports experiments whose results indicate the relevance of this enrichment.
Keywords: Web Data, Metadata, Semantic Enrichment
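
To make the idea of a semantically enriched profile concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a profile combines descriptive metadata (title, description), structural metadata (record count, column names and types), a knowledge domain chosen by simple keyword overlap, and a list of recommended vocabularies for that domain. The lexicons, the vocabulary names, and the functions `structural_metadata`, `identify_domain`, and `build_profile` are all hypothetical and exist only for illustration.

```python
# Hypothetical domain lexicons and vocabulary names, invented for this sketch.
DOMAIN_TERMS = {
    "health": {"patient", "hospital", "disease", "diagnosis"},
    "education": {"school", "student", "teacher", "enrollment"},
}
DOMAIN_VOCABULARIES = {
    "health": ["example-health-vocab"],
    "education": ["example-education-vocab"],
}


def structural_metadata(rows):
    """Summarize structure: record count plus each column's first observed type."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            if name not in columns and value is not None:
                columns[name] = type(value).__name__
    return {"num_records": len(rows), "columns": columns}


def identify_domain(tokens):
    """Pick the domain whose lexicon overlaps most with the tokens (assumed heuristic)."""
    scores = {domain: len(terms & tokens) for domain, terms in DOMAIN_TERMS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def build_profile(title, description, rows):
    """Assemble descriptive and structural metadata, then add the enrichment."""
    structure = structural_metadata(rows)
    # Tokens come from the description and from the column names.
    tokens = set(description.lower().split())
    tokens |= {name.lower() for name in structure["columns"]}
    domain = identify_domain(tokens)
    return {
        "descriptive": {"title": title, "description": description},
        "structural": structure,
        "enrichment": {
            "domain": domain,
            "recommended_vocabularies": DOMAIN_VOCABULARIES.get(domain, []),
        },
    }


profile = build_profile(
    "Hospital admissions",
    "Records of patient admissions per hospital",
    [{"patient": "a1", "hospital": "H1", "days": 3}],
)
print(profile["enrichment"])
```

A real profiler would presumably draw on richer evidence than keyword overlap and on an actual vocabulary catalog. The sketch only shows how the enrichment sits alongside the descriptive and structural metadata in one profile.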

Published
2017-10-02
TARGINO, Natacha; SOUZA, Damires; SALGADO, Ana Carolina. A Proposal for Profiling Datasets on the Web with Semantic Enrichment. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 32., 2017, Uberlândia/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 172-183. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2017.171425.