Collaborative Refinement of Web Data Based on Social Coding

  • Helton Douglas A. dos Santos (UFPE)
  • Marcelo Iury S. Oliveira (UFRPE)
  • Bernadette Farias Lóscio (UFPE)

Abstract


The Web has emerged as an important platform for sharing information, enabling the publication and consumption of datasets from different domains. In this context, dataset refinement is a primary activity, mainly related to data cleansing and enrichment. Refinement is usually performed by publishers, although consumers often clean and enrich datasets as part of their own consumption activities. In general, however, the consumers' effort is lost, since the result of the refinement is usually not shared back with the publisher or with other consumers. To address this, this work proposes a refinement strategy based on the principles of social coding that allows datasets published on the Web to be refined collaboratively.

Keywords: Dataset refinement, data cleaning, social coding, collaborative refinement

Published: 2019-10-07

DOS SANTOS, Helton Douglas A.; OLIVEIRA, Marcelo Iury S.; LÓSCIO, Bernadette Farias. Collaborative Refinement of Web Data Based on Social Coding. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 34., 2019, Fortaleza. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 49-60. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2019.8807.