Generating a Quality Profile for Dynamic Data Sources
Abstract
Nowadays, a massive volume of data is produced by a wide variety of data sources. Easy access to these data presents new opportunities, but it also makes choosing the most suitable data sources for a specific use a challenge. Many works in the literature address this issue by performing quality assessment of data sources; however, only a few take into account the dynamicity of sources, that is, the fact that their content changes over time. In this work, we address the problem of performing data quality assessment in dynamic data sources. Furthermore, we propose the establishment of a Quality Profile, which consists of a set of metadata that provides information about the quality of a data source. Experiments performed on real-world scenarios demonstrate that our strategy produces satisfactory results.
Keywords:
Quality assessment, dynamic data sources
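The abstract does not specify how a Quality Profile is computed or stored. The following is a minimal sketch, assuming the profile is a metadata record holding per-dimension quality scores that are recomputed each time a dynamic source is refreshed, with each assessment appended to a history so that changes in quality over time remain visible. The chosen dimensions (completeness and timeliness), the linear timeliness decay, and every name here (`QualityProfile`, `assess_snapshot`, `max_age`) are illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class QualityProfile:
    """Hypothetical metadata record describing the quality of one data source."""
    source_id: str
    # One entry per assessment run: (assessment time, {dimension: score}).
    history: List[Tuple[datetime, Dict[str, float]]] = field(default_factory=list)

    def latest(self) -> Optional[Dict[str, float]]:
        """Return the most recent scores, or None if the source was never assessed."""
        return self.history[-1][1] if self.history else None


def completeness(records: List[Dict[str, Any]], attributes: List[str]) -> float:
    """Fraction of expected attribute values that are present (not None)."""
    expected = len(records) * len(attributes)
    if expected == 0:
        return 0.0
    present = sum(1 for r in records for a in attributes if r.get(a) is not None)
    return present / expected


def timeliness(last_update: datetime, now: datetime, max_age: timedelta) -> float:
    """1.0 for data updated at `now`, decaying linearly to 0.0 at `max_age` (assumed rule)."""
    age_ratio = (now - last_update) / max_age  # timedelta / timedelta -> float
    return min(1.0, max(0.0, 1.0 - age_ratio))


def assess_snapshot(
    profile: QualityProfile,
    records: List[Dict[str, Any]],
    attributes: List[str],
    last_update: datetime,
    now: datetime,
    max_age: timedelta,
) -> Dict[str, float]:
    """Re-assess the current snapshot of a source and record the scores in its profile."""
    scores = {
        "completeness": completeness(records, attributes),
        "timeliness": timeliness(last_update, now, max_age),
    }
    profile.history.append((now, scores))
    return scores
```

For example, a hypothetical weather-station source with two records, one of which lacks a humidity reading, last updated six hours before an assessment that tolerates data up to 24 hours old, would receive a completeness of 0.75 and a timeliness of 0.75. Keeping the full history rather than only the latest scores is one way to reflect the dynamicity of the source, since a consumer can see whether its quality is stable or degrading.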
Published
2016-10-04
How to Cite
SILVA NETO, Everaldo Costa; LÓSCIO, Bernadette Farias; SALGADO, Ana Carolina. Generating a Quality Profile for Dynamic Data Sources. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 31., 2016, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 52-63. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2016.24308.
