Ensuring Data Quality in Linked Data Fusion: A SHACL Use Case on Open Mobility and Education Data from Curitiba
Abstract
Smart cities stand to benefit from both the format and the growth of data on the Semantic Web, because greater volume and interlinking improve the quality of data analysis. This quantitative growth, however, must be accompanied by quality assurance. This work verifies data quality in linked data fusion along three quality dimensions: accuracy, consistency, and conciseness. The quality constraints were expressed in SHACL (Shapes Constraint Language), and a Python script was written to run the verification. The tests were performed on a linked open data set covering urban mobility and education in the city of Curitiba.
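To make the three quality dimensions concrete, the following is a minimal, library-free sketch of the kind of checks they imply on fused data. It is not SHACL and not the authors' script: the triples, the property names (`name`, `latitude`), and the reading of each dimension (accuracy as a value-range check, consistency as no conflicting values for a single-valued property, conciseness as no duplicate entities) are assumptions made for illustration only. In SHACL, the first two would roughly correspond to constraints such as `sh:minInclusive`/`sh:maxInclusive` and `sh:maxCount`.

```python
from collections import defaultdict

# Hypothetical fused records as (subject, property, value) triples merged from
# two sources. All identifiers and values are made up for illustration.
TRIPLES = [
    ("school:1", "name", "Escola A"),
    ("school:1", "latitude", -25.43),
    ("school:1", "latitude", -25.47),  # conflicting value from a second source
    ("school:2", "name", "Escola B"),
    ("school:2", "latitude", 125.0),   # outside the valid latitude range
    ("school:3", "name", "Escola B"),  # same name as school:2: possible duplicate
    ("school:3", "latitude", -25.40),
]


def check_accuracy(triples):
    """Flag latitude values that are not numbers in the range [-90, 90]."""
    return [
        (s, v)
        for s, p, v in triples
        if p == "latitude"
        and not (isinstance(v, (int, float)) and -90 <= v <= 90)
    ]


def check_consistency(triples, single_valued=("name", "latitude")):
    """Flag (subject, property) pairs holding more than one distinct value."""
    values = defaultdict(set)
    for s, p, v in triples:
        if p in single_valued:
            values[(s, p)].add(v)
    return sorted(key for key, vals in values.items() if len(vals) > 1)


def check_conciseness(triples, key_property="name"):
    """Flag groups of distinct subjects that share the same key property value."""
    subjects = defaultdict(set)
    for s, p, v in triples:
        if p == key_property:
            subjects[v].add(s)
    return {v: sorted(subs) for v, subs in subjects.items() if len(subs) > 1}


if __name__ == "__main__":
    print("accuracy violations:", check_accuracy(TRIPLES))
    print("consistency violations:", check_consistency(TRIPLES))
    print("conciseness violations:", check_conciseness(TRIPLES))
```

Each function returns the offending items rather than a boolean, loosely mirroring how a validation report lists the nodes that violate a constraint.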
Published
2023-04-11
How to Cite
BERTUCINI, Otávio Thomas; BERARDI, Rita C. G.; BELIZARIO, Mateus G.; KOZIEVITCH, Nadia. Garantindo a Qualidade de Dados na Fusão de Dados Conectados: Um caso de uso de SHACL em dados abertos de Mobilidade e Educação de Curitiba. In: REGIONAL DATABASE SCHOOL (ERBD), 18., 2023, Palmas/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 31-40. ISSN 2595-413X. DOI: https://doi.org/10.5753/erbd.2023.229429.
