Profiling for Confidence: Debugging Relationships among Urban Spatio-Temporal Datasets

Laís M. A. Rocha; Mirella M. Moro; Juliana Freire

doi:10.5753/ctd.2020.11375

Laís M. A. Rocha, UFMG
Mirella M. Moro, UFMG
Juliana Freire, NYU

Abstract

We aim to help users identify potential issues in spatio-temporal data and thus gain trust in the results they derive from such data, a crucial benefit in the era of data science and big data. We propose a framework for profiling spatio-temporal relationships that automatically identifies data slices that deviate from what is expected; these slices can then be analyzed further for quality issues and for their potential effects on analysis results. We describe the profiling methodology and present case studies using real urban datasets, which emphasize the need for spatio-temporal profiling to build trust in the results of data analysis.

Keywords: Urban Data, Complex Relationships, Profiling
