Profiling for Confidence: Debugging Relationships among Urban Spatio-Temporal Datasets
Abstract
We aim to help users identify potential issues in spatio-temporal data and thus gain trust in the results they derive from such data, a crucial benefit in the era of data science and big data. We propose a framework for profiling spatio-temporal relationships that automatically identifies data slices that deviate from what is expected; these slices can then be analyzed further for quality issues and for their potential effects on analysis results. We describe the profiling methodology and present case studies using real urban datasets, which underscore the need for spatio-temporal profiling to build trust in the results of data analysis.