Preprocessing and Analysis of Taxi Data
Abstract
Analyzing large volumes of data is an ongoing challenge, and preprocessing is an essential part of it: this step verifies the data and identifies inconsistencies, possible errors, and incomplete records. In this work, we analyze two datasets with more than thirty million records of taxi movements in the cities of San Francisco and Rome. We propose an algorithm to treat the anomalous speeds identified during the preprocessing of these datasets. We compare the datasets before and after applying the algorithm, which demonstrates its relevance. The results reveal specific characteristics of the taxi service in the two cities.
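To make the idea of an anomalous speed concrete, here is a minimal sketch of one common way to detect such records in a GPS trace. It is not the algorithm proposed in this work, whose details the abstract does not give. The trace layout `(timestamp_s, lat, lon)`, the function names, and the 120 km/h threshold are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def segment_speeds_kmh(trace):
    """Speed of each consecutive segment of a time-sorted trace.

    trace: list of (timestamp_s, lat, lon) tuples (hypothetical layout).
    A non-positive time gap yields an infinite speed so that it gets flagged.
    """
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(trace, trace[1:]):
        dt_h = (t1 - t0) / 3600.0
        if dt_h <= 0:
            speeds.append(float("inf"))
        else:
            speeds.append(haversine_km(la0, lo0, la1, lo1) / dt_h)
    return speeds


def drop_anomalous_points(trace, max_kmh=120.0):
    """Keep a point only if the speed from the last kept point is <= max_kmh.

    max_kmh is an assumed threshold, not a value taken from the paper.
    """
    if not trace:
        return []
    kept = [trace[0]]
    for point in trace[1:]:
        speed = segment_speeds_kmh([kept[-1], point])[0]
        if speed <= max_kmh:
            kept.append(point)
    return kept


# Made-up trace near San Francisco; the third point jumps one degree north.
trace = [
    (0, 37.7749, -122.4194),
    (60, 37.7758, -122.4194),
    (120, 38.7749, -122.4194),
    (180, 37.7767, -122.4194),
]
speeds = segment_speeds_kmh(trace)
cleaned = drop_anomalous_points(trace)
```

Comparing each point against the last kept point, rather than against its immediate predecessor, keeps a single bad GPS fix from also discarding the valid point that follows it.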
