A Method for Outlier Detection and Diagnosis in Urban Data via Multidimensional Analysis
Abstract
Since 2007, for the first time in history, more people live in cities than in rural areas, and this number is only expected to grow. More people in cities means greater stress on urban infrastructure, higher demand for public services, and an ever-increasing rate of heterogeneous data generation. Data are essential for implementing evidence-based public policies. In this paper, we propose a method for detecting and diagnosing outliers in urban data through multidimensional analysis in four sequential steps: (i) modeling the matrix data as a 3D tensor; (ii) Tucker3 decomposition to extract the latent factors; (iii) outlier detection statistics; and (iv) diagnostic techniques to inspect the causes of the outliers. Applied to real data from the Smart Citizen platform, our method identifies the environmental variables that most influence the outliers. In addition, ROC curves indicated a 20% accuracy gain relative to the multivariate approach.
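The four steps can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several choices are assumptions. The tensor layout (days × variables × hours), the variable names, the synthetic data and the Tucker3 ranks are all hypothetical. For step (iii), the sketch uses Hotelling's T² on the day-mode scores and the squared prediction error (SPE) on the residuals, two statistics commonly used in multiway monitoring. For step (iv), it uses per-variable SPE contributions as the diagnostic. Tucker3 is fitted with an HOSVD initialization followed by higher-order orthogonal iteration (HOOI), an alternating least-squares scheme.

```python
import numpy as np


def unfold(X, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by the other modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def mode_product(X, M, mode):
    """n-mode product X x_n M, with M of shape (new_dim, X.shape[mode])."""
    return np.moveaxis(np.tensordot(M, X, axes=(1, mode)), 0, mode)


def preprocess(X):
    """Center across mode 1 (days) and scale within mode 2 (variables)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    scale = Xc.std(axis=(0, 2), keepdims=True)
    scale[scale == 0] = 1.0  # leave constant variables unscaled
    return Xc / scale


def tucker3(X, ranks, n_iter=100, tol=1e-10):
    """Step (ii): Tucker3 via HOSVD initialization + HOOI (alternating least squares)."""
    factors = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    prev_fit = -np.inf
    for _ in range(n_iter):
        for n in range(3):
            Y = X
            for m in range(3):
                if m != n:  # project every mode except the one being updated
                    Y = mode_product(Y, factors[m].T, m)
            U = np.linalg.svd(unfold(Y, n), full_matrices=False)[0]
            factors[n] = U[:, :ranks[n]]
        core = X
        for m in range(3):
            core = mode_product(core, factors[m].T, m)
        fit = np.linalg.norm(core)  # never decreases across HOOI sweeps
        if fit - prev_fit < tol:
            break
        prev_fit = fit
    return core, factors


def reconstruct(core, factors):
    """Rebuild the tensor as core x_1 A x_2 B x_3 C."""
    Xhat = core
    for m, U in enumerate(factors):
        Xhat = mode_product(Xhat, U, m)
    return Xhat


def outlier_statistics(Xs, core, factors):
    """Step (iii): Hotelling's T^2 on day scores and SPE on the residuals."""
    A = factors[0] - factors[0].mean(axis=0)           # day-mode scores (I x P)
    S_inv = np.linalg.inv(np.atleast_2d(np.cov(A, rowvar=False)))
    t2 = np.einsum("ip,pq,iq->i", A, S_inv, A)
    E = Xs - reconstruct(core, factors)                 # residual tensor
    contrib = (E ** 2).sum(axis=2)                      # per-day, per-variable SPE share
    spe = contrib.sum(axis=1)
    return t2, spe, contrib


def top_contributors(contrib, day, names):
    """Step (iv): rank variables by their contribution to a day's SPE."""
    return [names[j] for j in np.argsort(contrib[day])[::-1]]


# Hypothetical data: one (variables x hours) matrix per day, stacked in step (i).
rng = np.random.default_rng(0)
names = ["temperature", "humidity", "noise", "light"]
days, hours = 30, 24
profile = np.sin(np.linspace(0, 2 * np.pi, hours))
weights = rng.normal(1.0, 0.1, size=len(names))
daily = [rng.normal(1.0, 0.2) * np.outer(weights, profile)
         + 0.05 * rng.normal(size=(len(names), hours)) for _ in range(days)]
X = np.stack(daily)              # shape (days, variables, hours)
X[7, 2, 10:14] += 3.0            # injected burst in "noise" on day 7

Xs = preprocess(X)
core, factors = tucker3(Xs, ranks=(2, 2, 2))
t2, spe, contrib = outlier_statistics(Xs, core, factors)
limit = np.percentile(spe, 95)   # crude empirical limit, not an analytical one
worst = int(np.argmax(spe))
print("days above SPE limit:", np.flatnonzero(spe > limit))
print("variables ranked for day", worst, ":", top_contributors(contrib, worst, names))
```

Fitting all days at once is a retrospective analysis. A strongly anomalous day can pull a component toward itself and then appear in T² rather than in SPE, so both statistics are worth inspecting.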
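An ROC comparison presupposes ground-truth labels saying which observations are true outliers. Assuming such labels exist, an ROC curve for any detection statistic can be traced by sweeping a threshold from the highest score to the lowest. The function names and toy scores below are hypothetical. The abstract does not state how its 20% accuracy gain was measured, so the area under the curve (AUC) is shown here only as one common summary.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) points, sweeping the threshold from high to low score.

    labels: 1 = true outlier, 0 = normal. Tied scores are grouped into one step.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one outlier and one normal sample")
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for idx, (score, label) in enumerate(pairs):
        if label:
            tp += 1
        else:
            fp += 1
        # emit a point only after the last sample sharing this score (handles ties)
        if idx == len(pairs) - 1 or pairs[idx + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(fpr, fpr[1:], tpr, tpr[1:]))


scores = [9.1, 0.4, 7.5, 0.9, 0.3, 2.2]   # hypothetical detection statistic
labels = [1, 0, 1, 0, 0, 1]               # hypothetical ground truth (1 = outlier)
fpr, tpr = roc_curve(scores, labels)
print("AUC:", auc(fpr, tpr))
```

Grouping tied scores into a single step keeps the curve independent of the arbitrary order of ties. The resulting AUC equals the probability that a randomly chosen outlier scores higher than a randomly chosen normal observation, with ties counted as one half.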