A Method for Detecting and Diagnosing Outliers in Urban Data via Multidimensional Analysis

  • Thiago I. A. Souza UFC
  • Deborah Magalhaes UFC
  • Andre L. L. Aquino UFAL
  • Danielo G. Gomes UFC

Abstract


Since 2007, for the first time in history, more people have lived in cities than in the countryside, and the urban share continues to grow. A larger urban population means more stress on urban infrastructure, greater demand for public services, and an ever-increasing rate of heterogeneous (multidimensional) data generation. Such data are essential for implementing evidence-based public policies. In this paper, we propose a method for detecting and diagnosing outliers in multidimensional urban data in four sequential stages: (i) modeling the matrix data as a 3D tensor; (ii) applying Tucker3 decomposition to extract latent factors; (iii) computing outlier detection statistics; and (iv) applying diagnostic techniques to inspect the causes of the outliers. Using real data from the Smart Citizen platform, our method identifies the environmental variables that contribute most to the outliers. Moreover, ROC curves indicated an accuracy gain of 20% over the multivariate approach.
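The four stages listed in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic days x hours x variables tensor, the ranks (1, 1, 1), the use of a truncated HOSVD as a simple stand-in for a full Tucker3 fit, the squared prediction error as the detection statistic, the median/MAD threshold, and the per-variable residual contributions as the diagnostic.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize a 3D tensor so that rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def truncated_hosvd(tensor, ranks):
    """Tucker3-style core and factors via truncated HOSVD (not an ALS fit)."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    a, b, c = factors
    core = np.einsum("ijk,ip,jq,kr->pqr", tensor, a, b, c)
    return core, factors

def reconstruct(core, factors):
    """Map the core back to the original space through the factor matrices."""
    a, b, c = factors
    return np.einsum("pqr,ip,jq,kr->ijk", core, a, b, c)

# Stage (i): arrange the readings as a 3D tensor (hypothetical synthetic data:
# 30 days x 24 hours x 4 environmental variables sharing one daily cycle).
rng = np.random.default_rng(0)
days, hours, n_vars = 30, 24, 4
daily_cycle = np.sin(2 * np.pi * np.arange(hours) / hours)
scale = np.array([1.0, 0.5, 2.0, 1.5])
data = np.tile(daily_cycle[None, :, None] * scale[None, None, :], (days, 1, 1))
data = data + 0.05 * rng.standard_normal(data.shape)
data[17, :, 2] += 3.0  # injected anomaly: day 17, variable 2

# Stage (ii): extract latent factors with a low-rank Tucker-style model.
core, factors = truncated_hosvd(data, ranks=(1, 1, 1))
residual = data - reconstruct(core, factors)

# Stage (iii): squared prediction error per day, with a robust heuristic limit.
spe = (residual ** 2).sum(axis=(1, 2))
mad = np.median(np.abs(spe - np.median(spe)))
threshold = np.median(spe) + 5 * 1.4826 * mad
flagged = np.flatnonzero(spe > threshold)

# Stage (iv): per-variable contribution to the residual of the worst day.
worst_day = int(np.argmax(spe))
contributions = (residual[worst_day] ** 2).sum(axis=0)
print("flagged days:", flagged)
print("worst day:", worst_day, "top variable:", int(np.argmax(contributions)))
```

In this sketch the anomaly is constant over the hours, so it is orthogonal to the daily cycle captured by the model and ends up almost entirely in the residual. A production analysis would normally fit Tucker3 by alternating least squares and use statistically derived control limits instead of the heuristic threshold above.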

Published
2018-05-10
SOUZA, Thiago I. A.; MAGALHAES, Deborah; AQUINO, Andre L. L.; GOMES, Danielo G. Um Método para Detecção e Diagnóstico de Outliers em Dados Urbanos via Análise Multidimensional. In: BRAZILIAN SYMPOSIUM ON COMPUTER NETWORKS AND DISTRIBUTED SYSTEMS (SBRC), 36., 2018, Campos do Jordão. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 295-308. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2018.2423.
