Visual Analytics and Outlying Aspect Mining: contextualizing anomalies considering temporal and multidimensional aspects

  • Felipe Marx Benghi UTFPR
  • Luiz Gomes-Jr UTFPR

Abstract


Outlying Aspect Mining (OAM) is a new way of handling outliers that, instead of focusing solely on detection, also provides an explanation. It does so by presenting the subspace of attributes that showed the most abnormal behavior. Identifying this group of attributes is important, but merely listing them is not sufficient for a human specialist to comprehend the situation and take the necessary actions. A higher-level, visual approach can improve the process by providing better cognitive clues to experts. Here we describe a Visual Analytics platform developed to present data and OAM outputs in a human-friendly interface. A novelty of this platform is a parallel coordinates plot that also displays temporal multidimensional data. Such a representation overcomes limitations of the human visual system and helps in the investigation of outliers. To explore the applicability of the developed tool, a locomotive operation use case is employed, with a focus on fault analysis from an OAM point of view.
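
As a rough illustration of the idea of pointing at the attributes in which a record behaves most abnormally, the minimal sketch below ranks the attributes of a query record by their absolute z-score against reference data. It is not the method of the paper: real OAM approaches search over attribute subspaces, whereas this sketch scores each attribute on its own. The function name `rank_outlying_attributes`, the attribute names, and all readings are hypothetical and invented for illustration.

```python
from statistics import mean, pstdev


def rank_outlying_attributes(rows, query, top_k=2):
    """Rank the attributes of `query` by absolute z-score against `rows`.

    Simplifying assumption: each attribute is scored independently,
    instead of searching attribute subspaces as OAM methods do.
    """
    scores = {}
    for name in query:
        values = [row[name] for row in rows]
        mu = mean(values)
        sigma = pstdev(values)
        # A constant attribute cannot make the query stand out.
        scores[name] = abs(query[name] - mu) / sigma if sigma > 0 else 0.0
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Hypothetical locomotive-like readings (invented values, not from the paper).
rows = [
    {"speed": 60.0, "oil_temp": 80.0, "current": 400.0},
    {"speed": 62.0, "oil_temp": 82.0, "current": 410.0},
    {"speed": 58.0, "oil_temp": 79.0, "current": 395.0},
    {"speed": 61.0, "oil_temp": 81.0, "current": 405.0},
]
query = {"speed": 60.0, "oil_temp": 120.0, "current": 402.0}

top, scores = rank_outlying_attributes(rows, query)
print(top)  # attributes ordered from most to least abnormal
```

A ranked list like this is the kind of output that, according to the abstract, is not sufficient for a specialist on its own, which is the motivation for a visual presentation.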

Published
13/09/2021
BENGHI, Felipe Marx; GOMES-JR, Luiz. Visual Analytics e Outlying Aspect Mining: contextualização de anomalias considerando questões temporais e multidimensionais. In: ESCOLA REGIONAL DE BANCO DE DADOS (ERBD), 16., 2021, Santa Maria. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 21-30. ISSN 2595-413X. DOI: https://doi.org/10.5753/erbd.2021.17235.