A novel visual approach for enhanced attribute analysis and selection
Abstract
Given current capabilities for collecting and storing data, a data set with many attributes frequently reflects more than one phenomenon. Understanding the role of attribute subsets and their impact on the organization and structure of the data set under study is paramount to many exploratory and analytical tasks. Applications range from medicine to financial markets, where one wishes to locate subsets of variables that affect the prediction of target categorical attributes. The user is essential in this context, since automated techniques cannot currently embed user knowledge in attribute selection. In this work, we propose an approach to the analysis and selection of attributes in a data set based on three principles. First, we center the analysis of relationships on categorical attributes or labels, because they usually summarize important state variables in the application. Second, we express the relationship between target attributes and all other attributes in the data set within a single visualization, which conveys a large number of correlations in the same visual frame. Third, we propose an interactive dual-visual approach in which changes and selections in attribute space are reflected visually in the configuration of data layouts, designed to support immediate analysis of the impact of selected attribute subsets on the organization of the data set. We validate our approach through several case studies that illustrate distinct scenarios of knowledge acquisition and feature selection.
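As a rough illustration of relating a target categorical attribute to every other attribute, the sketch below scores each numeric attribute by how strongly it is associated with the label. The abstract does not state which association measure the approach uses; the correlation ratio (eta squared) here is an assumption chosen for simplicity, and the names `correlation_ratio` and `rank_attributes` are hypothetical.

```python
import numpy as np

def correlation_ratio(labels, values):
    """Eta squared in [0, 1]: share of the variance of `values`
    explained by the grouping given in `labels` (assumed measure)."""
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    grand_mean = values.mean()
    total_ss = np.sum((values - grand_mean) ** 2)
    if total_ss == 0.0:
        # A constant attribute carries no information about the label.
        return 0.0
    between_ss = 0.0
    for category in np.unique(labels):
        group = values[labels == category]
        between_ss += group.size * (group.mean() - grand_mean) ** 2
    return float(between_ss / total_ss)

def rank_attributes(X, labels, names):
    """Score every column of X against the label, strongest first."""
    scores = [(name, correlation_ratio(labels, X[:, j]))
              for j, name in enumerate(names)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy data: attribute "a" separates the two classes, "b" does not.
labels = np.array([0, 0, 1, 1])
X = np.array([[1.0, 1.0],
              [1.0, 5.0],
              [5.0, 1.0],
              [5.0, 5.0]])
ranking = rank_attributes(X, labels, ["a", "b"])
```

In this toy case `ranking` places `a` ahead of `b`. A list of this kind is the sort of information a single attribute-versus-target view could encode, for example as position or color.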