Visual Exploration of an Ensemble of Classifiers
Abstract
Inspecting the outputs of classification algorithms is becoming progressively more difficult as both the data and the algorithms grow in scale and complexity. This has motivated research into new techniques for interpreting the behavior of these algorithms and for making their results easier to understand. A common classification approach is the ``ensemble of classifiers'', in which a set of classifiers $c \in C$ is trained on the input data set and the final classification is computed by ``voting'', i.e., by combining the individual predictions of the classifiers. One issue with this approach, however, is that instead of a single classifier to analyze there are now $|C|$, each with its own characteristics. There is therefore a demand for methods that provide insight into the results of an ensemble of classifiers while also allowing a detailed analysis of each classifier in the ensemble. Our work draws on dimensionality reduction techniques to provide visual tools for interpreting the results of an ensemble of classifiers, while also giving insight into how each classifier contributes to the final result. Our approach also provides a measure of classification uncertainty by highlighting regions where the classifiers in the ensemble diverge, allowing users to focus their analysis on these regions. We tested our approach on the MNIST handwritten digits and Fashion-MNIST data sets. Using visualizations that range from maps giving an overview of a classifier's behavior to instance-based views, we show how our approach can help explain why a specific decision (classification) was made.
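The abstract does not specify how the votes are combined or how divergence among classifiers is scored. As a minimal sketch, assuming plain plurality (hard) voting, the per-instance uncertainty below is the fraction of classifiers that disagree with the winning label. The helper names `majority_vote`, `disagreement`, `ensemble_predict`, and `idw` are hypothetical, and the inverse-distance-weighted (Shepard) interpolation is only one plausible way to spread per-instance scores over a 2D projection to form an uncertainty map; it is not necessarily the authors' method.

```python
from collections import Counter


def majority_vote(predictions):
    """Plurality vote over the labels one instance received.

    Ties go to the label encountered first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    return Counter(predictions).most_common(1)[0][0]


def disagreement(predictions):
    """Fraction of classifiers whose label differs from the vote (0.0 = unanimous)."""
    winner = majority_vote(predictions)
    return sum(p != winner for p in predictions) / len(predictions)


def ensemble_predict(votes):
    """votes[c][i] is the label that classifier c assigns to instance i.

    Returns the voted label and the disagreement score for every instance.
    """
    labels, uncertainty = [], []
    for column in zip(*votes):  # one tuple of |C| labels per instance
        labels.append(majority_vote(column))
        uncertainty.append(disagreement(column))
    return labels, uncertainty


def idw(points, values, query, power=2.0):
    """Shepard (inverse-distance-weighted) estimate of `values` at a 2D `query`.

    `points` are the projected 2D positions of the instances and `values`
    their per-instance scores, e.g. the disagreement computed above.
    """
    num = den = 0.0
    for (x, y), v in zip(points, values):
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        if d2 == 0.0:
            return v  # query coincides with a data point
        w = d2 ** (-power / 2)  # 1 / distance**power
        num += w * v
        den += w
    return num / den


# Three hypothetical classifiers voting on four instances.
votes = [
    [3, 8, 1, 7],  # classifier A
    [3, 8, 7, 9],  # classifier B
    [3, 5, 1, 4],  # classifier C
]
labels, uncertainty = ensemble_predict(votes)
print(labels)                              # [3, 8, 1, 7]
print([round(u, 2) for u in uncertainty])  # [0.0, 0.33, 0.33, 0.67]
```

Plurality voting weighs every classifier equally; a soft-voting variant would average class probabilities instead, and the disagreement score could be replaced by, for example, vote entropy without changing the rest of the pipeline. Evaluating `idw` on a regular grid over the projected points yields a continuous map in which high values mark regions where the ensemble diverges.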