Pseudo Labeling and Classification of High-Dimensional Data using Visual Analytics
Abstract
Machine learning (ML) works with data consisting of tens up to tens of thousands of measurements (dimensions) per sample. As the number of dimensions and/or samples grows, so does the difficulty of understanding such data and the ML pipelines that process it. Visualization, and in particular Visual Analytics (VA), has emerged as one of the key approaches that help practitioners understand high-dimensional data and carry out ML engineering tasks. In this paper, we investigate several novel ways in which VA can help ML (and conversely). Our work focuses on a visualization technique called dimensionality reduction, or projection, and on the task of training a classifier when only a small number of ground-truth labels is available. Our experiments show that projections can capture the data structure present in high dimensions very well, which supports the design of high-performance feature-learning and classifier-learning models. The experiments also relate projection quality to data separation and classifier performance. Finally, we combine these two observations to assist users in manually labeling samples, showing that both algorithms and humans can exploit projections to build better classifiers. We argue that the ability of pseudo-labels to retain information from 2D projected spaces is the key idea that links all these contributions.
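The pseudo-labeling idea in the abstract can be illustrated with a minimal sketch. It assumes the samples have already been projected to 2D (for example, with t-SNE) and that only a few of them carry ground-truth labels. The sketch swaps in a plain k-nearest-neighbour majority vote in the 2D space; it is not the authors' propagation method, and the function name `knn_pseudo_labels` and its parameters `k` and `min_confidence` are hypothetical. Only unlabeled samples whose winning vote share reaches the threshold receive a pseudo-label.

```python
import math
from collections import Counter


def knn_pseudo_labels(points, labels, k=3, min_confidence=0.6):
    """Hypothetical helper: pseudo-label unlabeled 2D points by a k-NN vote.

    A plain k-nearest-neighbour majority vote stands in here for the
    label-propagation method used in the paper (an assumption).

    points: list of (x, y) tuples, e.g. a 2D projection of learned features.
    labels: same length as points; an int for labeled samples, None otherwise.
    Returns {index: (pseudo_label, confidence)} for the unlabeled samples whose
    winning vote share is at least min_confidence.
    """
    labeled = [i for i, y in enumerate(labels) if y is not None]
    if not labeled:
        return {}  # nothing to propagate from
    result = {}
    for i, y in enumerate(labels):
        if y is not None:
            continue  # ground-truth labels are kept as they are
        # k labeled samples closest to point i in the 2D projection
        nearest = sorted(labeled, key=lambda j: math.dist(points[i], points[j]))[:k]
        votes = Counter(labels[j] for j in nearest)
        label, count = votes.most_common(1)[0]
        confidence = count / len(nearest)
        if confidence >= min_confidence:
            result[i] = (label, confidence)
    return result


# Toy 2D projection: two labeled groups plus three unlabeled samples.
points = [(0, 0), (1, 0), (2, 0), (5, 0), (6, 0), (7, 0), (1.5, 0), (6.5, 0), (3.4, 0)]
labels = [0, 0, 0, 1, 1, 1, None, None, None]
print(knn_pseudo_labels(points, labels, k=3, min_confidence=0.8))
# → {6: (0, 1.0), 7: (1, 1.0)}
```

Raising `min_confidence` trades coverage for pseudo-label reliability: in the example, the sample at (3.4, 0), which sits between the two groups, is left unlabeled rather than given a doubtful label (its vote share is 2/3).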
Published
30/09/2024
How to Cite
BENATO, Bárbara C.; TELEA, Alexandru C.; FALCÃO, Alexandre X. Pseudo Labeling and Classification of High-Dimensional Data using Visual Analytics. In: WORKSHOP DE TESES E DISSERTAÇÕES - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 37., 2024, Manaus/AM. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 49-55. DOI: https://doi.org/10.5753/sibgrapi.est.2024.31644.