Optimizing Diffusion Rate and Label Reliability in a Graph-Based Semi-supervised Classifier

Bruno Klaus de Aquino Afonso; Lilian Berton

Bruno Klaus de Aquino Afonso UNIFESP https://orcid.org/0000-0003-2086-1054
Lilian Berton UNIFESP https://orcid.org/0000-0003-1397-6005

Abstract

Semi-supervised learning has received attention from researchers because it exploits the structure of unlabeled data to achieve competitive classification results with far fewer labels than supervised approaches. The Local and Global Consistency (LGC) algorithm is one of the best-known graph-based semi-supervised learning (GSSL) classifiers. Notably, its solution can be written as a linear combination of the known labels. The coefficients of this linear combination depend on a parameter α, which determines how the reward for reaching labeled vertices decays over time in a random walk. In this work, we discuss how removing the self-influence of a labeled instance may be beneficial, and how it relates to the leave-one-out error. Moreover, we propose minimizing this leave-one-out loss with automatic differentiation. Within this framework, we propose methods to estimate label reliability and the diffusion rate. Optimizing the diffusion rate is accomplished more efficiently with a spectral representation. Results show that the label reliability approach is competitive with robust ℓ1-norm methods, and that removing diagonal entries reduces the risk of overfitting and leads to suitable criteria for parameter selection.

Keywords: Machine learning, Leave-one-out, Semi-supervised learning, Graph-based approaches, Label propagation, Eigendecomposition
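
As an illustration of the idea summarized in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the common LGC formulation in which the scores are F = P Y with propagation matrix P = (1 - α)(I - αS)^(-1) and S = D^(-1/2) W D^(-1/2), and it approximates P by fixed-point iteration instead of a matrix inverse. The toy graph, the value of α, and all function names (`lgc_propagation_matrix`, `remove_self_influence`, and so on) are hypothetical choices made for this example. Zeroing the diagonal of P gives, for each labeled vertex, the score it would receive if its own label were hidden.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def lgc_propagation_matrix(W, alpha, n_iter=300):
    """Approximate P = (1 - alpha) * (I - alpha * S)^(-1) by iterating
    P <- alpha * S @ P + (1 - alpha) * I, with S = D^(-1/2) W D^(-1/2).
    (Assumed formulation; the iteration converges for 0 <= alpha < 1.)"""
    n = len(W)
    deg = [sum(row) for row in W]
    S = [[W[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(n_iter):
        SP = matmul(S, P)
        P = [[alpha * SP[i][j] + ((1 - alpha) if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    return P


def remove_self_influence(P):
    """Return a copy of P with its diagonal set to zero."""
    n = len(P)
    return [[0.0 if i == j else P[i][j] for j in range(n)] for i in range(n)]


def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical toy graph: two triangles {0, 1, 2} and {3, 4, 5}
# joined by one weak edge between vertices 2 and 3.
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]

# One-hot labels; vertices 2 and 3 are unlabeled (all-zero rows).
Y = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 0.0],
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
]

alpha = 0.9  # diffusion rate (illustrative value)
P = lgc_propagation_matrix(W, alpha)

# Standard scores: each row of F is a linear combination of the known labels.
F = matmul(P, Y)

# Scores without self-influence: row i no longer uses vertex i's own label.
P_loo = remove_self_influence(P)
F_loo = matmul(P_loo, Y)

predictions = [argmax(row) for row in F]
loo_predictions = [argmax(F_loo[i]) for i in (0, 1, 4, 5)]
```

In this sketch, comparing `loo_predictions` with the true labels of the labeled vertices gives a leave-one-out style error that could serve as a criterion for choosing α. Minimizing such a loss with automatic differentiation, as the abstract describes, would require a differentiable implementation and is outside the scope of this example.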