A Robust Pseudo-label Reevaluation Strategy for the Self-training Algorithm

  • Luiz M. S. Silva UFRN
  • Renan M. R. A. Costa UFRN
  • José A. A. Paiva UFRN
  • Arthur C. Gorgônio UFRN
  • Karliane M. O. Vale UFRN
  • Flavius L. Gorgônio UFRN

Abstract

This work proposes an extension of the Self-Training algorithm with iterative re-evaluation of pseudo-labels. It uses the silhouette metric to identify and remove noisy instances, and a committee of classifiers with weighted voting to reinforce decisions in low-confidence cases. The approach aims to mitigate error propagation and increase the robustness of semi-supervised learning. Evaluations on 18 datasets showed better performance than the original self-training in terms of accuracy, F1-score, and stability, especially in scenarios with few labeled data.
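The two ingredients named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the function names, the silhouette threshold of 0.0, and the way committee weights are supplied are all assumptions made for illustration. The sketch treats class labels as cluster assignments, computes the silhouette s(i) = (b - a) / max(a, b) of Rousseeuw (1987) for each instance, discards pseudo-labeled instances that score below the threshold, and shows a plain weighted vote over committee predictions.

```python
from collections import defaultdict
from math import dist


def silhouette_per_sample(points, labels):
    """Silhouette s(i) = (b - a) / max(a, b), treating class labels as clusters."""
    scores = []
    for i, (p, own_label) in enumerate(zip(points, labels)):
        # Group distances from point i to every other point by that point's label.
        by_label = defaultdict(list)
        for j, (q, other_label) in enumerate(zip(points, labels)):
            if j != i:
                by_label[other_label].append(dist(p, q))
        own = by_label.pop(own_label, [])
        if not own or not by_label:
            # Singleton class or only one class present: score 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                              # mean intra-class distance
        b = min(sum(d) / len(d) for d in by_label.values())  # nearest other class
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def reevaluate_pseudo_labels(points, labels, is_pseudo, threshold=0.0):
    """Return indices to keep: all true labels, plus pseudo-labels scoring >= threshold.

    The threshold value is an assumption; the abstract does not state one.
    """
    scores = silhouette_per_sample(points, labels)
    return [i for i, s in enumerate(scores) if not is_pseudo[i] or s >= threshold]


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among committee members.

    How weights are derived (e.g. from validation accuracy) is an assumption.
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)
```

In a self-training loop, `reevaluate_pseudo_labels` would be called after each labeling round so that pseudo-labeled instances sitting closer to another class are returned to the unlabeled pool, while `weighted_vote` would decide labels for instances on which the base classifier has low confidence.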

Published
29 September 2025
SILVA, Luiz M. S.; COSTA, Renan M. R. A.; PAIVA, José A. A.; GORGÔNIO, Arthur C.; VALE, Karliane M. O.; GORGÔNIO, Flavius L. A Robust Pseudo-label Reevaluation Strategy for the Self-training Algorithm. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1739-1750. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.13905.