Analysing a New Experimental Design Impact on the Performance of FlexCon-CE
Abstract
Semi-supervised learning is a subfield of machine learning that exploits the combined use of labeled and unlabeled instances, the latter typically far more numerous than the former. This type of learning is essentially an intersection of supervised and unsupervised learning. Among the many algorithms used in this context, Self-training stands out for its wide application in the literature. Over the years, several variants of Self-training have been developed to improve its performance, including FlexCon-CE, which serves as the basis for this work. FlexCon-CE is an algorithm that employs classifier ensembles to select and label unlabeled instances, integrating them into the labeled dataset. This study analyzes the performance of FlexCon-CE by enlarging the number of datasets evaluated, adding performance evaluation metrics, and varying the percentages of initially labeled instances. The results show that FlexCon-CE performs well regardless of the experimental design and across different datasets.
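The ensemble-based selection step described above can be sketched in a minimal, self-contained form. This is an illustrative assumption of how a confidence-by-agreement self-training loop might look, not the authors' actual FlexCon-CE implementation; the classifier, the `threshold` agreement criterion, and all function names are hypothetical.

```python
# Hedged sketch of ensemble-based self-training in the spirit of FlexCon-CE.
# NearestCentroid, ensemble_self_train, and the agreement threshold are
# illustrative assumptions, not the published algorithm.
import random
from collections import Counter

class NearestCentroid:
    """Toy classifier: predicts the class of the nearest class centroid."""
    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        self.centroids = {c: [s / counts[c] for s in acc]
                          for c, acc in sums.items()}
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c]))
                for x in X]

def ensemble_self_train(X_lab, y_lab, X_unl, n_members=5, threshold=0.8,
                        max_rounds=10, seed=0):
    """Iteratively pseudo-label unlabeled points on which a bootstrap
    ensemble agrees with at least `threshold` fraction of its votes."""
    rng = random.Random(seed)
    X_lab, y_lab, X_unl = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        if not X_unl:
            break
        # Train each ensemble member on a bootstrap resample of the labeled pool.
        members = []
        for _ in range(n_members):
            idx = [rng.randrange(len(X_lab)) for _ in X_lab]
            members.append(NearestCentroid().fit([X_lab[i] for i in idx],
                                                 [y_lab[i] for i in idx]))
        votes = [m.predict(X_unl) for m in members]
        keep = []
        for i in range(len(X_unl)):
            label, n = Counter(v[i] for v in votes).most_common(1)[0]
            if n / n_members >= threshold:      # confident: pseudo-label it
                X_lab.append(X_unl[i])
                y_lab.append(label)
            else:
                keep.append(X_unl[i])           # not confident: stays unlabeled
        if len(keep) == len(X_unl):             # no progress this round: stop
            break
        X_unl = keep
    return X_lab, y_lab
```

The key design point mirrored from the abstract is that the ensemble, not a single classifier, decides which unlabeled instances are confidently labeled and merged into the labeled set at each iteration.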
Keywords:
Semi-supervised learning, Self-training, FlexCon-CE
References
Bisong, E. et al. (2019). Building machine learning and deep learning models on Google cloud platform. Springer.
Breiman, L. (1996). Bias, variance, and arcing classifiers. Technical Report 460, Statistics Department, University of California, Berkeley.
Chapelle, O., Scholkopf, B., and Zien, A. (2006). Semi-Supervised Learning. The MIT Press.
Chapelle, O., Scholkopf, B., and Zien, A. (2009). Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [Book reviews]. IEEE Transactions on Neural Networks, 20(3):542–542.
Dheeru, D. and Taniskidou, E. K. (2017). UCI Machine Learning Repository.
Gharroudi, O. (2017). Ensemble multi-label learning in supervised and semi-supervised settings. PhD thesis, Université de Lyon.
Gorgônio, A. C., Alves, C. T., Lucena, A. J., Gorgônio, F. L., Vale, K. M., and Canuto, A. M. (2019). Análise da variação do limiar para o algoritmo de aprendizado semissupervisionado flexcon-c. Brazilian Journal of Development, 5(11):26654–26669.
Harris, C. R., Millman, K. J., Van Der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., et al. (2020). Array programming with NumPy. Nature, 585(7825):357–362.
Jan, M. Z. and Verma, B. (2019). A novel diversity measure and classifier selection approach for generating ensemble classifiers. IEEE Access, 7:156360–156373.
Knisely, B. M. and Pavliscsak, H. H. (2023). Research proposal content extraction using natural language processing and semi-supervised clustering: A demonstration and comparative analysis. Scientometrics.
Kuncheva, L. I. (2014). Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons, Hoboken, NJ, 2nd edition.
Li, K., Li, X., Yin, R., and Wang, L. (2024a). A method for seismic fault identification based on self-training with high-stability pseudo-labels. Applied Soft Computing, page 111894.
Li, S., Kou, P., Ma, M., Yang, H., Huang, S., and Yang, Z. (2024b). Application of semi-supervised learning in image classification: Research on fusion of labeled and unlabeled data. IEEE Access, 12:27331–27343.
Ma, H., Jiang, F., Rong, Y., Guo, Y., and Huang, J. (2024). Toward robust self-training paradigm for molecular prediction tasks. Journal of Computational Biology, 31(3):213–228.
Medeiros, A., Gorgônio, A. C., Vale, K. M. O., Gorgônio, F. L., and Canuto, A. M. d. P. (2023). Flexcon-ce: A semi-supervised method with an ensemble-based adaptive confidence. In Brazilian Conference on Intelligent Systems, pages 95–109. Springer.
Monard, M. C. and Baranauskas, J. A. (2003). Conceitos sobre aprendizado de máquina. Sistemas inteligentes-Fundamentos e aplicações, 1(1):32.
Nascimento, D. S., Canuto, A. M., and Coelho, A. L. (2014). An empirical analysis of meta-learning for the automatic choice of architecture and components in ensemble systems. In 2014 Brazilian Conference on Intelligent Systems, pages 1–6. IEEE.
Nelli, F. (2018). Python data analytics with Pandas, NumPy, and Matplotlib. Springer.
Ninalga, D. (2023). Cordyceps@lt-edi: Depression detection with reddit and self-training.
Parvin, A. S. and Saleena, B. (2020). An ensemble classifier model to predict credit scoring-comparative analysis. In 2020 IEEE international symposium on smart electronic systems (iSES)(Formerly iNiS), pages 27–30. IEEE.
Pölsterl, S. (2020). scikit-survival: A library for time-to-event analysis built on top of scikit-learn. Journal of Machine Learning Research, 21(212):1–6.
Rodrigues, F. M., Câmara, C. J., Canuto, A. M., and Santos, A. M. (2014). Confidence factor and feature selection for semi-supervised multi-label classification methods. In 2014 International Joint Conference on Neural Networks (IJCNN), pages 864–871. IEEE.
Sanches, M. K. (2003). Aprendizado de máquina semi-supervisionado: proposta de um algoritmo para rotular exemplos a partir de poucos exemplos rotulados. PhD thesis, Universidade de São Paulo.
Smith, J. W., Everhart, J. E., Dickson, W., Knowler, W. C., and Johannes, R. S. (1988). Using the adap learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the annual symposium on computer application in medical care, page 261. American Medical Informatics Association.
Vale, K. M. O., Canuto, A. M. d. P., de Medeiros Santos, A., Gorgônio, F. d. L., Tavares, A. d. M., Gorgônio, A. C., and Alves, C. T. (2018). Automatic adjustment of confidence values in self-training semi-supervised method. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE.
Vale, K. M. O., Gorgônio, A. C., Flavius Da Luz, E. G., and Canuto, A. M. D. P. (2021). An efficient approach to select instances in self-training and co-training semi-supervised methods. IEEE Access, 10:7254–7276.
Wang, M., Fu, W., Hao, S., Tao, D., and Wu, X. (2016). Scalable semi-supervised learning by efficient anchor graph regularization. IEEE Transactions on Knowledge and Data Engineering, 28(7):1864–1877.
Wei, W., Jiang, F., Yu, X., and Du, J. (2022). An ensemble learning algorithm based on resampling and hybrid feature selection, with an application to software defect prediction. In 2022 7th International Conference on Information and Network Technologies (ICINT), pages 52–56. IEEE.
Wolpert, D. H. and Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82.
Xinqin, L., Tianyun, S., Ping, L., and Wen, Z. (2019). Application of bagging ensemble classifier based on genetic algorithm in the text classification of railway fault hazards. In 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD), pages 286–290. IEEE.
Xu, C. and Li, J. (2023). Borrowing human senses: Comment-aware self-training for social media multimodal classification.
Yarowsky, D. (1995). Unsupervised word sense disambiguation rivaling supervised methods. In 33rd annual meeting of the association for computational linguistics, pages 189–196.
Published
17/11/2024
How to Cite
COSTA, Renan M. R. A.; SILVA, Luiz M. S.; GORGÔNIO, Arthur C.; GORGÔNIO, Flavius L.; VALE, Karliane M. O. Analysing a New Experimental Design Impact on the Performance of FlexCon-CE. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 894-905. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245241.