Analysing a New Experimental Design Impact on the Performance of FlexCon-CE
Abstract
Semi-supervised learning is a subfield of machine learning that exploits the combined use of labeled and unlabeled instances, the latter typically far more numerous than the former. This type of learning is essentially an intersection of supervised and unsupervised learning. Among the many algorithms used in this context, Self-training stands out for its wide application in the literature. Over the years, several variants of Self-training have been developed to improve its performance, including FlexCon-CE, which serves as the basis for this work. FlexCon-CE is an algorithm that employs classifier ensembles to select and label unlabeled instances, integrating them into the labeled dataset. This study analyzes the performance of FlexCon-CE by enlarging the number of datasets evaluated, adding performance evaluation metrics, and varying the percentages of initially labeled instances. The results show that FlexCon-CE performs well regardless of the experimental design and across different datasets.
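The ensemble-based selection step described above can be sketched in a minimal, self-contained form. This is an illustrative assumption of how a confidence-by-agreement self-training loop might look, not the authors' actual FlexCon-CE implementation; the classifier, the `threshold` agreement criterion, and all function names are hypothetical.

```python
# Hedged sketch of ensemble-based self-training in the spirit of FlexCon-CE.
# NearestCentroid, ensemble_self_train, and the agreement threshold are
# illustrative assumptions, not the published algorithm.
import random
from collections import Counter

class NearestCentroid:
    """Toy classifier: predicts the class of the nearest class centroid."""
    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        self.centroids = {c: [s / counts[c] for s in acc]
                          for c, acc in sums.items()}
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c]))
                for x in X]

def ensemble_self_train(X_lab, y_lab, X_unl, n_members=5, threshold=0.8,
                        max_rounds=10, seed=0):
    """Iteratively pseudo-label unlabeled points on which a bootstrap
    ensemble agrees with at least `threshold` fraction of its votes."""
    rng = random.Random(seed)
    X_lab, y_lab, X_unl = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        if not X_unl:
            break
        # Train each ensemble member on a bootstrap resample of the labeled pool.
        members = []
        for _ in range(n_members):
            idx = [rng.randrange(len(X_lab)) for _ in X_lab]
            members.append(NearestCentroid().fit([X_lab[i] for i in idx],
                                                 [y_lab[i] for i in idx]))
        votes = [m.predict(X_unl) for m in members]
        keep = []
        for i in range(len(X_unl)):
            label, n = Counter(v[i] for v in votes).most_common(1)[0]
            if n / n_members >= threshold:      # confident: pseudo-label it
                X_lab.append(X_unl[i])
                y_lab.append(label)
            else:
                keep.append(X_unl[i])           # not confident: stays unlabeled
        if len(keep) == len(X_unl):             # no progress this round: stop
            break
        X_unl = keep
    return X_lab, y_lab
```

The key design point mirrored from the abstract is that the ensemble, not a single classifier, decides which unlabeled instances are confidently labeled and merged into the labeled set at each iteration.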
Keywords:
Semi-supervised learning, Self-training, FlexCon-CE
References
Bisong, E. et al. (2019). Building machine learning and deep learning models on Google cloud platform. Springer.
Breiman, L. (1996). Bias, variance, and arcing classifiers. Technical Report 460, Statistics Department, University of California, Berkeley.
Chapelle, O., Scholkopf, B., and Zien, A. (2006). Semi-Supervised Learning. The MIT Press.
Chapelle, O., Scholkopf, B., and Zien, A. (2009). Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [Book reviews]. IEEE Transactions on Neural Networks, 20(3):542–542.
Dheeru, D. and Taniskidou, E. K. (2017). UCI Machine Learning Repository.
Gharroudi, O. (2017). Ensemble multi-label learning in supervised and semi-supervised settings. PhD thesis, Université de Lyon.
Gorgônio, A. C., Alves, C. T., Lucena, A. J., Gorgônio, F. L., Vale, K. M., and Canuto, A. M. (2019). Análise da variação do limiar para o algoritmo de aprendizado semissupervisionado flexcon-c. Brazilian Journal of Development, 5(11):26654–26669.
Harris, C. R., Millman, K. J., Van Der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., et al. (2020). Array programming with NumPy. Nature, 585(7825):357–362.
Jan, M. Z. and Verma, B. (2019). A novel diversity measure and classifier selection approach for generating ensemble classifiers. IEEE Access, 7:156360–156373.
Knisely, B. M. and Pavliscsak, H. H. (2023). Research proposal content extraction using natural language processing and semi-supervised clustering: A demonstration and comparative analysis. Scientometrics.
Kuncheva, L. I. (2014). Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons, Hoboken, NJ, 2nd edition.
Li, K., Li, X., Yin, R., and Wang, L. (2024a). A method for seismic fault identification based on self-training with high-stability pseudo-labels. Applied Soft Computing, page 111894.
Li, S., Kou, P., Ma, M., Yang, H., Huang, S., and Yang, Z. (2024b). Application of semi-supervised learning in image classification: Research on fusion of labeled and unlabeled data. IEEE Access, 12:27331–27343.
Ma, H., Jiang, F., Rong, Y., Guo, Y., and Huang, J. (2024). Toward robust self-training paradigm for molecular prediction tasks. Journal of Computational Biology, 31(3):213–228.
Medeiros, A., Gorgônio, A. C., Vale, K. M. O., Gorgônio, F. L., and Canuto, A. M. d. P. (2023). Flexcon-ce: A semi-supervised method with an ensemble-based adaptive confidence. In Brazilian Conference on Intelligent Systems, pages 95–109. Springer.
Monard, M. C. and Baranauskas, J. A. (2003). Conceitos sobre aprendizado de máquina. Sistemas inteligentes-Fundamentos e aplicações, 1(1):32.
Nascimento, D. S., Canuto, A. M., and Coelho, A. L. (2014). An empirical analysis of meta-learning for the automatic choice of architecture and components in ensemble systems. In 2014 Brazilian Conference on Intelligent Systems, pages 1–6. IEEE.
Nelli, F. (2018). Python data analytics with Pandas, NumPy, and Matplotlib. Springer.
Ninalga, D. (2023). Cordyceps@lt-edi: Depression detection with reddit and self-training.
Parvin, A. S. and Saleena, B. (2020). An ensemble classifier model to predict credit scoring-comparative analysis. In 2020 IEEE international symposium on smart electronic systems (iSES)(Formerly iNiS), pages 27–30. IEEE.
Pölsterl, S. (2020). scikit-survival: A library for time-to-event analysis built on top of scikit-learn. Journal of Machine Learning Research, 21(212):1–6.
Rodrigues, F. M., Câmara, C. J., Canuto, A. M., and Santos, A. M. (2014). Confidence factor and feature selection for semi-supervised multi-label classification methods. In 2014 International Joint Conference on Neural Networks (IJCNN), pages 864–871. IEEE.
Sanches, M. K. (2003). Aprendizado de máquina semi-supervisionado: proposta de um algoritmo para rotular exemplos a partir de poucos exemplos rotulados. PhD thesis, Universidade de São Paulo.
Smith, J. W., Everhart, J. E., Dickson, W., Knowler, W. C., and Johannes, R. S. (1988). Using the adap learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the annual symposium on computer application in medical care, page 261. American Medical Informatics Association.
Vale, K. M. O., Canuto, A. M. d. P., de Medeiros Santos, A., Gorgônio, F. d. L., Tavares, A. d. M., Gorgônio, A. C., and Alves, C. T. (2018). Automatic adjustment of confidence values in self-training semi-supervised method. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE.
Vale, K. M. O., Gorgônio, A. C., Flavius Da Luz, E. G., and Canuto, A. M. D. P. (2021). An efficient approach to select instances in self-training and co-training semi-supervised methods. IEEE Access, 10:7254–7276.
Wang, M., Fu, W., Hao, S., Tao, D., and Wu, X. (2016). Scalable semi-supervised learning by efficient anchor graph regularization. IEEE Transactions on Knowledge and Data Engineering, 28(7):1864–1877.
Wei, W., Jiang, F., Yu, X., and Du, J. (2022). An ensemble learning algorithm based on resampling and hybrid feature selection, with an application to software defect prediction. In 2022 7th International Conference on Information and Network Technologies (ICINT), pages 52–56. IEEE.
Wolpert, D. H. and Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82.
Xinqin, L., Tianyun, S., Ping, L., and Wen, Z. (2019). Application of bagging ensemble classifier based on genetic algorithm in the text classification of railway fault hazards. In 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD), pages 286–290. IEEE.
Xu, C. and Li, J. (2023). Borrowing human senses: Comment-aware self-training for social media multimodal classification.
Yarowsky, D. (1995). Unsupervised word sense disambiguation rivaling supervised methods. In 33rd annual meeting of the association for computational linguistics, pages 189–196.
Published
17/11/2024
How to Cite
COSTA, Renan M. R. A.; SILVA, Luiz M. S.; GORGÔNIO, Arthur C.; GORGÔNIO, Flavius L.; VALE, Karliane M. O. Analysing a New Experimental Design Impact on the Performance of FlexCon-CE. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 894-905. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245241.