Semi-supervised Predictive Clustering Trees for Multi-label Protein Subcellular Localization
Abstract
Protein subcellular localization is an important classification task because the location of a protein in a cell is directly linked to its function. Since a protein can act at two or more locations simultaneously, multi-label classification algorithms are necessary. Current algorithms are usually based on supervised learning, which has several disadvantages: (i) a need for a large number of labeled instances for training; (ii) a waste of the valuable information that unlabeled instances can provide; and (iii) the high cost of obtaining labeled instances for training. To overcome these disadvantages, semi-supervised learning can be applied, in which classifiers exploit both labeled and unlabeled data. In this paper, we propose a new semi-supervised algorithm for multi-label protein subcellular localization. Our proposal is based on decision tree classifiers induced using predictive clustering trees. We investigate many semi-supervised protein subcellular localization scenarios to test whether unlabeled instances can improve the multi-label classification process. Our results show that the proposal achieves competitive or better results than the purely supervised version of predictive clustering trees.
Published
17/11/2024
How to Cite
ALCANTARA, Leonardo U.; TRIGUERO, Isaac; CERRI, Ricardo.
Semi-supervised Predictive Clustering Trees for Multi-label Protein Subcellular Localization. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 384-399.
ISSN 2643-6264.