Semi-supervised Predictive Clustering Trees for Multi-label Protein Subcellular Localization

  • Leonardo U. Alcantara UFSCar
  • Isaac Triguero University of Granada
  • Ricardo Cerri USP

Abstract


Protein subcellular localization is an important classification task because the location of a protein in a cell is directly linked to its function. Since a protein can act at two or more locations simultaneously, multi-label classification algorithms are necessary. The algorithms currently in use are usually based on supervised learning, which has several disadvantages: (i) the need for a large number of labeled instances for training; (ii) the waste of valuable information that unlabeled instances can provide; and (iii) the high cost of obtaining labeled instances for training. To overcome these disadvantages, semi-supervised learning can be applied, in which classifiers exploit both labeled and unlabeled data. In this paper, we propose a new semi-supervised algorithm for multi-label protein subcellular localization. Our proposal is based on decision tree classifiers induced using predictive clustering trees. We investigate many semi-supervised protein subcellular localization scenarios to test whether unlabeled instances can improve the multi-label classification process. Our results show that the proposal achieves competitive or better results than the purely supervised version of predictive clustering trees.
Published
17/11/2024
ALCANTARA, Leonardo U.; TRIGUERO, Isaac; CERRI, Ricardo. Semi-supervised Predictive Clustering Trees for Multi-label Protein Subcellular Localization. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 384-399. ISSN 2643-6264.