Methodological Insights into Cancer Driver Gene Prediction: Comparing Graph-Based and Traditional ML Approaches

  • Renan Soares de Andrades, Universidade Federal do Rio Grande do Sul (UFRGS) / Hospital de Clínicas de Porto Alegre (HCPA), https://orcid.org/0000-0003-1102-7359
  • Mariana Recamonde Mendoza, Universidade Federal do Rio Grande do Sul (UFRGS) / Hospital de Clínicas de Porto Alegre (HCPA)

Abstract


This study investigates how key methodological choices affect cancer driver gene (CDG) prediction with graph neural networks (GNNs) and traditional machine learning (ML) models. We evaluate three GNN architectures, four ML algorithms, three protein–protein interaction (PPI) networks, multiple node feature configurations (single-omics, multi-omics, and centrality measures), and three strategies for mitigating class imbalance. Graph Convolutional Networks (GCNs) consistently outperform the other GNN architectures, while Gradient Boosted Trees remain competitive when structural features are included. Adding node centrality measures as features further improves prediction across models. These results underscore the importance of feature design and model selection for accurate and robust CDG prediction.
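The abstract reports that node centrality measures improve prediction across both GNNs and traditional ML models. As an illustration only, the sketch below shows one way structural features could be appended to per-gene omics features before training either kind of model. The choice of degree and closeness centrality, the toy graph, the gene labels `A` to `D`, and the omics values are all hypothetical; the abstract does not say which centrality measures, networks, or feature encodings the authors actually used.

```python
from collections import deque

def degree_centrality(adj):
    """Degree centrality: node degree divided by (n - 1)."""
    n = len(adj)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """Closeness: (reachable nodes - 1) / sum of BFS shortest-path distances.

    Computed within each node's reachable component on an unweighted graph.
    """
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores

def build_node_features(adj, omics):
    """Concatenate each gene's omics vector with its centrality scores."""
    deg = degree_centrality(adj)
    clo = closeness_centrality(adj)
    genes = sorted(adj)
    return genes, [omics[g] + [deg[g], clo[g]] for g in genes]

# Hypothetical toy PPI graph (undirected adjacency sets) and
# hypothetical per-gene omics features (e.g., mutation rate, expression change).
adj = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A", "D"}, "D": {"A", "C"}}
omics = {"A": [0.12, 1.5], "B": [0.01, -0.2], "C": [0.05, 0.8], "D": [0.03, 0.1]}

genes, X = build_node_features(adj, omics)
for g, row in zip(genes, X):
    print(g, row)
```

The resulting rows could serve as node features for a GNN or as a tabular input for a tree-based model. Libraries such as NetworkX provide these measures (and others, such as betweenness) directly; the hand-written BFS here only keeps the sketch dependency-free.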
Keywords: machine learning, cancer driver genes, graph-based learning, bioinformatics

Published
29/09/2025
ANDRADES, Renan Soares de; RECAMONDE MENDOZA, Mariana. Methodological Insights into Cancer Driver Gene Prediction: Comparing Graph-Based and Traditional ML Approaches. In: SIMPÓSIO BRASILEIRO DE BIOINFORMÁTICA (BSB), 18., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 130-141. ISSN 2316-1248. DOI: https://doi.org/10.5753/bsb.2025.14617.