Evaluating the Generalization of Neural Network-Based Pan-Cancer Classification Models for Cohort-Specific Predictions

  • Thomas Fontanari UFRGS / HCPA
  • Mariana Recamonde-Mendoza UFRGS / HCPA

Abstract


This study develops and evaluates pan-cancer (PC) neural network (NN) models for cohort-specific (CS) predictions. We adopt two approaches, one of them inspired by few-shot learning, to improve the models’ ability to distinguish normal from tumor tissue across diverse cohorts. The first approach trains an NN on a comprehensive PC dataset covering 16 cancer types and compares it against CS models on a target cohort. The second examines whether PC models can generalize to smaller, unseen cohorts by training on 15 cohorts and evaluating on the excluded one. Our experiments show that PC models generally outperform CS models, even with limited sample sizes and class imbalance. Moreover, the few-shot approach generalizes to other cancer types, highlighting its potential to advance personalized cancer diagnosis and treatment.
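The second approach, training on 15 cohorts and evaluating on the excluded one, amounts to a leave-one-cohort-out split. The sketch below illustrates only that splitting step in plain Python. The cohort names, the toy samples, and the `leave_one_cohort_out` helper are hypothetical and not taken from the paper; the actual preprocessing, network architecture, and training procedure are not shown here.

```python
from typing import Dict, Iterator, List, Tuple

# One sample: (expression features, label), with 0 = normal and 1 = tumor.
Sample = Tuple[List[float], int]


def leave_one_cohort_out(
    cohorts: Dict[str, List[Sample]],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_name, train_samples, test_samples) once per cohort."""
    for held_out in cohorts:
        # Pool every cohort except the held-out one into the training set.
        train = [
            sample
            for name, samples in cohorts.items()
            if name != held_out
            for sample in samples
        ]
        # The excluded cohort is used only for evaluation.
        test = list(cohorts[held_out])
        yield held_out, train, test


# Toy data: three hypothetical cohorts with made-up two-feature samples.
toy_cohorts = {
    "cohort_A": [([0.1, 0.2], 0), ([0.9, 0.8], 1)],
    "cohort_B": [([0.2, 0.1], 0), ([0.7, 0.9], 1), ([0.8, 0.7], 1)],
    "cohort_C": [([0.3, 0.3], 0)],
}

splits = list(leave_one_cohort_out(toy_cohorts))
for name, train, test in splits:
    print(f"held out {name}: {len(train)} training, {len(test)} test samples")
```

With 16 cohorts, the same loop would produce 16 splits, each training on the remaining 15. A model would be fitted on `train` and scored on `test` inside the loop.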

Published
02/12/2024
FONTANARI, Thomas; RECAMONDE-MENDOZA, Mariana. Evaluating the Generalization of Neural Network-Based Pan-Cancer Classification Models for Cohort-Specific Predictions. In: SIMPÓSIO BRASILEIRO DE BIOINFORMÁTICA (BSB), 17., 2024, Vitória/ES. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 12-23. ISSN 2316-1248. DOI: https://doi.org/10.5753/bsb.2024.245165.