Breast Cancer Subtypes Classification: A study based on representative genes

  • João Reis UFAM
  • Rayol M. Neto UFAM
  • Fabíola G. Nakamura UFAM
  • Eduardo F. Nakamura UFAM

Abstract


Breast cancer is the second most common cancer type worldwide and the leading cause of cancer-related death among women. Because it is a heterogeneous disease, identifying the subtype plays an important role in choosing a specific treatment. In this work, we propose an approach that applies several machine learning techniques to a broader analysis of the PAM50 gene list for classifying breast cancer subtypes. In our experiments, the best-performing method was an SVM with a linear kernel, which achieved an F1 score of 0.97 for the Basal subtype and 0.83 for the HER2 subtype, the two subtypes with the worst prognosis.
Keywords: Breast cancer, Classification, Gene expression
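
The abstract reports F1 scores per subtype rather than a single overall figure. As a minimal sketch of how such per-class scores can be computed (one-vs-rest, from true and predicted labels), consider the snippet below. The function name `per_class_f1`, the label strings, and the toy labels are assumptions made for illustration; they are not the authors' code or data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[t] += 1   # t was the truth but was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy labels invented for illustration only (not results from the paper).
y_true = ["Basal", "Basal", "Her2", "Her2", "LumA", "LumA", "LumB", "LumB"]
y_pred = ["Basal", "Basal", "Her2", "LumB", "LumA", "LumA", "LumB", "LumB"]
scores = per_class_f1(y_true, y_pred)
```

Reporting the score per class makes it visible when a classifier does well on one subtype and noticeably worse on another, which a single averaged score would hide.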

Published
2021-07-18
REIS, João; M. NETO, Rayol; NAKAMURA, Fabíola G.; NAKAMURA, Eduardo F. Breast Cancer Subtypes Classification: A study based on representative genes. In: INTEGRATED SOFTWARE AND HARDWARE SEMINAR (SEMISH), 48., 2021, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 279-287. ISSN 2595-6205. DOI: https://doi.org/10.5753/semish.2021.15833.