Analyzing different cancer mutation data sets from breast invasive carcinoma (BRCA), lung adenocarcinoma (LUAD), and prostate adenocarcinoma (PRAD)

  • Rodrigo Ramos IFSP
  • Jorge Cutigi IFSP
  • Cynthia Ferreira USP
  • Adriane Evangelista HCBarretos
  • Adenilso Simão USP


With the advancement of next-generation sequencing (NGS) technologies, a massive volume of genetic data has been generated, making it possible to study complex diseases with computational approaches. In the context of cancer, public databases hold a huge variety of mutation data. However, it is not feasible to use all available data in every analysis; thus, a subset of the data must be selected. This work aims to investigate and understand the mutational characteristics of different cancer mutation data sets for the same type of cancer. To achieve this goal, cancer mutation data were explored and visualized. Several analyses are presented for three common types of cancer: 1) Breast Invasive Carcinoma (BRCA); 2) Lung Adenocarcinoma (LUAD); and 3) Prostate Adenocarcinoma (PRAD). For each cancer type, three distinct data sets were analyzed to determine whether there are significant differences or similarities among them. The analyses show evidence of similarity among the BRCA data sets and among the LUAD data sets, while the PRAD data sets are likely heterogeneous.


How to Cite

RAMOS, Rodrigo; CUTIGI, Jorge; FERREIRA, Cynthia; EVANGELISTA, Adriane; SIMÃO, Adenilso. Analyzing different cancer mutation data sets from breast invasive carcinoma (BRCA), lung adenocarcinoma (LUAD), and prostate adenocarcinoma (PRAD). In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 20., 2020, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 37-48. ISSN 2763-8952. DOI: