Study on the Complexity of Omics Data: An Analysis for Cancer Survival Prediction

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2022)

Abstract

The use of machine learning approaches to study cancer through omics datasets has been an important research tool since the advent of high-throughput technologies. However, despite their information richness, these datasets present an intrinsic data complexity that may hinder model development. This work, therefore, aims to study the characteristics of different omics data commonly employed for clinical predictive analysis using a broad set of data complexity measures tailored for imbalanced domains. We focus on the task of cancer survival prediction in eight tumor types based on four types of omics data (i.e., copy number variation, gene expression, microRNA expression, and DNA methylation) and their combination (i.e., a multi-omics approach). We found that F1-MaxDr, F3_partial, F4_partial, and N3_partial could be used as predictors of performance in this scenario. Furthermore, our experiments suggested that the studied omics data types, including the multi-omics approach, are strongly correlated in terms of data complexity. The cancer types also appeared to be highly correlated with each other, except for Adrenocortical Carcinoma (ACC), which showed a significantly lower complexity than the others in the analyzed data.
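To make the notion of a data complexity measure concrete, the following is a minimal sketch of the classic maximum Fisher's discriminant ratio described by Ho and Basu (2002) for two-class problems. It is an illustration only: the abstract does not define the measures it names, so this is not presented as the paper's F1-MaxDr or as any of the imbalance-aware "partial" variants. The function names, the toy data, and the use of population variance are assumptions made for this example.

```python
from statistics import mean, pvariance


def fisher_ratio(feature_values, labels):
    """Fisher's discriminant ratio of one feature for a two-class problem.

    Computes (mu_a - mu_b)^2 / (var_a + var_b). Population variance is
    used here; that choice is an assumption of this sketch.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch only handles two-class problems")
    a = [v for v, y in zip(feature_values, labels) if y == classes[0]]
    b = [v for v, y in zip(feature_values, labels) if y == classes[1]]
    numerator = (mean(a) - mean(b)) ** 2
    denominator = pvariance(a) + pvariance(b)
    if denominator == 0:
        # Zero spread in both classes: perfectly separated or identical.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def max_fisher_ratio(rows, labels):
    """Largest per-feature Fisher ratio; higher suggests an easier problem."""
    n_features = len(rows[0])
    return max(
        fisher_ratio([row[j] for row in rows], labels)
        for j in range(n_features)
    )


# Hypothetical toy data: 4 samples, 2 features, binary survival label.
rows = [[0.0, 5.0], [2.0, 1.0], [10.0, 4.0], [12.0, 2.0]]
labels = [0, 0, 1, 1]
score = max_fisher_ratio(rows, labels)
```

In this toy example the first feature separates the two classes well while the second does not, so the maximum ratio is driven by the first feature alone.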

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, and by grants from the Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS) [21/2551-0002052-0] and the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) [308075/2021-8].

Notes

  1. http://firebrowse.org/

  2. https://github.com/lpfgarcia/ECoL

  3. https://github.com/victorhb/ImbCoL

  4. The raw results of our experiments can be found in the project GitHub repository: https://github.com/carlosdanielandrade/complexity-of-omics-data-in-cancer

Author information

Corresponding author

Correspondence to Mariana Recamonde-Mendoza.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Andrade, C.D., Fontanari, T., Recamonde-Mendoza, M. (2022). Study on the Complexity of Omics Data: An Analysis for Cancer Survival Prediction. In: Scherer, N.M., de Melo-Minardi, R.C. (eds) Advances in Bioinformatics and Computational Biology. BSB 2022. Lecture Notes in Computer Science, vol 13523. Springer, Cham. https://doi.org/10.1007/978-3-031-21175-1_6

  • DOI: https://doi.org/10.1007/978-3-031-21175-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21174-4

  • Online ISBN: 978-3-031-21175-1

  • eBook Packages: Computer Science, Computer Science (R0)
