Study on the Complexity of Omics Data: An Analysis for Cancer Survival Prediction

Andrade, Carlos Daniel; Fontanari, Thomas; Recamonde-Mendoza, Mariana

doi:10.1007/978-3-031-21175-1_6

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 13523)

Included in the following conference series:

Brazilian Symposium on Bioinformatics

Abstract

The use of machine learning approaches to study cancer through omics datasets has been an important research tool since the advent of high-throughput technologies. However, despite their information richness, these datasets present an intrinsic data complexity that may hinder model development. This work, therefore, aims to study the characteristics of different omics data commonly employed for clinical predictive analysis, using a broad set of data complexity measures tailored for imbalanced domains. We focus on the task of cancer survival prediction in eight tumor types based on four types of omics data (i.e., copy number variation, gene expression, microRNA expression, and DNA methylation) and their combination (i.e., a multi-omics approach). We found that F1-MaxDr, F3_partial, F4_partial, and N3_partial could be used as predictors of performance in this scenario. Furthermore, our experiments suggested that the studied omics data types, including the multi-omics combination, are strongly correlated in terms of data complexity. All eight cancer types appeared to be highly correlated with each other, except for Adrenocortical Carcinoma (ACC), which showed significantly lower complexity than the others in the analyzed data.
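To make the "partial" (per-class) complexity measures named above more concrete, the following is a minimal sketch of two of them, N3 and F3, computed separately for each class. It is not the ImbCoL implementation listed in the notes. It assumes a binary survival label, Euclidean distance with leave-one-out 1-NN for N3, and that each per-class value is obtained by restricting the count to that class's examples and dividing by the class size. The function names `n3_partial` and `f3_partial` are hypothetical, and the exact definitions used by ImbCoL may differ in details.

```python
import numpy as np


def n3_partial(X, y):
    """Per-class N3 (sketch): fraction of each class's examples whose nearest
    neighbour (leave-one-out, Euclidean) carries a different label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b (n x n matrix).
    sq = (X ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Leave-one-out: an example may not be its own nearest neighbour.
    np.fill_diagonal(dist, np.inf)
    nn_label = y[np.argmin(dist, axis=1)]
    errors = nn_label != y
    # Error rate restricted to each class (assumed per-class decomposition).
    return {c.item(): float(errors[y == c].mean()) for c in np.unique(y)}


def f3_partial(X, y):
    """Per-class F3 (sketch, binary labels): for each feature, the fraction of
    a class's examples lying in the interval where both classes' value ranges
    overlap; the minimum over features is reported (lower = easier)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if len(classes) != 2:
        raise ValueError("this sketch handles binary labels only")
    a, b = X[y == classes[0]], X[y == classes[1]]
    # Overlap interval per feature; empty when lo > hi.
    lo = np.maximum(a.min(axis=0), b.min(axis=0))
    hi = np.minimum(a.max(axis=0), b.max(axis=0))
    result = {}
    for c, Xc in zip(classes, (a, b)):
        # inside[i, j]: example i of class c falls in feature j's overlap region.
        inside = (Xc >= lo) & (Xc <= hi)
        result[c.item()] = float(inside.mean(axis=0).min())
    return result


# Toy data: class 0 has two points, class 1 has three.
X_toy = np.array([[0.0], [0.1], [5.0], [5.1], [0.06]])
y_toy = np.array([0, 0, 1, 1, 1])
print(n3_partial(X_toy, y_toy))
```

Reporting one value per class, rather than a single global figure, keeps the minority class visible: with imbalanced survival labels, a global error rate or overlap fraction can look low simply because the majority class dominates the count. The dense n x n distance matrix is adequate for cohorts of a few hundred patients; larger cohorts would call for a nearest-neighbour index instead.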

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, and by grants from the Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS) [21/2551-0002052-0] and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) [308075/2021-8].

Notes

1. http://firebrowse.org/
2. https://github.com/lpfgarcia/ECoL
3. https://github.com/victorhb/ImbCoL
4. The raw results of our experiments can be found in the project GitHub repository: https://github.com/carlosdanielandrade/complexity-of-omics-data-in-cancer

Author information

Authors and Affiliations

Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
Carlos Daniel Andrade, Thomas Fontanari & Mariana Recamonde-Mendoza
Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre, RS, Brazil
Thomas Fontanari & Mariana Recamonde-Mendoza

Corresponding author

Correspondence to Mariana Recamonde-Mendoza.

Editor information

Editors and Affiliations

Instituto Nacional de Câncer, Rio de Janeiro, Brazil
Nicole M. Scherer
Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Raquel C. de Melo-Minardi

About this paper

Cite this paper

Andrade, C.D., Fontanari, T., Recamonde-Mendoza, M. (2022). Study on the Complexity of Omics Data: An Analysis for Cancer Survival Prediction. In: Scherer, N.M., de Melo-Minardi, R.C. (eds) Advances in Bioinformatics and Computational Biology. BSB 2022. Lecture Notes in Computer Science, vol 13523. Springer, Cham. https://doi.org/10.1007/978-3-031-21175-1_6

DOI: https://doi.org/10.1007/978-3-031-21175-1_6
Published: 16 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21174-4
Online ISBN: 978-3-031-21175-1
eBook Packages: Computer Science, Computer Science (R0)
