Study on the Complexity of Omics Data: An Analysis for Cancer Survival Prediction

Abstract

Machine learning applied to omics datasets has been an important tool in cancer research since the advent of high-throughput technologies. However, despite their information richness, these datasets have an intrinsic data complexity that may hinder model development. This work therefore studies the characteristics of different omics data types commonly employed for clinical predictive analysis, using a broad set of data complexity measures tailored for imbalanced domains. We focus on the task of cancer survival prediction in eight tumor types based on four types of omics data (i.e., copy number variation, gene expression, microRNA expression, and DNA methylation) and their combination (i.e., a multi-omics approach). We found that F1-MaxDr, F3 partial, F4 partial, and N3 partial could be used as predictors of performance in this scenario. Furthermore, our experiments suggested that the studied omics data types, including the multi-omics approach, are strongly correlated in terms of data complexity. The cancer types appeared to be highly correlated with each other, except for Adrenocortical Carcinoma (ACC), which showed a significantly lower complexity than the others in the analyzed data.
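
To make the idea of a class-wise ("partial") complexity measure concrete, the sketch below computes a per-class variant of N3, understood here as the leave-one-out error rate of a 1-nearest-neighbor classifier restricted to the samples of each class. This is an illustrative reading of the measure, not the authors' implementation: the function name `n3_per_class`, the Euclidean distance, and the toy data are all assumptions introduced for the example.

```python
import numpy as np


def n3_per_class(X, y):
    """Leave-one-out 1-NN error rate, reported separately for each class.

    Hypothetical helper for illustration; assumes Euclidean distance and
    a feature matrix small enough for a dense pairwise distance matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Squared Euclidean distance between every pair of samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)

    # Leave-one-out: a sample may not be its own nearest neighbor.
    np.fill_diagonal(dist, np.inf)

    nearest = dist.argmin(axis=1)
    misclassified = y[nearest] != y

    # Average the errors within each class instead of over the whole set,
    # so the minority class is not masked by the majority class.
    return {
        label.item(): float(misclassified[y == label].mean())
        for label in np.unique(y)
    }


# Toy imbalanced data (assumed values): four majority and two minority samples.
X_demo = [[0.0], [1.0], [2.0], [3.0], [3.4], [10.0]]
y_demo = [0, 0, 0, 0, 1, 1]
print(n3_per_class(X_demo, y_demo))  # {0: 0.25, 1: 0.5}
```

In this toy case the overall error rate would be 2/6, while the per-class view shows that the minority class is misclassified twice as often as the majority class, which is the kind of information an aggregate measure can hide in imbalanced domains.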
Published
2022-09-21
How to Cite
ANDRADE, Carlos Daniel; FONTANARI, Thomas; RECAMONDE-MENDOZA, Mariana. Study on the Complexity of Omics Data: An Analysis for Cancer Survival Prediction. Proceedings of the Brazilian Symposium on Bioinformatics (BSB), [S.l.], p. 44-55, sep. 2022. ISSN 2316-1248. Available at: <https://sol.sbc.org.br/index.php/bsb/article/view/22857>. Date accessed: 17 may 2024.