Combining Mutation and Gene Network Data in a Machine Learning Approach for False-Positive Cancer Driver Gene Discovery

  • Jorge Francisco Cutigi IFSP / USP
  • Renato Feijo Evangelista USP
  • Rodrigo Henrique Ramos IFSP / USP
  • Cynthia de Oliveira Lage Ferreira USP
  • Adriane Feijo Evangelista Barretos Cancer Hospital
  • André C. P. F. de Carvalho USP
  • Adenilso Simao USP

Abstract


Interest in cancer genomics research has grown with the advent and widespread use of next-generation sequencing technologies, which have generated large amounts of digital biological data. However, not all of this information contributes to cancer studies. For instance, false-positive driver genes may share characteristics with cancer genes without being relevant to cancer initiation and progression. Including such genes in cancer studies may surface spurious trends in the data and mislead biomedical decisions. Proper screening to detect false positives among genes considered drivers is therefore of utmost importance. This work focuses on developing models dedicated to this task. The Support Vector Machine (SVM) and Random Forest (RF) machine learning algorithms were selected to induce predictive models that classify putative driver genes as real drivers or false-positive drivers based on both mutation data and gene network interactions. The results confirmed that combining the two sources of information improves the performance of the models. The SVM and RF models achieved classification accuracies of 85.0% and 82.4%, respectively, on labeled data. Finally, a literature-based analysis of the classification of a new set of genes was performed to further validate the concept.
Keywords: Cancer bioinformatics, Driver genes, False-positive driver, Complex networks, Machine learning
Published
23/11/2020
CUTIGI, Jorge Francisco; EVANGELISTA, Renato Feijo; RAMOS, Rodrigo Henrique; FERREIRA, Cynthia de Oliveira Lage; EVANGELISTA, Adriane Feijo; DE CARVALHO, André C. P. F.; SIMAO, Adenilso. Combining Mutation and Gene Network Data in a Machine Learning Approach for False-Positive Cancer Driver Gene Discovery. In: SIMPÓSIO BRASILEIRO DE BIOINFORMÁTICA (BSB), 13., 2020, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 81-92. ISSN 2316-1248.