Survival Prediction for Oral Cancer Patients: A Machine Learning Approach

  • Murilo Cruz Lopes UEFS
  • Marília de Matos Amorim UEFS
  • Valéria Souza Freitas UEFS
  • Rodrigo Tripodi Calumby UEFS


Brazil has a high incidence of oral cancer, with 150,000 new cases estimated for 2020-2022. In most cases, the disease is diagnosed at an advanced stage and is related to many risk factors. The Registro Hospitalar de Câncer (RHC), managed by the Instituto Nacional de Câncer (INCA), is a nationwide database that integrates cancer registries from several hospitals in Brazil. RHC is mostly an administrative database, but it also includes clinical, socioeconomic, and hospitalization data for each patient with a cancer diagnosis in the country. For these patients, prognostication is always a difficult task that demands multi-dimensional analysis. Therefore, exploiting large-scale data and machine intelligence approaches emerges as a promising tool for computer-aided decision support in death risk estimation. Given the importance of this context, some works have reported high prognostication effectiveness, although with extremely limited data collections and relying on weak validation protocols or simple robustness analysis. Hence, this work describes a detailed workflow and experimental analysis for oral cancer patient survival prediction, considering careful data curation and strict validation procedures. By exploiting multiple machine learning algorithms and optimization techniques, the proposed approach achieved promising survival prediction effectiveness, with F1 and AUC-ROC over 0.78 and 0.80, respectively. Moreover, a detailed analysis has shown that the minimization of different types of prediction errors was achieved by different models, which highlights the importance of rigour in this kind of validation.
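
To make the reported metrics concrete, the following minimal sketch computes F1, AUC-ROC, and the false-positive and false-negative counts for two hypothetical models. The labels, scores, the 0.5 threshold, and all function names are illustrative assumptions and are not taken from the RHC data or from the authors' implementation. It only shows how two models can each minimize a different type of prediction error.

```python
from typing import Dict, List, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN, TN for binary labels (assumed: 1 = death, 0 = survival)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    c = confusion_counts(y_true, y_pred)
    denom = 2 * c["tp"] + c["fp"] + c["fn"]
    return 2 * c["tp"] / denom if denom else 0.0


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold(scores: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Turn risk scores into hard predictions (cutoff is an assumption)."""
    return [1 if s >= cutoff else 0 for s in scores]


# Hypothetical toy data: four deaths followed by four survivors.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]     # hypothetical model A
scores_b = [0.9, 0.7, 0.55, 0.52, 0.6, 0.58, 0.3, 0.1]  # hypothetical model B

report = {}
for name, scores in (("A", scores_a), ("B", scores_b)):
    preds = threshold(scores)
    report[name] = {
        **confusion_counts(y_true, preds),
        "f1": f1_score(y_true, preds),
        "auc": auc_roc(y_true, scores),
    }

for name, row in report.items():
    print(name, row)
```

In this toy setting, model A produces fewer false positives while model B produces fewer false negatives, so a single aggregate score would hide a trade-off that matters when estimating death risk.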

Keywords: RHC, health, oral cancer, machine learning


LOPES, Murilo Cruz; AMORIM, Marília de Matos; FREITAS, Valéria Souza; CALUMBY, Rodrigo Tripodi. Survival Prediction for Oral Cancer Patients: A Machine Learning Approach. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 9., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 97-104. ISSN 2763-8944. DOI: