Clinical Oncology Textual Notes Analysis Using Machine Learning and Deep Learning

  • Conference paper
Intelligent Systems (BRACIS 2023)

Abstract

Advances in text classification can improve the quality of existing clinical systems. Our research experimentally explored text classification methods applied to non-synthetic oncology clinical notes corpora. The experiments were performed on a dataset of 3,308 medical notes and evaluated the following machine learning and deep learning classification methods: Multilayer Perceptron neural network, Logistic Regression, Decision Tree classifier, Random Forest classifier, K-nearest neighbors classifier, and Long Short-Term Memory. One experiment evaluated the influence of the corpus preprocessing step on the results and showed that preprocessing raised the classifier's mean accuracy from 26.1% to 86.7% with the per-clinical-event corpus and to 93.9% with the per-patient corpus. The best-performing classifier was the Multilayer Perceptron, which achieved 93.90% accuracy, a Macro F1 score of 93.61%, and a Weighted F1 score of 93.99%.
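The abstract reports accuracy together with Macro and Weighted F1 scores. As a minimal sketch of how these three summary metrics relate to one another (this is not the authors' evaluation code), the following computes them from first principles using only the Python standard library. The function names, the class labels, and the toy predictions are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per label
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def summarize(y_true, y_pred):
    """Compute accuracy, macro F1, and support-weighted F1."""
    scores = per_class_f1(y_true, y_pred)
    total = len(y_true)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / total
    # Macro F1: unweighted mean, so every class counts equally
    macro_f1 = sum(f1 for f1, _ in scores.values()) / len(scores)
    # Weighted F1: mean weighted by each class's share of the true labels
    weighted_f1 = sum(f1 * n for f1, n in scores.values()) / total
    return {"accuracy": accuracy, "macro_f1": macro_f1, "weighted_f1": weighted_f1}


# Hypothetical, imbalanced toy data: 6 notes of class_a, 3 of class_b, 1 of class_c
y_true = ["class_a"] * 6 + ["class_b"] * 3 + ["class_c"]
y_pred = ["class_a"] * 6 + ["class_b", "class_b", "class_a"] + ["class_b"]

result = summarize(y_true, y_pred)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

In this toy case the rare class is never predicted correctly, which pulls Macro F1 well below Weighted F1. When the two scores are close, as in the figures quoted in the abstract, per-class performance is likely to be fairly even across frequent and infrequent classes, although only the per-class results in the paper itself can confirm that.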

Notes

  1. https://scikit-learn.org/

  2. https://keras.io/

Author information

Corresponding author

Correspondence to Diego Pinheiro da Silva.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

da Silva, D.P., Fröhlich, W.d.R., Schwertner, M.A., Rigo, S.J. (2023). Clinical Oncology Textual Notes Analysis Using Machine Learning and Deep Learning. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol 14196. Springer, Cham. https://doi.org/10.1007/978-3-031-45389-2_10

  • DOI: https://doi.org/10.1007/978-3-031-45389-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45388-5

  • Online ISBN: 978-3-031-45389-2

  • eBook Packages: Computer Science, Computer Science (R0)
