Automated classification of cardiology diagnoses based on textual medical reports

  • João Antonio Oliveira Pedrosa, Universidade Federal de Minas Gerais
  • Derick Oliveira, Universidade Federal de Minas Gerais
  • Wagner Meira Jr., Universidade Federal de Minas Gerais
  • Antônio Ribeiro, Universidade Federal de Minas Gerais


Automatic diagnosis of diseases has been a long-standing challenge for Computer Science and related disciplines. Textual clinical reports are a rich source of data for such diagnoses, but building classification models from them is not a trivial task. The problem tackled in this work is identifying the medical diagnoses indicated in these reports. Several methods have been proposed for this problem, but a method designed for cardiology reports written in Portuguese is still needed. In this paper we describe a method that handles the peculiarities of clinical reports, including medical terminology, and that is designed to correctly estimate the disease from raw clinical reports and a list of possible diagnoses. Experimental results show that our method achieves a high degree of accuracy, even for infrequent classes and complex databases.

Keywords: cardiology, information extraction, machine learning, natural language processing


PEDROSA, João Antonio Oliveira; OLIVEIRA, Derick; MEIRA JR., Wagner; RIBEIRO, Antônio. Automated classification of cardiology diagnoses based on textual medical reports. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 8., 2020, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 185-192. ISSN 2763-8944. DOI: