Natural Language Processing for Clinical Data Classification




named-entity recognition, clinical dataset, term frequency-inverse document frequency


The widespread adoption of systems for managing and recording medical documents (MD) has generated a large volume of unstructured data. This data consists of free text that often uses ambiguous expressions to describe conditions or procedures, which makes manual categorization of MD error-prone. This work aims to label and classify MD written in Portuguese under two schemes: binary (Recipes and Others) and multi-class (Recipes, Exams, Certificates, and Others). The text was vectorized using n-grams and term frequency-inverse document frequency (TF–IDF). The results are promising, with Kappa values of 0.99 and 0.97 for the binary and multi-class classification, respectively. Classifying MD in this way makes it possible to segment information to support the management of prescription drugs.
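To make the vectorization and evaluation steps named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' pipeline. It builds word n-grams, weights them with a smoothed TF–IDF, and computes Cohen's Kappa between true and predicted labels. The function names (`word_ngrams`, `tfidf_vectors`, `cohen_kappa`), the smoothing formula, and the toy English documents are all assumptions made for illustration; the paper's actual tokenization, n-gram range, and classifier are not stated here.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased text."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_vectors(docs, n_max=2):
    """Map each document to a {ngram: tf * idf} dict.

    Assumption: idf is smoothed as log((1 + N) / (1 + df)) + 1,
    one common variant; other definitions exist.
    """
    counts_per_doc = [Counter(word_ngrams(d, n_max)) for d in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter(g for counts in counts_per_doc for g in counts)
    n_docs = len(docs)
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in df}
    vectors = []
    for counts in counts_per_doc:
        total = sum(counts.values())
        vectors.append({g: (c / total) * idf[g] for g, c in counts.items()})
    return vectors


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement between two labelings, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    if expected == 1:
        return 1.0  # Degenerate case: a single class in both labelings.
    return (observed - expected) / (1 - expected)


# Hypothetical toy documents (invented, not from the paper's dataset).
docs = ["take tablet", "take exam"]
vectors = tfidf_vectors(docs)
```

In this toy example, "take" appears in both documents and therefore receives a lower weight than "tablet", which appears in only one. The resulting sparse vectors would then be passed to a classifier, and `cohen_kappa` would compare its predictions with the reference labels.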




L. V. de Sousa, O., M. V. Magalhães, D., E. S. Campelo, V., & R. V. e Silva, R. (2022). Natural Language Processing for Clinical Data Classification. ISys - Brazilian Journal of Information Systems, 15(1), 13:1–13:17.



