Semantic Textual Similarity for Abridging Clinical Notes in Brazilian Electronic Health Records
Abstract
As information from electronic patient records becomes increasingly important in the development of machine learning models, so does the need for a holistic understanding of those records. In particular, clinical notes need to be abridged so that important information reaches the training process without the repetition commonly found in such notes. This paper presents the pre-processing of clinical notes from the BRATECA dataset, a Brazilian tertiary care data collection. The goal is to remove repeated information that results from interactions between healthcare providers and patients, based on semantic similarity values assigned to pairs of sentences in the clinical notes.
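The idea of dropping sentences that repeat earlier ones can be illustrated with a minimal sketch. It is not the pipeline used in the paper: the function names (`bow_vector`, `cosine_similarity`, `abridge`), the 0.8 threshold, and the bag-of-words cosine score are all assumptions made for illustration. The bag-of-words score is a crude lexical stand-in for a learned semantic similarity model, which can be supplied through the hypothetical `similarity` parameter.

```python
import math
import re
from collections import Counter


def bow_vector(sentence):
    """Lowercased bag-of-words counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def abridge(sentences, threshold=0.8, similarity=None):
    """Keep a sentence only if its score against every kept sentence is below threshold.

    `threshold` and `similarity` are illustrative assumptions, not values
    or components taken from the paper.
    """
    if similarity is None:
        def similarity(x, y):
            return cosine_similarity(bow_vector(x), bow_vector(y))
    kept = []
    for sentence in sentences:
        if all(similarity(sentence, prior) < threshold for prior in kept):
            kept.append(sentence)
    return kept
```

Sentences are processed in order, so the first occurrence of a repeated statement is kept and later near-copies are dropped. Passing a different `similarity` callable (for example, one backed by a sentence-level language model) changes the scoring without changing the filtering logic.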