ARTERIAL: A Natural Language Processing Model for Prevention of Information Leakage from Electronic Health Records

Guilherme Goldschmidt; Felipe André Zeiser; Rodrigo Da Rosa Righi; Cristiano André Da Costa

UNISINOS

Abstract

Over the past decade, security breaches in healthcare have increased steadily. Healthcare organizations must therefore protect sensitive information such as test results, diagnoses, prescriptions, research data, and patients' personal information. A leak of sensitive data can cause significant economic loss and damage an organization's reputation. Data Leakage Prevention (DLP) systems can help identify, monitor, and protect sensitive data and reduce the risk of it leaking. However, state-of-the-art DLP solutions rely only on signature matching and static comparisons. We therefore propose ARTERIAL, a model based on Natural Language Processing (NLP), Named Entity Recognition (NER), and Artificial Neural Networks (ANN), designed to extract information and recognize entities in Electronic Health Records (EHR) more accurately. Unlike current approaches in the literature, ARTERIAL considers the semantic features present in EHRs. We implemented and tested three approaches: two based on ANNs and a third based on other machine learning algorithms. The approach implemented with a machine learning algorithm achieved 98.0% precision, 86.0% recall, and a 91.0% F1-score.

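The abstract reports precision, recall, and F1-score for entity extraction. The sketch below shows one common way to compute these metrics for NER. It assumes strict entity-level matching, where a prediction counts as correct only if both its span and its label match a gold annotation. The paper may have used token-level scoring or a different averaging scheme. The function name `entity_prf` and the example annotations are hypothetical.

```python
from typing import Iterable, Tuple

# An entity is a hypothetical (start, end, label) triple over the EHR text.
Entity = Tuple[int, int, str]


def entity_prf(gold: Iterable[Entity],
               predicted: Iterable[Entity]) -> Tuple[float, float, float]:
    """Strict entity-level precision, recall and F1 (exact span and label match)."""
    gold_set, pred_set = set(gold), set(predicted)
    # True positives: predicted entities that exactly match a gold entity.
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical gold and predicted annotations for one EHR snippet.
    gold = [(0, 14, "PATIENT"), (30, 40, "DATE"), (55, 68, "DIAGNOSIS")]
    predicted = [(0, 14, "PATIENT"), (30, 40, "DATE")]
    p, r, f = entity_prf(gold, predicted)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Here the model misses one of three gold entities but makes no false predictions, so precision stays at 1.00 while recall drops to 0.67. This is the same high-precision, lower-recall pattern the reported results show.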