An Analysis of Pre-trained Models to Identify Cybersecurity Incidents Entities in the Healthcare Industry

  • Rafael Paim UNISINOS
  • Luciano Ignaczak UNISINOS

Abstract


Healthcare institutions are a critical sector in any community. Cybersecurity issues, such as attacks or incidents, may disrupt their operations and cause damage that could ultimately lead to patient death. Named Entity Recognition and Classification (NERC) can support these institutions in analyzing incidents by highlighting, for example, the incident type, the attack type, and the location. This work evaluated pre-trained machine learning models to understand how well they support this identification. For this purpose, we analyzed two fine-tuned BERT models applied to a corpus of incidents related to healthcare institutions in the U.S. We evaluated entity recognition using both the Strict and Partial approaches. Experimental results showed high precision (above 0.776) but low recall (below 0.267). This suggests that the entities the models did recognize were mostly correct, but that the models missed many entities.
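
To make the difference between the Strict and Partial approaches concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes a common convention in which a strict match requires identical boundaries and label, while a partial match accepts any boundary overlap for half credit. The `Entity` type, the `score` function, the greedy one-to-one matching, and the example labels and offsets are all hypothetical choices made for illustration.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical entity type, e.g. "ATTACK"


def overlaps(a: Entity, b: Entity) -> bool:
    """True when the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def score(gold: list[Entity], pred: list[Entity], mode: str) -> tuple[float, float]:
    """Return (precision, recall) under 'strict' or 'partial' matching.

    Assumed convention: strict needs identical boundaries and label;
    partial ignores the label, gives 1.0 for identical boundaries and
    0.5 for overlapping boundaries. Matching is greedy and one-to-one.
    """
    credit = 0.0
    unmatched = list(gold)
    for p in pred:
        exact = next(
            (g for g in unmatched
             if (g.start, g.end) == (p.start, p.end)
             and (mode == "partial" or g.label == p.label)),
            None,
        )
        if exact is not None:
            credit += 1.0
            unmatched.remove(exact)
            continue
        if mode == "partial":
            part = next((g for g in unmatched if overlaps(g, p)), None)
            if part is not None:
                credit += 0.5
                unmatched.remove(part)
    precision = credit / len(pred) if pred else 0.0
    recall = credit / len(gold) if gold else 0.0
    return precision, recall


# Invented example: four annotated entities, two predictions.
gold = [Entity(0, 10, "ORG"), Entity(20, 30, "ATTACK"),
        Entity(40, 50, "LOC"), Entity(60, 70, "DATE")]
pred = [Entity(0, 10, "ORG"), Entity(22, 30, "ATTACK")]

strict_scores = score(gold, pred, "strict")
partial_scores = score(gold, pred, "partial")
```

In this invented example the model predicts few entities, so precision stays well above recall under both schemes, which is the same qualitative pattern the abstract reports.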

Published
01/09/2025
PAIM, Rafael; IGNACZAK, Luciano. An Analysis of Pre-trained Models to Identify Cybersecurity Incidents Entities in the Healthcare Industry. In: WORKSHOP DE TRABALHOS DE INICIAÇÃO CIENTÍFICA E DE GRADUAÇÃO - SIMPÓSIO BRASILEIRO DE CIBERSEGURANÇA (SBSEG), 25., 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 135-146. DOI: https://doi.org/10.5753/sbseg_estendido.2025.10726.