AnonShield: Scalable On-Premise Pseudonymization for CSIRT Network Vulnerability Data

Cristhian Kapelinski; Douglas Lautert; Beatriz Machado; Diego Kreutz; Isadora Garcia Ferrão

doi:10.5753/sbrc_estendido.2026.23154

Cristhian Kapelinski UNIPAMPA http://orcid.org/0009-0005-5750-022X
Douglas Lautert UNIPAMPA https://orcid.org/0009-0005-9036-5453
Beatriz Machado UNIPAMPA https://orcid.org/0009-0002-2750-0323
Diego Kreutz UNIPAMPA https://orcid.org/0000-0003-0830-0238
Isadora Garcia Ferrão UBO https://orcid.org/0000-0002-0612-486X

Abstract

We present AnonShield, a high-throughput, on-premise pseudonymization system for network vulnerability scan reports that combines GPU-accelerated NER, streaming processing, caching, and schema-aware configuration. Evaluated on datasets up to 550 MB (70,951 records), AnonShield reduces processing time from over 92 hours to under 10 minutes (up to 738× speedup), reaching F1 = 94.2% and Recall = 96.4% on a specialist-annotated validation set. Our results show that scalable pseudonymization of network vulnerability data is feasible without sacrificing analytical utility, enabling compliant data sharing in operational CSIRT environments.
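As a rough illustration of how streaming processing, caching, and schema-aware field selection can fit together, the Python sketch below pseudonymizes IPv4 addresses in JSON Lines records. It is not AnonShield's implementation. A regular expression stands in for the GPU-accelerated NER step, and `SECRET_KEY`, `pseudonym`, `pseudonymize_text`, `pseudonymize_stream`, and the field names `host` and `finding` are all hypothetical names chosen for this example.

```python
import hashlib
import hmac
import json
import re
from functools import lru_cache
from typing import Iterable, Iterator

# Hypothetical secret; a real deployment would load it from secure storage.
SECRET_KEY = b"demo-key-change-me"

# Simplified IPv4 pattern, used here as a stand-in for an NER model.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


@lru_cache(maxsize=100_000)
def pseudonym(value: str) -> str:
    """Map a value to a stable pseudonym; repeated values are served from the cache."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"IP_{digest[:12]}"


def pseudonymize_text(text: str) -> str:
    """Replace every IPv4-looking token in the text with its pseudonym."""
    return IPV4.sub(lambda m: pseudonym(m.group(0)), text)


def pseudonymize_stream(lines: Iterable[str], fields: tuple[str, ...]) -> Iterator[dict]:
    """Process one JSON record at a time so memory use does not grow with input size.

    Only the string fields named in `fields` are rewritten (the assumed
    schema-aware part); all other fields pass through unchanged.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for name in fields:
            if isinstance(record.get(name), str):
                record[name] = pseudonymize_text(record[name])
        yield record


if __name__ == "__main__":
    sample = [
        '{"host": "192.168.0.10", "finding": "Open port 22 on 192.168.0.10", "cvss": 5.3}',
        '{"host": "10.0.0.7", "finding": "Outdated TLS", "cvss": 7.4}',
    ]
    for rec in pseudonymize_stream(sample, fields=("host", "finding")):
        print(json.dumps(rec))
```

Because the pseudonym is a keyed hash of the original value, the same address maps to the same token across records, which keeps per-host aggregation possible after pseudonymization. Numeric fields such as `cvss` are left untouched in this sketch.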
