AnonShield: Scalable On-Premise Pseudonymization for CSIRT Network Vulnerability Data
Abstract
We present AnonShield, a high-throughput, on-premise pseudonymization system for network vulnerability scan reports that combines GPU-accelerated NER, streaming processing, caching, and schema-aware configuration. Evaluated on datasets of up to 550 MB (70,951 records), AnonShield reduces processing time from over 92 hours to under 10 minutes (a speedup of up to 738×) while reaching F1 = 94.2% and Recall = 96.4% on a specialist-annotated validation set. Our results show that scalable pseudonymization of network vulnerability data is feasible without sacrificing analytical utility, enabling compliant data sharing in operational CSIRT environments.
References
Ahl, C. (2023). LogLicker: Anonymizing logs made easy. [link]. Permiso Security. Accessed: 2026.
Albakri, A., Boiten, E., and De Lemos, R. (2019). Sharing cyber threat intelligence under the General Data Protection Regulation. In Privacy Technologies and Policy, LNCS.
Almeida, G., Pohlmann, M., Severo, A., Kreutz, D., Heinrich, T., and Pereira, L. (2025). On-premise SLMs vs. commercial LLMs: Prompt engineering and incident classification in SOCs and CSIRTs. In XXII ERRC.
Almorjan, A., Basheri, M., and Almasre, M. (2025). Large language models for synthetic dataset generation of cybersecurity indicators of compromise. Sensors, 25(9):2825.
Amazon Web Services (2017). Amazon Comprehend. [link]. Accessed: 2026.
Amoo, O. O., Atadoga, A., Osasona, F., Abrahams, T. O., Ayinla, B. S., and Farayola, O. A. (2024). GDPR’s impact on cybersecurity: A review focusing on USA and European practices. International Journal of Science and Research Archive, 11:1338–1347.
Bandel, C. T., Esteves, J. P. R., Guerra, K. P., Bertholdo, L. M., Kreutz, D., and Miani, R. S. (2025). Anonimização de incidentes de segurança com reidentificação controlada. In Anais do SBSeg 2025.
CVE Details (2026). Browse CVE vulnerabilities by date. Accessed: 2026-03-26. Reports 48,448 CVEs in 2025 and 40,308 in 2024.
Digitale Gesellschaft (2014). Anonip – IP address anonymisation tool. [link]. Accessed: 2026.
FIRST (2026). Vulnerability forecast. Median forecast: 59,427 CVEs in 2026.
Google Cloud (2018). Cloud data loss prevention (Cloud DLP). [link]. Accessed: 2026.
IRI (2017). IRI DarkShield – data discovery and masking. [link]. Accessed: 2026.
Kapelinski, C., Lautert, D., Machado, B., and Kreutz, D. (2025). AnonLFI 2.0: Extensible architecture for PII pseudonymization in CSIRTs with OCR and technical recognizers. In ERRC 2025.
Machado, B., Lautert, D., Kapelinski, C., and Kreutz, D. (2025). Structured extraction of vulnerabilities in OpenVAS and Tenable WAS reports using LLMs. In XXII ERRC.
Microsoft (2018). Presidio – data protection and de-identification SDK. [link]. Accessed: 2026.
Nweke, L. O. and Wolthusen, S. (2020). Legal issues related to cyber threat information sharing. In Proc. CyCon. NATO CCDCOE.
Prasser, F., Kohlmayer, F., Lautenschlager, R., and Kuhn, K. A. (2014). ARX – a comprehensive tool for anonymizing biomedical data. AMIA, pages 984–993.
Severo, A., Lautert, D., Almeida, G., Kreutz, D., Rodrigo, G., Pereira Jr, L., and Bertholdo, L. (2025). LLMs e engenharia de prompt para classificação automatizada de incidentes em SOCs. In XXV SBSeg.
Slijepčević, D., Hein, D., Zec, M., and Kaltenbrunner, M. (2021). k-anonymity in practice: How generalisation and suppression affect machine learning classifiers. Computers & Security, 111:102488.
VulnCheck (2026). State of exploitation 2026. 884 KEVs identified in 2025; 28.96% exploited on or before CVE publication date.
Wagner, C., Dulaunoy, A., Wagener, G., and Iklody, A. (2016). MISP: The design and implementation of a collaborative threat intelligence sharing platform. In ACM WISCS.
Xu, H., Wang, S., Li, N., Wang, K., Zhao, Y., Chen, K., Yu, T., Liu, Y., and Wang, H. (2025). Large language models for cyber security: A systematic literature review. ACM Transactions on Software Engineering and Methodology.
Published
25/05/2026
How to Cite
KAPELINSKI, Cristhian; LAUTERT, Douglas; MACHADO, Beatriz; KREUTZ, Diego; FERRÃO, Isadora Garcia. AnonShield: Scalable On-Premise Pseudonymization for CSIRT Network Vulnerability Data. In: SALÃO DE FERRAMENTAS - SIMPÓSIO BRASILEIRO DE REDES DE COMPUTADORES E SISTEMAS DISTRIBUÍDOS (SBRC), 44., 2026, Praia do Forte/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 1-12. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc_estendido.2026.23154.
