Pseudonymization in Legal Texts According to the LGPD: A Named Entity Recognition Approach

  • Marcelo Anselmo UnB
  • Bruno César Ribas UnB

Abstract


This study applies Named Entity Recognition (NER) to the pseudonymization of legal texts, with the goal of protecting Personally Identifiable Information (PII) in compliance with Brazil’s General Data Protection Law (LGPD). It addresses the challenge of balancing data privacy against data utility and presents a methodology that uses artificial intelligence to identify and obscure PII in legal documents. We propose a Transformer model combined with regular expressions (regex) to identify entities in text. To evaluate the model, we built a new dataset derived from the existing LeNER-Br dataset, and we generated synthetic data by applying a function together with prompt engineering to the Llama 3 8B model. The tests showed that the proposed model still requires further adjustment. Future work will focus on improving the model’s accuracy and efficiency, strengthening the identification of sensitive data, and learning from user interactions.
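The abstract describes the pipeline only at a high level: a Transformer-based NER model combined with regular expressions, after which the detected PII is obscured. The following minimal Python sketch illustrates that combination under our own assumptions; it is not the authors' implementation. The regex patterns (CPF, CNPJ, CEP, e-mail), the `[LABEL_n]` placeholder format, and the helpers `regex_spans`, `merge_spans`, and `pseudonymize` are all hypothetical. The Transformer's output is stood in for by a plain string search that emits `PESSOA` spans (the person label used in LeNER-Br).

```python
import re
from typing import Dict, List, Tuple

# Hypothetical regex detectors for structured Brazilian identifiers.
# These patterns are illustrative assumptions, not the paper's rules.
PATTERNS = {
    "CPF": re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),         # e.g. 123.456.789-09
    "CNPJ": re.compile(r"\b\d{2}\.\d{3}\.\d{3}/\d{4}-\d{2}\b"),  # e.g. 12.345.678/0001-95
    "CEP": re.compile(r"\b\d{5}-\d{3}\b"),                       # postal code, e.g. 70910-900
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)


def regex_spans(text: str) -> List[Span]:
    """Return every match of every pattern as a labeled character span."""
    return [
        (m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def merge_spans(spans: List[Span]) -> List[Span]:
    """Sort spans by start offset; when two overlap, keep the longer one."""
    merged: List[Span] = []
    for span in sorted(spans, key=lambda s: (s[0], s[0] - s[1])):
        if merged and span[0] < merged[-1][1]:
            # Overlap between a regex hit and an NER hit (or two hits).
            if span[1] - span[0] > merged[-1][1] - merged[-1][0]:
                merged[-1] = span
            continue
        merged.append(span)
    return merged


def pseudonymize(text: str, spans: List[Span]) -> Tuple[str, Dict[str, str]]:
    """Replace each span with a placeholder such as [PESSOA_1].

    Identical surface strings receive the same placeholder, so repeated
    mentions of one person stay linked. The returned mapping is the extra
    information that would have to be kept separately to re-identify the text.
    """
    mapping: Dict[str, str] = {}
    counters: Dict[str, int] = {}
    pieces: List[str] = []
    last = 0
    for start, end, label in merge_spans(spans):
        original = text[start:end]
        if original not in mapping:
            counters[label] = counters.get(label, 0) + 1
            mapping[original] = f"[{label}_{counters[label]}]"
        pieces.append(text[last:start])
        pieces.append(mapping[original])
        last = end
    pieces.append(text[last:])
    return "".join(pieces), mapping


# Usage. The NER step is faked here: a real pipeline would take PESSOA spans
# from a Transformer token classifier, not from this hypothetical string search.
text = ("O réu João Silva, CPF 123.456.789-09, reside no CEP 70910-900. "
        "Intimem-se João Silva e joao@example.com.")
ner_spans = [(m.start(), m.end(), "PESSOA")
             for m in re.finditer(re.escape("João Silva"), text)]
redacted, mapping = pseudonymize(text, regex_spans(text) + ner_spans)
print(redacted)
# O réu [PESSOA_1], CPF [CPF_1], reside no CEP [CEP_1]. Intimem-se [PESSOA_1] e [EMAIL_1].
```

In this sketch, placeholders are numbered per label instead of masking every entity with one shared token. That choice keeps repeated mentions of the same person distinguishable, which is one way to preserve some of the utility the abstract weighs against privacy. The merge step exists because a regex hit and an NER prediction can cover the same characters.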
Published
17/11/2024
ANSELMO, Marcelo; RIBAS, Bruno César. Pseudonymization in Legal Texts According to the LGPD: A Named Entity Recognition Approach. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 309-323. ISSN 2643-6264.