An Anonymization Library for Rapid and Diverse Anonymization of Brazilian Personal Data

  • Stefano Luppi Sposito Universidade de Brasília (UnB)
  • Raylan da Silva Sales Universidade de Brasília (UnB)
  • Edna Dias Canedo Universidade de Brasília (UnB)
  • Geovana Ramos Sousa Silva Universidade de Brasília (UnB) https://orcid.org/0000-0002-0304-0804

Abstract


The prevalence of personal data in the hands of large companies highlights the need for robust regulatory frameworks. Brazil's General Data Protection Law (LGPD) seeks to standardize how personal data is used, emphasizing that organizations should hold only the minimum data necessary and, when needed, anonymize it in accordance with the regulation. However, the lack of a dedicated tool for anonymizing Brazilian personal data remains a significant hurdle to achieving LGPD compliance. This study proposes a library tailored to anonymizing personal data that takes into account the specific characteristics of Brazilian regulations. The goal is an efficient and secure library for removing identifiable information from documents in line with the LGPD. The implementation and testing of the library also contribute to the data privacy community. Support for several document formats, such as .PDF, .DOCX, and .XLSX, together with the ability to anonymize plain text strings, demonstrates the library's versatility and practicality. Notably, the performance tests show promising results for each function and regular expression employed. These results validate the library's functionality and underscore its potential to help individuals and organizations comply with data protection regulations.
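The abstract does not list the library's functions or regular expressions, so the following is only a minimal sketch, assuming regex-based suppression of a few common Brazilian identifiers (CPF, CNPJ, e-mail) in a text string. The `PATTERNS` table, the `anonymize_text` function, and the bracketed placeholders are hypothetical illustrations, not the library's actual API.

```python
import re

# Hypothetical patterns for a few common Brazilian identifiers.
# These are illustrative assumptions, not the library's actual regular expressions.
# CNPJ is listed first so the longer 14-digit identifier is handled
# before the 11-digit CPF pattern runs (dicts keep insertion order).
PATTERNS = {
    # CNPJ: 14 digits, usually written as 00.000.000/0000-00
    "CNPJ": re.compile(r"\b\d{2}\.?\d{3}\.?\d{3}/?\d{4}-?\d{2}\b"),
    # CPF: 11 digits, usually written as 000.000.000-00
    "CPF": re.compile(r"\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b"),
    # E-mail address (deliberately simplified)
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def anonymize_text(text: str) -> str:
    """Suppress every match of each pattern by replacing it with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = ("Customer: João, CPF 123.456.789-09, email joao.silva@example.com, "
              "company CNPJ 12.345.678/0001-95.")
    print(anonymize_text(sample))
    # → Customer: João, CPF [CPF], email [EMAIL], company CNPJ [CNPJ].
```

A pattern-only approach suppresses any correctly shaped number, so an 11-digit value that is not a CPF would also be masked; validating check digits would reduce such false positives. Text extracted from .PDF, .DOCX, or .XLSX files could be passed through the same kind of function.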

Keywords: Anonymization, Data, General Data Protection Law, Quasi-Identifiers, Suppression

Published
20/05/2024
SPOSITO, Stefano Luppi; SALES, Raylan da Silva; CANEDO, Edna Dias; SILVA, Geovana Ramos Sousa. An Anonymization Library for Rapid and Diverse Anonymization of Brazilian Personal Data. In: CONCURSO DE TRABALHOS DE CONCLUSÃO DE CURSO EM SISTEMAS DE INFORMAÇÃO - SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 20., 2024, Juiz de Fora/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 192-201. DOI: https://doi.org/10.5753/sbsi_estendido.2024.238628.