Classifying Potentially Non-compliant Portuguese Language Sentences Concerning Privacy Policies

  • Matheus Tocchini USP / Instituto Lawgorithm
  • Igor M. Rocha USP / Instituto Lawgorithm
  • Raphael M. de Barros USP / Instituto Lawgorithm
  • Jéssica O. e Silva Instituto Lawgorithm
  • Ananda F. Garcia Instituto Lawgorithm
  • Felipe Zular Instituto Lawgorithm
  • Juliano Maranhão USP / Instituto Lawgorithm
  • Jaime Simão Sichman USP / Instituto Lawgorithm

Abstract


Privacy policies (PPol) are long and contain complex sentences that are difficult to understand. They describe what happens to a person’s personal data and must comply with specific legislation. One way to assess whether PPol comply with legislation is through machine learning models. Existing studies aim to detect compliance with the GDPR and mostly analyze PPol written in English. In this work, we present a mapping of Brazilian data protection legislation into 27 categories, divided into 3 blocks, with 3 levels of potential compliance. We also introduce a corpus in Portuguese of PPol sentences annotated according to this mapping. We evaluate several classifier models on two tasks: detecting potentially non-compliant sentences and categorizing potentially non-compliant sentences. We achieve performance close to that reported in the literature for studies on English-language PPol and the GDPR. Our study points out ways to improve the automated assessment of PPol and highlights the complexity of tasks that seek to ensure compliance with data protection legislation.
Published
17/11/2024
TOCCHINI, Matheus; ROCHA, Igor M.; BARROS, Raphael M. de; O. E SILVA, Jéssica; GARCIA, Ananda F.; ZULAR, Felipe; MARANHÃO, Juliano; SICHMAN, Jaime Simão. Classifying Potentially Non-compliant Portuguese Language Sentences Concerning Privacy Policies. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 107-121. ISSN 2643-6264.