Semantic Analysis of Synthetic EULA Contracts, with Natural Language Processing Techniques for LGPD Compliance Assessment

  • João Freire Abramowicz UPE
  • Heuryk Wylk Éboli UPE
  • Cleyton Rodrigues UPE

Abstract


The increasing digitization of online services has led to a proliferation of end-user license agreements (EULAs), which users often accept without reading. These documents may nonetheless contain clauses that violate privacy legislation such as Brazil's General Data Protection Law (LGPD). This paper proposes an automated semantic analysis approach that uses Natural Language Processing (NLP) techniques to assess the similarity between synthetic and real EULA contracts, with a focus on legal compliance with the LGPD. Models based on the BERT architecture, including Legal-BERT and BERTimbau, are used to extract clause embeddings, calculate semantic similarity, and classify legal compliance. The best model achieves 91.2% accuracy and an F1-score of 0.89, which supports the feasibility of using synthetic legal documents to train and validate classification systems. The proposed approach contributes to more scalable and privacy-preserving compliance audits.
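The similarity step mentioned in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the metric (the abstract does not name one) and uses toy three-dimensional vectors in place of real clause embeddings; the function names `cosine_similarity` and `best_matches` are hypothetical and are not taken from the paper. In practice the vectors would come from a BERT-family encoder such as Legal-BERT or BERTimbau.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_matches(synthetic, real):
    """For each synthetic embedding, return (index of the closest real embedding, similarity)."""
    results = []
    for s in synthetic:
        scores = [cosine_similarity(s, r) for r in real]
        best = max(range(len(scores)), key=scores.__getitem__)
        results.append((best, scores[best]))
    return results


# Toy vectors standing in for clause embeddings (hypothetical values, not paper data).
synthetic_clauses = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
real_clauses = [[0.0, 2.0, 2.0], [3.0, 0.0, 0.0]]

matches = best_matches(synthetic_clauses, real_clauses)
for i, (j, score) in enumerate(matches):
    print(f"synthetic clause {i} is closest to real clause {j} (similarity {score:.3f})")
```

Cosine similarity ignores vector length, so a synthetic clause matches a real one whenever their embeddings point in the same direction, regardless of magnitude.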
Published
29/09/2025
ABRAMOWICZ, João Freire; ÉBOLI, Heuryk Wylk; RODRIGUES, Cleyton. Semantic Analysis of Synthetic EULA Contracts, with Natural Language Processing Techniques for LGPD Compliance Assessment. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 470-485. ISSN 2643-6264.