Automatic Text Simplification for the Legal Domain in Brazilian Portuguese

Francielle Vasconcellos Pereira (UFRGS); Ana Frazão (USP); Viviane P. Moreira (UFRGS)

Abstract

Legal documents such as rulings, laws, agreements, and contracts contain domain-specific terms, jargon, and long, complex sentences that can be difficult to understand for laypeople who lack domain expertise, have reading difficulties, or have a low level of education. Simplifying these documents has been a concern for several years, with the aim of democratizing access to justice. Courts are already adopting simpler language to enhance inclusion and clarity, especially in documents aimed at laypeople, such as warrants and notifications. Automatic text simplification, a subfield of Natural Language Processing, seeks to make complex texts more accessible. This paper explores automatic text simplification in Portuguese for the legal domain. The main challenge is the lack of datasets pairing complex sentences with their simplified versions. We investigate how existing datasets, methods, and metrics for text simplification perform when applied to legal texts in Portuguese, and we present qualitative and quantitative analyses of five models. The results show that GPT-based models perform best, but that fine-tuning with domain data is a viable open-source alternative.