Rhetorical Role Identification for Portuguese Legal Documents


In this paper, we present a new corpus for Rhetorical Role Identification in Portuguese legal documents. The corpus comprises petitions from 70 civil lawsuits filed in TJMS court and was manually labeled with rhetorical roles specifically tailored for petitions. Since petition documents are created without a standard structure, we had to deal with several issues to clean the extracted textual content. We assessed classic and deep learning machine learning methods on the proposed corpus. The best performing method obtained an F-score of 80.50. At the best of our knowledge, this is the first work to deal with rhetorical role identification for petitions, given that previous works focused only on judicial decisions. Additionally, it is also the first work to tackle this task for the Portuguese language. The proposed corpus, as well as the proposed rhetorical roles, can foster new research in the judicial area and also lead to new solutions to improve the flow of Brazilian court houses.

Palavras-chave: Rhetorical role identification, Legal sentence classification, Natural language processing, Corpus
ARAGY, Roberto; FERNANDES, Eraldo Rezende; CACERES, Edson Norberto. Rhetorical Role Identification for Portuguese Legal Documents. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 10. , 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021 . ISSN 2643-6264.