Rhetorical Role Identification for Portuguese Legal Documents

Roberto Aragy; Eraldo Rezende Fernandes; Edson Norberto Caceres

Roberto Aragy UFMS / TJ-MS https://orcid.org/0000-0002-4872-7044
Eraldo Rezende Fernandes UFMS https://orcid.org/0000-0002-7039-2447
Edson Norberto Caceres UFMS https://orcid.org/0000-0001-6471-3462

Abstract

In this paper, we present a new corpus for rhetorical role identification in Portuguese legal documents. The corpus comprises petitions from 70 civil lawsuits filed in the TJMS court and was manually labeled with rhetorical roles specifically tailored to petitions. Since petitions are written without a standard structure, we had to address several issues when cleaning the extracted textual content. We assessed classic machine learning and deep learning methods on the proposed corpus. The best-performing method obtained an F-score of 80.50. To the best of our knowledge, this is the first work to address rhetorical role identification for petitions, given that previous works focused only on judicial decisions. It is also the first work to tackle this task for the Portuguese language. The proposed corpus and rhetorical roles can foster new research in the judicial area and lead to new solutions that improve the workflow of Brazilian courts.

Keywords: Rhetorical role identification, Legal sentence classification, Natural language processing, Corpus