Exploring Text Decoding Methods for Portuguese Legal Text Generation

Kenzo Sakiyama; Raphael Montanari; Roseval Malaquias Junior; Rodrigo Nogueira; Roseli A. F. Romero

Kenzo Sakiyama USP https://orcid.org/0000-0002-5284-4745
Raphael Montanari USP https://orcid.org/0000-0003-2281-3646
Roseval Malaquias Junior USP https://orcid.org/0000-0002-6005-0515
Rodrigo Nogueira UNICAMP https://orcid.org/0000-0002-2600-6035
Roseli A. F. Romero USP https://orcid.org/0000-0001-9366-2780

Abstract

In recent years, the volume of legal proceedings in Brazil has grown considerably. In this context, recent advances in Natural Language Processing offer substantial potential for automating tasks and analyses in the legal domain. In this article, we investigate text decoding methods for automating the writing of keyphrases, a sequence of key terms present in documents used in courts throughout Brazil. For this purpose, we use a text-to-text framework based on generative Transformers to generate keyphrases and compare three decoding techniques: greedy, top-K, and top-p. Since keyphrases are designed to improve retrieval, we evaluate the keyphrases produced by each decoding method on legal document retrieval, using traditional retrieval methods (TF-IDF and BM25) to measure their quality. The results, measured with IR metrics, are statistically significant and indicate that greedy decoding generates high-quality keyphrases for the dockets used in this work, close to those written by human specialists.
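
As a rough illustration of how the three decoding techniques named above differ, the following minimal sketch applies each of them to a single next-token distribution. It is not the implementation used in this work: the toy probabilities, the helper names (`greedy`, `top_k_filter`, `top_p_filter`, `sample`), and the values of K and p are all assumptions chosen for illustration.

```python
import random

# Toy next-token distribution over a 4-token vocabulary (made-up values).
probs = [0.5, 0.3, 0.15, 0.05]

def greedy(probs):
    """Greedy decoding: always pick the single most probable token."""
    return max(range(len(probs)), key=probs.__getitem__)

def top_k_filter(probs, k):
    """Top-K: keep the k most probable tokens and renormalize."""
    keep = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Top-p (nucleus): keep the smallest set of most probable tokens
    whose cumulative probability reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def sample(dist, rng):
    """Draw one token id from a filtered {token_id: probability} mapping."""
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]

rng = random.Random(0)  # fixed seed so the sampling runs are repeatable
greedy_token = greedy(probs)
top_k_token = sample(top_k_filter(probs, k=2), rng)    # k=2 is illustrative
top_p_token = sample(top_p_filter(probs, p=0.9), rng)  # p=0.9 is illustrative
```

Greedy decoding is deterministic, whereas top-K and top-p sample from a truncated distribution, so repeated runs with different seeds can yield different keyphrases for the same document.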