Exploring Text Decoding Methods for Portuguese Legal Text Generation


In recent years, the volume of legal proceedings in Brazil has grown considerably. In this context, recent advances in Natural Language Processing offer considerable potential for automating tasks and analyses in the legal domain. In this article, we investigate text decoding methods for automating the writing of keyphrases, sequences of key terms present in documents used in courts throughout Brazil. For this purpose, we use a text-to-text framework based on generative Transformers to generate keyphrases, and we evaluate three decoding techniques: greedy, top-K, and top-p. Since keyphrases are designed to improve retrieval tasks, we evaluated the keyphrases generated by each decoding method in legal document retrieval, using traditional retrieval methods (TF-IDF and BM25) to assess their quality. The results, measured with IR metrics, were statistically significant, and they indicate that greedy decoding generates high-quality keyphrases for the dockets used in this work, close to those written by human specialists.
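To make the three decoding techniques concrete, the following is a minimal sketch of how each one picks the next token from a model's output scores. It is not the implementation used in this work: the vocabulary, the logits, and every function name are hypothetical, and a real system would apply these steps to the output of a generative Transformer at each generation step.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ranked_indices(probs):
    """Token indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

def greedy(probs):
    """Greedy decoding: always take the single most probable token."""
    return ranked_indices(probs)[0]

def top_k_candidates(probs, k):
    """Top-K: keep only the k most probable tokens."""
    return ranked_indices(probs)[:k]

def top_p_candidates(probs, p):
    """Top-p (nucleus): keep the smallest prefix of the ranked tokens
    whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for i in ranked_indices(probs):
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

def sample_from(probs, candidates, rng):
    """Sample one candidate, weighted by its (unnormalized) probability."""
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical toy vocabulary and logits for a single generation step.
vocab = ["habeas", "corpus", "recurso", "prazo", "multa"]
logits = [2.0, 1.0, 0.5, 0.1, -1.0]
probs = softmax(logits)
rng = random.Random(0)  # fixed seed so the sampling is repeatable

greedy_token = vocab[greedy(probs)]
top_k_token = vocab[sample_from(probs, top_k_candidates(probs, 2), rng)]
top_p_token = vocab[sample_from(probs, top_p_candidates(probs, 0.8), rng)]
```

Greedy decoding is deterministic, while top-K and top-p sample from a truncated distribution, so their output varies from run to run unless the random seed is fixed.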
SAKIYAMA, Kenzo; MONTANARI, Raphael; JUNIOR, Roseval Malaquias; NOGUEIRA, Rodrigo; ROMERO, Roseli A. F. Exploring Text Decoding Methods for Portuguese Legal Text Generation. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 12., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 63-77. ISSN 2643-6264.