Deep Learning for Rhetorical Move Detection

  • Bruno Vinicius Veronez de Jesus (UNESP)
  • Arnaldo Candido Junior (UNESP)

Abstract


The lack of tools to support scientific writing in Portuguese hinders the production of cohesive texts. This paper describes the fine-tuning of the BERTimbau language model to automatically identify rhetorical moves based on Swales’s model. The methodology uses CorpusDT, a corpus of Computer Science abstracts, to train a sentence classifier that assigns a rhetorical move to each sentence. The classifier proves highly effective, reaching an F1-score of 0.94 on the validation set, which indicates the potential of this approach to improve the quality of scientific publications in Portuguese.
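The abstract reports an F1-score for sentence-level move classification but does not state how the score is averaged across move labels. As a hedged illustration only, the sketch below computes two common variants for single-label classification: macro F1 (every move weighted equally) and support-weighted F1 (each move weighted by its frequency in the gold annotations). The move names and the example predictions are hypothetical and are not taken from CorpusDT or from the paper's results.

```python
from collections import Counter


def per_label_f1(gold, pred, label):
    """F1 for a single move label, treating that label as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    return sum(per_label_f1(gold, pred, lab) for lab in labels) / len(labels)


def weighted_f1(gold, pred):
    """Per-label F1 weighted by each label's frequency in the gold annotations."""
    support = Counter(gold)
    return sum(per_label_f1(gold, pred, lab) * n for lab, n in support.items()) / len(gold)


if __name__ == "__main__":
    # Hypothetical gold and predicted moves for the sentences of one abstract.
    gold = ["background", "background", "purpose", "method", "result", "result"]
    pred = ["background", "purpose", "purpose", "method", "result", "method"]
    print(f"macro F1:    {macro_f1(gold, pred):.3f}")
    print(f"weighted F1: {weighted_f1(gold, pred):.3f}")
```

Macro averaging makes errors on rare moves count as much as errors on frequent ones, whereas weighted averaging tends to be dominated by the most frequent moves; on an imbalanced corpus the two can differ noticeably.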

References

Adalberto Ferreira Barbosa Junior. distilbert-portuguese-cased (revision df1fa7a), 2024. URL [link].

Iz Beltagy, Kyle Lo, and Arman Cohan. SciBERT: A pretrained language model for scientific text. arXiv preprint arXiv:1903.10676, 2019.

Ana Maria Teresa Benevides-Pereira. Considerações sobre a síndrome de burnout e seu impacto no ensino. Boletim de Psicologia, 62:155–168, December 2012. ISSN 0006-5943. URL [link].

Carmen Dayrell, Arnaldo Candido Jr, Gabriel Lima, Danilo Machado Jr, Ann A Copestake, Valéria Delisandra Feltrim, Stella EO Tagnin, and Sandra M Aluísio. Rhetorical move detection in English abstracts: Multi-label sentence classifiers and their annotated corpora. In LREC, pages 1604–1609, 2012.

Fernanda Goulart Ritti Dias and Benedito Gomes Bezerra. Análise retórica de introduções de artigos científicos da área da saúde pública. Horizontes de Linguística Aplicada, 12:163–182, 2013.

Valéria Delisandra Feltrim. Uma abordagem baseada em corpus e em sistemas de crítica para a construção de ambientes Web de auxílio à escrita acadêmica em português. PhD thesis, Universidade de São Paulo, 2004.

J Richard Landis and Gary G Koch. The measurement of observer agreement for categorical data. Biometrics, pages 159–174, 1977.

Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240, September 2019. ISSN 1367-4803. DOI: 10.1093/bioinformatics/btz682.

Ren Ryba, Zoë A. Doubleday, and Sean D. Connell. How can we boost the impact of publications? Try better writing. Proceedings of the National Academy of Sciences, 116(2):341–343, 2019. DOI: 10.1073/pnas.1819937116. URL [link].

Fábio Souza, Rodrigo Nogueira, and Roberto Lotufo. BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In Proceedings of the 9th Brazilian Conference on Intelligent Systems (BRACIS), pages 403–417. Springer, 2020.

John M Swales. Genre analysis: English in academic and research settings. Cambridge University Press, 1990.

John M Swales. Research genres: Explorations and applications. Cambridge University Press, 2004.

Simone Teufel and Marc Moens. Summarizing scientific articles: experiments with relevance and rhetorical status. Computational Linguistics, 28(4):409–445, 2002.

Published: 2025-09-29
JESUS, Bruno Vinicius Veronez de; CANDIDO JUNIOR, Arnaldo. Deep Learning for Rhetorical Move Detection. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 182-191. DOI: https://doi.org/10.5753/stil.2025.37824.