Pun Classification in Portuguese Using BERTimbau Large: Challenges and Results

  • Êmylle Beatriz de Sousa IFPI
  • Aislan Rafael R. de Sousa IFPI
  • Rafael T. Anchiêta IFPI

Abstract


This work studies the automatic detection of puns in Portuguese. It uses the Puntuguese corpus, a novel collection of 3,990 short texts evenly divided between puns and non-puns. The BERTimbau Large model was applied to classify the sentences, chosen because it is pretrained specifically for Portuguese. The results were satisfactory given the difficulty of the task, indicating that the model is promising for identifying this type of verbal humor. The study contributes to Natural Language Processing in Portuguese and points to future improvements, such as adding contextual information and using more effective regularization techniques.

Published
2025-05-28
SOUSA, Êmylle Beatriz de; SOUSA, Aislan Rafael R. de; ANCHIÊTA, Rafael T. Pun Classification in Portuguese Using BERTimbau Large: Challenges and Results. In: UNIFIED COMPUTING MEETING OF PIAUÍ (ENUCOMPI), 17., 2025, Teresina/PI. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 139-148. DOI: https://doi.org/10.5753/enucompi.2025.9781.