Pun Classification in Portuguese with BERTimbau Large: Challenges and Results

Êmylle Beatriz de Sousa; Aislan Rafael R. de Sousa; Rafael T. Anchiêta

doi:10.5753/enucompi.2025.9781

Êmylle Beatriz de Sousa (IFPI)
Aislan Rafael R. de Sousa (IFPI)
Rafael T. Anchiêta (IFPI)

Abstract

This work presents a study on the automatic detection of puns in Portuguese. It uses the Puntuguese corpus, a novel collection of 3,990 short texts split evenly between puns and non-puns. The BERTimbau Large model was applied to classify the sentences, chosen because it is specialized in the Brazilian and European varieties of Portuguese. The results were satisfactory despite the difficulty of the task, indicating that the model is promising for identifying this kind of verbal humor. The study contributes to Natural Language Processing for Portuguese and opens the way for future improvements, such as adding contextual information and more effective regularization techniques.
