Albertina in Action: An Investigation of its Abilities in Aspect Extraction, Hate Speech Detection, Irony Detection, and Question-Answering

Júlia da Rocha Junqueira; Claudio Luis Junior; Félix Leonel V. Silva; Ulisses Brisolara Corrêa; Larissa A. de Freitas

doi:10.5753/stil.2023.234159

Júlia da Rocha Junqueira UFPel
Claudio Luis Junior UFPel
Félix Leonel V. Silva UFPel
Ulisses Brisolara Corrêa UFPel
Larissa A. de Freitas UFPel https://orcid.org/0000-0001-7708-4116

Abstract

The field of natural language processing has seen significant advances in recent decades, driven by the application of deep learning. When combined with the Transformer neural architecture, these advances become even more pronounced. In this work, we apply Albertina, a BERT-based model for Brazilian Portuguese, to the tasks of Aspect Extraction, Hate Speech Detection, Irony Detection, and Question-Answering. Finally, for each task, we compare the results obtained by the base and large variants of BERTimbau and Albertina.
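Aspect extraction is commonly framed as token-level sequence labeling, in which an encoder such as Albertina or BERTimbau predicts one tag per token. The sketch below assumes a BIO tagging scheme with hypothetical tag names (`B-ASP`, `I-ASP`, `O`) and shows only the post-processing step that turns predicted tags into aspect terms. The function `decode_aspects` is illustrative and is not taken from the paper.

```python
from typing import List


def decode_aspects(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect aspect terms from BIO tags (hypothetical B-ASP / I-ASP / O scheme)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    aspects: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-ASP":
            # A B tag always opens a new span; close any span that is still open.
            if current:
                aspects.append(" ".join(current))
            current = [token]
        elif tag == "I-ASP" and current:
            # An I tag extends the span that is currently open.
            current.append(token)
        else:
            # "O", or an I tag with no open span (strict decoding), closes the span.
            if current:
                aspects.append(" ".join(current))
            current = []
    # Flush a span that runs to the end of the sentence.
    if current:
        aspects.append(" ".join(current))
    return aspects


# Usage on a Portuguese review: "The breakfast was great and the room clean".
tokens = ["O", "café", "da", "manhã", "estava", "ótimo", "e", "o", "quarto", "limpo"]
tags = ["O", "B-ASP", "I-ASP", "I-ASP", "O", "O", "O", "O", "B-ASP", "O"]
print(decode_aspects(tokens, tags))  # ['café da manhã', 'quarto']
```

Strict decoding discards an `I-ASP` tag that does not follow a `B-ASP` or `I-ASP` tag; a lenient alternative would treat it as the start of a new span.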

Keywords: hate speech, sentiment analysis, aspect extraction, irony detection, question answering, Albertina, BERTimbau
