Albertina in Action: An Investigation of its Abilities in Aspect Extraction, Hate Speech Detection, Irony Detection, and Question-Answering
Abstract
The field of natural language processing has seen significant advances in recent decades, driven by the application of deep learning. Combined with the Transformer neural architecture, these advances have become even more pronounced. In this work, we apply Albertina, a BERT-based model for Brazilian Portuguese, to the tasks of Aspect Extraction, Hate Speech Detection, Irony Detection, and Question Answering. Finally, we compare the results obtained on each task using the base and large models of BERTimbau and Albertina.