Adversarial Attack on Large Language Models for the Portuguese Language

  • Allan Chamon Figueiredo UNIRIO
  • Pedro Nuno de Souza Moura UNIRIO
  • Adriana C. F. Alvim UNIRIO

Abstract


Large language models (LLMs) have recently gained great visibility and widespread use. They have revolutionized the field of artificial intelligence by enabling machines to process and produce human-like text in an unprecedented way. While this technology enables innovative applications, from chatbots and virtual assistants to content creation tools and personalized recommendation systems, it also faces significant challenges and risks, such as adversarial attacks. Adversarial attacks aim to expose the vulnerabilities of the deep neural network models underlying LLMs, enabling the development of defense methods and thus making these models more robust. This research investigated the robustness of LLMs fine-tuned for the Portuguese language, specifically the BERTimbau and Sabiá models, against adversarial attacks. Experimental results show an attack success rate against the BERTimbau model of approximately 93% for the sentiment analysis task and approximately 83% for the textual entailment task, indicating that the model is susceptible to adversarial attacks. For the Sabiá model, a performance drop rate of 6.6% was observed, showing that it is also vulnerable to adversarial attacks.
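The abstract does not detail the attack procedure. As an illustration only, the sketch below runs a word-substitution attack (the TextFooler recipe from the TextAttack library, an assumption, not necessarily the method used in the paper) against a hypothetical fine-tuned BERTimbau sentiment classifier, and computes an attack success rate in the usual way, as the fraction of attacked examples whose prediction is flipped.

```python
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.attack_results import FailedAttackResult, SuccessfulAttackResult
from textattack.datasets import Dataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Hypothetical checkpoint name: a BERTimbau model fine-tuned for sentiment analysis.
MODEL_NAME = "path/to/bertimbau-finetuned-sentiment"

model = transformers.AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_NAME)
wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Example attack recipe (word substitutions guided by the victim model's predictions).
attack = TextFoolerJin2019.build(wrapper)

# Tiny in-memory dataset of (text, label) pairs, for illustration only.
dataset = Dataset([
    ("O filme é excelente, recomendo muito.", 1),
    ("Uma experiência terrível, perdi meu tempo.", 0),
])

attacker = Attacker(attack, dataset, AttackArgs(num_examples=len(dataset)))
results = attacker.attack_dataset()

# Attack success rate = successful attacks / (successful + failed) attacks.
successful = sum(isinstance(r, SuccessfulAttackResult) for r in results)
failed = sum(isinstance(r, FailedAttackResult) for r in results)
print(f"Attack success rate: {successful / max(successful + failed, 1):.2%}")
```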
Published
29/09/2025
FIGUEIREDO, Allan Chamon; MOURA, Pedro Nuno de Souza; ALVIM, Adriana C. F. Adversarial Attack on Large Language Models for the Portuguese Language. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 116-130. ISSN 2643-6264.