Adversarial Attack on Large Language Models for the Portuguese Language

  • Allan Chamon Figueiredo UNIRIO
  • Pedro Nuno de Souza Moura UNIRIO
  • Adriana C. F. Alvim UNIRIO

Abstract


Large language models (LLMs) have recently gained great visibility and widespread use. They have revolutionized the field of artificial intelligence by enabling machines to process and produce human-like text in an unprecedented way. While this technology enables innovative applications, from chatbots and virtual assistants to content creation tools and personalized recommendation systems, it also faces significant challenges and risks, such as adversarial attacks. Adversarial attacks aim to expose the vulnerabilities of the deep neural network models underlying LLMs, enabling the development of defense methods and thus making these models more robust. This research investigated the robustness of LLMs fine-tuned for the Portuguese language, specifically the BERTimbau and Sabiá models, against adversarial attacks. Experimental results show an attack success rate against the BERTimbau model of approximately 93% for the sentiment analysis task and approximately 83% for the textual entailment task, indicating that the model is susceptible to adversarial attacks. For the Sabiá model, a performance drop rate of 6.6% was observed, showing that it is also vulnerable to adversarial attacks.
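The abstract does not detail the attack procedure. As an illustration only, the sketch below runs a word-substitution attack (the TextFooler recipe from the TextAttack library, an assumption, not necessarily the method used in the paper) against a hypothetical fine-tuned BERTimbau sentiment classifier, and computes an attack success rate in the usual way, as the fraction of attacked examples whose prediction is flipped.

```python
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.attack_results import FailedAttackResult, SuccessfulAttackResult
from textattack.datasets import Dataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Hypothetical checkpoint name: a BERTimbau model fine-tuned for sentiment analysis.
MODEL_NAME = "path/to/bertimbau-finetuned-sentiment"

model = transformers.AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_NAME)
wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Example attack recipe (word substitutions guided by the victim model's predictions).
attack = TextFoolerJin2019.build(wrapper)

# Tiny in-memory dataset of (text, label) pairs, for illustration only.
dataset = Dataset([
    ("O filme é excelente, recomendo muito.", 1),
    ("Uma experiência terrível, perdi meu tempo.", 0),
])

attacker = Attacker(attack, dataset, AttackArgs(num_examples=len(dataset)))
results = attacker.attack_dataset()

# Attack success rate = successful attacks / (successful + failed) attacks.
successful = sum(isinstance(r, SuccessfulAttackResult) for r in results)
failed = sum(isinstance(r, FailedAttackResult) for r in results)
print(f"Attack success rate: {successful / max(successful + failed, 1):.2%}")
```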
Published
29/09/2025
FIGUEIREDO, Allan Chamon; MOURA, Pedro Nuno de Souza; ALVIM, Adriana C. F. Adversarial Attack on Large Language Models for the Portuguese Language. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 116-130. ISSN 2643-6264.