Comparative Analysis of Red Team Activities of LLMs with LLMs
Abstract
The increasingly intensive use of Large Language Models (LLMs), whose large parameter counts and contextual understanding can be exploited, raises security concerns. This paper compares the tools available for vulnerability detection within the scope of Red Teaming for LLMs. A systematic review was conducted, alongside comparisons of state-of-the-art algorithms in the field and new proposals that pit LLMs adversarially against other LLMs. An initial comparison was made, taking into account the performance and reproducibility of two algorithms: AutoDAN and GCG.
Keywords:
Machine Learning and High-Performance Computing, Evaluation, Measurement, and Performance Prediction, Data Science and High-Performance Computing
References
Large Language Model Meta AI (LLaMA) 3.2. Meta Platforms, Inc. [link] Accessed September 14, 2024.
Bender, E., Gebru, T., McMillan-Major, A., and Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? pages 610–623. DOI: 10.1145/3442188.3445922.
Huang, J., Shao, H., and Chang, K. C.-C. (2022). Are large pre-trained language models leaking your personal information? [link].
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., and Amodei, D. (2020). Scaling laws for neural language models. [link].
Kucharavy, A., Plancherel, O., Mulder, V., Mermoud, A., and Lenders, V. (2024). Large language models in cybersecurity: Threats, exposure and mitigation. DOI: 10.1007/978-3-031-54827-7.
Lin, L., Mu, H., Zhai, Z., Wang, M., Wang, Y., Wang, R., Gao, J., Zhang, Y., Che, W., Baldwin, T., Han, X., and Li, H. (2024). Against the achilles’ heel: A survey on red teaming for generative models. [link].
Liu, X., Xu, N., Chen, M., and Xiao, C. (2024). Autodan: Generating stealthy jailbreak prompts on aligned large language models. [link].
OpenAI (2025). OpenAI GPT4 Research. [link].
Oussidi, A. and Elhassouny, A. (2018). Deep generative models: Survey. In 2018 International Conference on Intelligent Systems and Computer Vision (ISCV), pages 1–8. DOI: 10.1109/ISACV.2018.8354080.
Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J. Z., and Fredrikson, M. (2023). Universal and transferable adversarial attacks on aligned language models. [link].
Published
2025-04-23
How to Cite
ALVES, Ana Carolina Vedoy; PAGLIUSI NETO, Milton Pedro; MIERS, Charles Christian.
Comparative Analysis of Red Team Activities of LLMs with LLMs. In: REGIONAL SCHOOL OF HIGH PERFORMANCE COMPUTING FROM SOUTHERN BRAZIL (ERAD-RS), 25., 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 25-28. ISSN 2595-4164. DOI: https://doi.org/10.5753/eradrs.2025.6827.
