Generative Models Ethical Evaluation Approach: Industry Case Studies
Abstract
Advances in AI, particularly in image and text generative models, have accelerated the adoption of AI in many everyday applications. However, making these tools available to a large audience demands rigorous evaluation to avoid biased outputs, discrimination, and other ethical problems. This paper describes an evaluation method for generative AI and LLMs aimed at identifying toxic outputs. The case studies show the importance of this evaluation approach: they identified gender bias in an image generative model and demonstrated that it is quite simple to circumvent the guardrails of state-of-the-art LLMs to obtain harmful outputs.
Keywords:
Large Language Models, Models Ethical Evaluation, Gender Bias
References
Cho, J., Zala, A., and Bansal, M. (2023). DALL-Eval: Probing the reasoning skills and social biases of text-to-image generation models.
Mehrotra, A., Zampetakis, M., Kassianik, P., Nelson, B., Anderson, H., Singer, Y., and Karbasi, A. (2024). Tree of attacks: Jailbreaking black-box LLMs automatically.
Saharia, C., et al. (2022). Photorealistic text-to-image diffusion models with deep language understanding.
Zhang, X., Zhang, C., Li, T., Huang, Y., Jia, X., Hu, M., Zhang, J., Liu, Y., Ma, S., and Shen, C. (2024). JailGuard: A universal detection framework for LLM prompt-based attacks.
Published
27/11/2024
How to Cite
SOUZA, Cristian; NASCIMENTO, Diogo T.; RAMALHO, Lucas L.; COLLINS, Eliane. Generative Models Ethical Evaluation Approach: Industry Case Studies. In: CONFERÊNCIA LATINO-AMERICANA DE ÉTICA EM INTELIGÊNCIA ARTIFICIAL, 1., 2024, Niterói. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 57-60. DOI: https://doi.org/10.5753/laai-ethics.2024.32451.