Genetic Algorithm for Interpretability in Vision Language Models

  • Marcelo H. L. Barreto UFS
  • Cristiano L. Oliveira UFS
  • Flávio A. O. Santos UFPE
  • Paulo Novais UFS
  • Leonardo N. Matos UFS
  • André Britto UFS

Abstract


The rapid advancement of deep learning has highlighted the need for methods capable of explaining model decisions to ensure transparency and trust in AI systems. In this context, Explainable Artificial Intelligence (XAI) has emerged as a key research area. This work introduces an optimization-based approach to interpretability that identifies the most and least semantically relevant regions of an image with respect to textual descriptors. We formulate the task as an optimization problem and employ a genetic algorithm to explore the solution space, maximizing or minimizing the semantic alignment between image regions and text. To evaluate our method, we used a customized dataset derived from ImageNet containing 1,000 classes, each with 5 images and 6–8 textual descriptors. Several established interpretability methods were employed for comparison, including Saliency, Guided Backpropagation, Grad-CAM, GradientShap, Integrated Gradients, and InputXGradient. Experimental results show that our approach consistently outperforms baseline methods, achieving higher similarity scores between selected regions and their associated descriptors. Our findings demonstrate the effectiveness of genetic algorithms in solving interpretability problems, offering a flexible and scalable method for XAI.
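The optimization described above can be sketched as a genetic algorithm over binary masks of image patches, where the fitness of a mask would be the vision-language similarity (e.g., CLIP cosine similarity) between the masked image and a descriptor. The sketch below is illustrative only and not the authors' implementation: the `fitness` function is a hypothetical stand-in (a fixed random per-patch relevance vector) for the actual VLM score, and all parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PATCHES = 49  # assumed 7x7 grid of image patches
# Hypothetical per-patch relevance standing in for a real VLM similarity score.
relevance = rng.normal(size=N_PATCHES)

def fitness(mask: np.ndarray) -> float:
    # In the paper's setting this would be the similarity between the
    # masked image and the textual descriptor; here it is a linear proxy.
    return float(relevance @ mask)

def genetic_algorithm(pop_size=40, generations=100, mutation_rate=0.02):
    # Population of binary masks: 1 keeps a patch, 0 removes it.
    pop = rng.integers(0, 2, size=(pop_size, N_PATCHES))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        # Binary tournament selection.
        idx = rng.integers(0, pop_size, size=(pop_size, 2))
        winners = np.where(scores[idx[:, 0]] >= scores[idx[:, 1]],
                           idx[:, 0], idx[:, 1])
        parents = pop[winners]
        # One-point crossover between consecutive parent pairs.
        children = parents.copy()
        cuts = rng.integers(1, N_PATCHES, size=pop_size // 2)
        for i, c in enumerate(cuts):
            a, b = 2 * i, 2 * i + 1
            children[a, c:], children[b, c:] = (parents[b, c:].copy(),
                                                parents[a, c:].copy())
        # Bit-flip mutation.
        flips = rng.random(children.shape) < mutation_rate
        pop = np.where(flips, 1 - children, children)
    scores = np.array([fitness(ind) for ind in pop])
    return pop[scores.argmax()], float(scores.max())

best_mask, best_score = genetic_algorithm()
```

To minimize rather than maximize alignment (finding the *least* relevant regions), the same loop applies with the fitness negated.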
Published
29/09/2025
BARRETO, Marcelo H. L.; OLIVEIRA, Cristiano L.; SANTOS, Flávio A. O.; NOVAIS, Paulo; MATOS, Leonardo N.; BRITTO, André. Genetic Algorithm for Interpretability in Vision Language Models. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 600-614. ISSN 2643-6264.