Dead-End Discovery and Secure Exploration via Large Language Models

  • Christian Delgado Polar USP / Universidad Católica San Pablo
  • Leliane Nunes de Barros USP
  • Valdinei Freire USP
  • Karina Valdivia Delgado USP

Abstract


A common issue in stochastic shortest path problems (SSPs) is the presence of non-terminal dead-end states, that is, states from which reaching the agent’s goal is impossible. After reaching a non-terminal dead-end state, the agent may keep interacting with the environment for a long time along a dead-end path. As a consequence, reinforcement learning (RL) techniques may require a prohibitively large number of steps to converge. It is therefore crucial to develop efficient methods to discover and avoid dead-end states. Recently, it has been shown that large language models (LLMs) can enhance various aspects of RL, such as reward shaping, fail-state prevention, and goal-directed action generation. In this work, we use the knowledge encoded in LLMs to discover dead-end states and to perform secure exploration, that is, exploration that helps the agent avoid dead-end states. We propose LLM-based methods for dead-end discovery and exploration that can be used with different optimization criteria. Additionally, we apply them to improve the efficiency and safety of the Q-learning-eGUBS+Cmax algorithm, an RL algorithm based on expected utility theory that allows a meaningful trade-off between cost and the probability of reaching the goal. The experimental results show that our proposed methods can significantly reduce the likelihood of encountering dead-end states, leading to better performance than state-of-the-art approaches.
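The abstract does not say how the LLM is queried or how exploration is restricted, so the following is only a minimal sketch of the general idea of dead-end-aware exploration, not the authors' method. It assumes a hypothetical predicate `is_dead_end` (standing in for an LLM judgment, possibly cached) and a hypothetical `successors` function listing the states an action may lead to. Both names, and the toy states, are illustrative assumptions.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = str


def safe_actions(
    state: State,
    actions: List[Action],
    successors: Callable[[State, Action], List[State]],
    is_dead_end: Callable[[State], bool],
) -> List[Action]:
    """Keep only actions whose possible successors are all judged safe."""
    allowed = [
        a for a in actions
        if not any(is_dead_end(s2) for s2 in successors(state, a))
    ]
    # If every action may reach a flagged state, fall back to the full set
    # so the agent is never left without a move.
    return allowed or list(actions)


def epsilon_greedy(
    q: Dict[Tuple[State, Action], float],
    state: State,
    allowed: List[Action],
    epsilon: float,
    rng: random.Random,
) -> Action:
    """Epsilon-greedy choice over the allowed actions (Q holds costs, so minimize)."""
    if rng.random() < epsilon:
        return rng.choice(allowed)
    return min(allowed, key=lambda a: q.get((state, a), 0.0))


# Toy example (hypothetical): from "s0", "left" may fall into a pit.
def toy_successors(state: State, action: Action) -> List[State]:
    table = {("s0", "left"): ["pit", "s0"], ("s0", "right"): ["s1"]}
    return table.get((state, action), [state])


def toy_is_dead_end(state: State) -> bool:
    # Stand-in for an LLM answer; a real system would query a model here.
    return state == "pit"


allowed = safe_actions("s0", ["left", "right"], toy_successors, toy_is_dead_end)
choice = epsilon_greedy({}, "s0", allowed, epsilon=0.1, rng=random.Random(0))
```

In this toy case only `"right"` survives the filter, so both the random and the greedy branch of the exploration step select it. The fallback to the full action set is a design choice of this sketch; it keeps the agent from stalling when the predicate flags every option.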
Published
29/09/2025
POLAR, Christian Delgado; BARROS, Leliane Nunes de; FREIRE, Valdinei; DELGADO, Karina Valdivia. Dead-End Discovery and Secure Exploration via Large Language Models. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 331-346. ISSN 2643-6264.