Dead-End Discovery and Secure Exploration via Large Language Models
Abstract
A common issue in stochastic shortest path problems (SSPs) is the presence of non-terminal dead-end states, i.e., states from which reaching the agent's goal is impossible. After reaching a non-terminal dead-end state, the agent may continue interacting with the environment for a long time along a dead-end path. As a consequence, reinforcement learning (RL) techniques may require a prohibitively large number of steps before convergence. It is therefore crucial to develop efficient methods to discover and avoid dead-end states. Recently, it has been shown that large language models (LLMs) can enhance various aspects of RL, such as reward shaping, fail-state prevention, and goal-directed action generation. In this work, we use the knowledge encoded in LLMs to discover dead-end states and perform secure exploration; the secure property of this exploration helps the agent avoid dead-end states. We propose LLM-based methods for dead-end discovery and exploration that can be used with different optimization criteria. Additionally, we apply them to improve the efficiency and safety of the Q-learning-eGUBS+Cmax algorithm, an RL algorithm based on expected utility theory that allows a meaningful trade-off between cost and the probability of reaching the goal. The experimental results show that our proposed methods significantly reduce the likelihood of encountering dead-end states, leading to better performance than state-of-the-art approaches.
Published
29/09/2025
How to Cite
POLAR, Christian Delgado; BARROS, Leliane Nunes de; FREIRE, Valdinei; DELGADO, Karina Valdivia.
Dead-End Discovery and Secure Exploration via Large Language Models. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 331-346. ISSN 2643-6264.
