Dead-End Discovery and Secure Exploration via Large Language Models
Abstract
A common issue in stochastic shortest path problems (SSPs) is the presence of non-terminal dead-end states, i.e., states from which reaching the agent's goal is impossible. After reaching a non-terminal dead-end state, the agent may continue interacting with the environment for a long time along a dead-end path. As a consequence, reinforcement learning (RL) techniques may require a prohibitively large number of steps before convergence. It is therefore crucial to develop efficient methods to discover and avoid dead-end states. Recently, it has been shown that large language models (LLMs) can enhance various aspects of RL, such as reward shaping, fail-state prevention, and goal-directed action generation. In this work, we use the knowledge encoded in LLMs to discover dead-end states and perform secure exploration; the secure property of this exploration helps the agent avoid dead-end states. We propose LLM-based methods for dead-end discovery and exploration that can be used with different optimization criteria. Additionally, we apply them to improve the efficiency and safety of the Q-learning-eGUBS+Cmax algorithm, an RL algorithm based on expected utility theory that allows a meaningful trade-off between cost and the probability of reaching the goal. The experimental results show that our proposed methods significantly reduce the likelihood of encountering dead-end states, leading to better performance than state-of-the-art approaches.
Published
29/09/2025
How to Cite
POLAR, Christian Delgado; BARROS, Leliane Nunes de; FREIRE, Valdinei; DELGADO, Karina Valdivia.
Dead-End Discovery and Secure Exploration via Large Language Models. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 331-346. ISSN 2643-6264.
