Distributional Safety Critic for Stochastic Latent Actor-Critic

  • Thiago S. Miranda, Universidade Federal de Juiz de Fora
  • Heder S. Bernardino, Universidade Federal de Juiz de Fora


When reinforcement learning is deployed in real-world applications, it is often desirable to constrain the agent so that it avoids actions that may lead to damage, harm, or other unwanted scenarios. In particular, recent approaches focus on learning safe behavior under partial observability. In this vein, we propose distributional safe stochastic latent actor-critic (DS-SLAC), a method that combines distributional reinforcement learning with techniques that facilitate learning in partially observable environments. We evaluate DS-SLAC on four Safety-Gym tasks: it obtains better results than state-of-the-art algorithms in two of the evaluated environments and learns a safe policy in three of them. Lastly, we identify the main challenges of performing distributional reinforcement learning in the safety-constrained, partially observable setting.

Keywords: Reinforcement Learning, Safe Reinforcement Learning


MIRANDA, Thiago S.; BERNARDINO, Heder S. Distributional Safety Critic for Stochastic Latent Actor-Critic. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 20., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 1114-1128. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2023.234620.