A Unified Framework for Average Reward Criterion and Risk

  • Willy Arthur Silva Reis (USP)
  • Karina Valdivia Delgado (USP)
  • Valdinei Freire (USP)

Abstract

The average reward criterion is used to solve infinite-horizon MDPs. This risk-neutral criterion depends on the limiting stochastic process and can be defined using (i) the reward accumulated at infinity, which considers state sequences of length h = ∞, or (ii) the steady-state distribution of the MDP (i.e., the long-run probability that the system is in each state), which considers state sequences of length h = 1. In many situations it is desirable to account for risk at each stage of the process, which can be achieved by combining the average reward criterion with a utility function or a risk measure such as VaR or CVaR. The objective of this work is to propose a mathematical framework that provides a unified treatment of the existing literature on average reward and risk, including works that use exponential utility functions and CVaR, and to introduce interpretations with 1 ≤ h ≤ ∞ that are absent from the literature. These new interpretations make it possible to differentiate policies that existing criteria cannot distinguish. A numerical example illustrates the behavior of the criteria under this new framework.
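As a toy illustration of the two risk-neutral views mentioned in the abstract (an assumed example, not taken from the paper): for a fixed policy inducing a two-state Markov chain, view (ii) computes the average reward directly from the stationary distribution (h = 1), while view (i) averages expected rewards over a long horizon (h = ∞); for an ergodic chain both converge to the same value. The transition probabilities and rewards below are illustrative.

```python
# Hypothetical 2-state Markov chain induced by a fixed policy.
# From s0, move to s1 with probability a; from s1, move to s0 with probability b.
a, b = 0.2, 0.6
rewards = [1.0, 5.0]  # r(s0), r(s1): illustrative values

# View (ii), h = 1: closed-form stationary distribution of the 2-state chain,
# pi = (b/(a+b), a/(a+b)), then average reward = sum_s pi(s) * r(s).
pi = [b / (a + b), a / (a + b)]
avg_reward_steady = pi[0] * rewards[0] + pi[1] * rewards[1]

# View (i), h = ∞: average the expected per-step reward over a long horizon,
# propagating the state distribution forward from s0.
dist = [1.0, 0.0]
total = 0.0
T = 10_000
for _ in range(T):
    total += dist[0] * rewards[0] + dist[1] * rewards[1]
    dist = [dist[0] * (1 - a) + dist[1] * b,
            dist[0] * a + dist[1] * (1 - b)]
avg_reward_limit = total / T
```

Because the chain mixes quickly (second eigenvalue 1 - a - b = 0.2), the finite-horizon average is already very close to the steady-state value after a few thousand steps.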

Published
17/11/2024
REIS, Willy Arthur Silva; DELGADO, Karina Valdivia; FREIRE, Valdinei. A Unified Framework for Average Reward Criterion and Risk. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 96-110. ISSN 2643-6264.