Risk Sensitive Probabilistic Planning with ILAO* and Exponential Utility Function

  • Elthon Manhas de Freitas (USP)
  • Karina Valdivia Delgado (USP)
  • Valdinei Freire (USP)


Markov Decision Processes (MDPs) have been used very effectively to solve sequential decision-making problems. However, in some problems, dealing with the risks of the environment to obtain a reliable result is more important than minimizing the total expected cost. MDPs that address this type of problem are called Risk-Sensitive Markov Decision Processes (RSMDPs). In this paper we propose an efficient heuristic search algorithm that obtains a solution by evaluating only the states relevant to reaching the goal states from a given initial state.
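As background for the criterion named in the title, the exponential utility objective for goal-directed RSMDPs can be sketched as follows. This is the standard formulation from the risk-sensitive MDP literature (Howard and Matheson's criterion, in the cost-minimization form used for stochastic shortest path problems); the symbols below are generic illustrations, not notation taken from the paper. With a risk factor $\lambda > 0$ (risk-averse), one seeks a policy minimizing the expected exponential of the accumulated cost:

```latex
V^{*}(s) \;=\; \min_{\pi}\; \mathbb{E}^{\pi}\!\left[\, \exp\!\Big(\lambda \sum_{t=0}^{T-1} c(s_t, a_t)\Big) \;\Big|\; s_0 = s \right],
```

which satisfies a multiplicative Bellman equation,

```latex
V^{*}(s) \;=\; \min_{a \in A(s)}\; e^{\lambda\, c(s,a)} \sum_{s'} P(s' \mid s, a)\, V^{*}(s'),
\qquad V^{*}(s_g) = 1 \ \text{for every goal state } s_g .
```

A heuristic search algorithm in the (I)LAO* family applies such backups only over the states reachable from the initial state under the best partial policy found so far, rather than over the entire state space.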



DE FREITAS, Elthon Manhas; DELGADO, Karina Valdivia; FREIRE, Valdinei. Risk Sensitive Probabilistic Planning with ILAO* and Exponential Utility Function. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 15, 2018, São Paulo. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 401-412. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2018.4434.
