Risk Sensitive Probabilistic Planning with ILAO* and Exponential Utility Function

  • Elthon Manhas de Freitas (USP)
  • Karina Valdivia Delgado (USP)
  • Valdinei Freire (USP)


Markov Decision Processes (MDPs) have been used very effectively to solve sequential decision-making problems. However, in some problems, dealing with the risks of the environment to obtain a reliable result is more important than minimizing the total expected cost. MDPs that address this type of problem are called Risk-Sensitive Markov Decision Processes (RSMDPs). In this paper we propose an efficient heuristic search algorithm that obtains a solution by evaluating only the states relevant to reaching the goal states from a given initial state.
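As background for the criterion named in the title, the exponential utility objective for goal-directed RSMDPs can be sketched as follows. This is the standard formulation from the risk-sensitive MDP literature (Howard and Matheson's criterion, in the cost-minimization form used for stochastic shortest path problems); the symbols below are generic illustrations, not notation taken from the paper. With a risk factor $\lambda > 0$ (risk-averse), one seeks a policy minimizing the expected exponential of the accumulated cost:

```latex
V^{*}(s) \;=\; \min_{\pi}\; \mathbb{E}^{\pi}\!\left[\, \exp\!\Big(\lambda \sum_{t=0}^{T-1} c(s_t, a_t)\Big) \;\Big|\; s_0 = s \right],
```

which satisfies a multiplicative Bellman equation,

```latex
V^{*}(s) \;=\; \min_{a \in A(s)}\; e^{\lambda\, c(s,a)} \sum_{s'} P(s' \mid s, a)\, V^{*}(s'),
\qquad V^{*}(s_g) = 1 \ \text{for every goal state } s_g .
```

A heuristic search algorithm in the (I)LAO* family applies such backups only over the states reachable from the initial state under the best partial policy found so far, rather than over the entire state space.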



DE FREITAS, Elthon Manhas; DELGADO, Karina Valdivia; FREIRE, Valdinei. Risk Sensitive Probabilistic Planning with ILAO* and Exponential Utility Function. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 15, 2018, São Paulo. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 401-412. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2018.4434.
