Differentiable Planning with Indefinite Horizon

Daniel B. Dias; Leliane N. de Barros; Karina V. Delgado; Denis D. Mauá

doi:10.5753/kdmile.2022.227974

Daniel B. Dias, Universidade de São Paulo
Leliane N. de Barros, Universidade de São Paulo
Karina V. Delgado, Universidade de São Paulo
Denis D. Mauá, Universidade de São Paulo

Abstract

With recent advances in automated planning based on deep-learning techniques, Deep Reactive Policies (DRPs) have been shown to be a powerful framework for solving Markov Decision Processes (MDPs) with a certain degree of complexity, such as MDPs with continuous state-action spaces and exogenous events. Some differentiable planning algorithms can learn these policies through policy-gradient techniques, assuming a finite-horizon MDP. However, for certain domains we do not know the horizon length needed to find an optimal solution, even when we have a description of the planning goal, which can be either a simple reachability goal or a complex goal involving path optimization. This work aims to solve a continuous MDP through differentiable planning, treating the problem horizon as a hyperparameter that can be adjusted during DRP training. This preliminary investigation shows that it is possible to find better policies by choosing a horizon that encompasses the planning goal.
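To make the role of the horizon concrete, the following is a minimal sketch of gradient-based planning over a fixed horizon, with the horizon swept as a hyperparameter. It is not the authors' method or one of their domains. Everything in it is an assumption made for illustration: the one-dimensional navigation task, the constants `GOAL` and `MAX_STEP`, the single-parameter `tanh` policy, the squared-distance cost, and the clipped gradient step. The gradient through the rollout is derived by hand with the chain rule, standing in for the automatic differentiation a deep-learning framework would provide.

```python
import math

GOAL = 5.0       # hypothetical goal position on a line
MAX_STEP = 1.0   # hypothetical bound on how far one action can move the agent


def rollout(theta, horizon, start=0.0):
    """Simulate `horizon` steps of the policy a = MAX_STEP * tanh(theta * (GOAL - s)).

    Returns (total cost, d cost / d theta, final distance to the goal).
    """
    s, ds = start, 0.0          # state and its sensitivity ds/dtheta
    cost, dcost = 0.0, 0.0
    for _ in range(horizon):
        err = GOAL - s
        t = math.tanh(theta * err)
        # Chain rule through the policy; note that d(err)/dtheta = -ds.
        da = MAX_STEP * (1.0 - t * t) * (err - theta * ds)
        s += MAX_STEP * t       # deterministic transition s' = s + a
        ds += da
        cost += (s - GOAL) ** 2             # per-step squared distance
        dcost += 2.0 * (s - GOAL) * ds
    return cost, dcost, abs(s - GOAL)


def train(horizon, theta=0.1, lr=0.05, iters=200):
    """Clipped gradient descent on theta; returns the lowest-cost theta visited."""
    best_theta = theta
    best_cost = rollout(theta, horizon)[0]
    for _ in range(iters):
        grad = rollout(theta, horizon)[1]
        theta -= lr * max(-1.0, min(1.0, grad))   # clip to keep steps bounded
        cost = rollout(theta, horizon)[0]
        if cost < best_cost:
            best_theta, best_cost = theta, cost
    return best_theta


if __name__ == "__main__":
    # Treat the horizon as a hyperparameter: train one policy per value.
    for horizon in (3, 8):
        theta = train(horizon)
        cost, _, dist = rollout(theta, horizon)
        print(f"H={horizon}: theta={theta:.3f}, cost={cost:.3f}, "
              f"final distance={dist:.3f}")
```

In this toy setting the goal is five units away and each action moves at most one unit, so a horizon of 3 cannot bring the agent closer than two units to the goal whatever the policy parameter, whereas a horizon of 8 leaves enough steps to cover the goal.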

Keywords: continuous state and action planning, Markov decision processes, machine learning, differentiable planning
