Reinforcement Learning in Non-Stationary Continuous Time and Space Scenarios
Abstract
In this paper we propose a neural architecture for solving continuous time and space reinforcement learning problems in non-stationary environments. The method is based on a mechanism for creating, updating and selecting partial models of the environment. The partial models are estimated incrementally using linear approximation functions and are built according to the system's ability to make predictions about a given sequence of observations. We propose, formalize and demonstrate the efficiency of this method on the non-stationary pendulum task. We show that the neural architecture with context detection outperforms a model-based RL algorithm and performs nearly as well as the optimum, that is, a hypothetical system whose sensor capabilities are extended so that the environment effectively appears stationary. Finally, we present known limitations of the method and future work.
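The abstract only sketches the mechanism at a high level. The Python fragment below is a minimal, assumption-laden illustration of how such a context-detection scheme with multiple partial models could be organized; the class names, the LMS-style linear update, the error-trace decay and the `new_model_error` threshold are hypothetical stand-ins for the paper's actual estimator and activation criterion, not the authors' implementation.

```python
import numpy as np

class PartialModel:
    """One linear partial model of the environment's dynamics.

    Assumption: the next observation is predicted as a linear function of
    the current observation-action pair and updated with a simple LMS rule;
    the paper's exact incremental estimator may differ.
    """
    def __init__(self, obs_dim, act_dim, lr=0.1):
        self.W = np.zeros((obs_dim, obs_dim + act_dim))
        self.lr = lr
        self.error_trace = 1.0  # running measure of recent prediction error

    def predict(self, obs, act):
        return self.W @ np.concatenate([obs, act])

    def update(self, obs, act, next_obs, decay=0.9):
        x = np.concatenate([obs, act])
        err = next_obs - self.W @ x
        self.W += self.lr * np.outer(err, x)  # incremental LMS update
        self.error_trace = decay * self.error_trace + (1 - decay) * err @ err
        return self.error_trace


class ContextDetector:
    """Creates, updates and selects partial models by prediction error.

    The threshold `new_model_error` (hypothetical) decides when no existing
    model explains the observations and a new one must be created.
    """
    def __init__(self, obs_dim, act_dim, new_model_error=1.0):
        self.obs_dim, self.act_dim = obs_dim, act_dim
        self.models = [PartialModel(obs_dim, act_dim)]
        self.active = 0
        self.new_model_error = new_model_error

    def step(self, obs, act, next_obs):
        # Track every model's prediction error on the observed transition,
        # but only train the currently active model.
        errors = []
        for i, m in enumerate(self.models):
            if i == self.active:
                errors.append(m.update(obs, act, next_obs))
            else:
                err = next_obs - m.predict(obs, act)
                m.error_trace = 0.9 * m.error_trace + 0.1 * (err @ err)
                errors.append(m.error_trace)
        # Select the model that currently predicts best.
        self.active = int(np.argmin(errors))
        # If even the best model predicts poorly, assume a new context.
        if errors[self.active] > self.new_model_error:
            self.models.append(PartialModel(self.obs_dim, self.act_dim))
            self.active = len(self.models) - 1
        return self.active
```

The design choice mirrored here is the one stated in the abstract: partial models compete through their recent prediction error on the observation sequence, and a new model is spawned only when none of the existing ones accounts for the current dynamics.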
References
Choi, S. P. M., Yeung, D.-Y., and Zhang, N. L. (2001). Hidden-mode Markov decision processes for nonstationary sequential decision making. In Sequence Learning - Paradigms, Algorithms, and Applications, pages 264–287, London, UK. Springer-Verlag.
Doya, K. (1996). Temporal difference learning in continuous time and space. In Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E., editors, Proceedings of the conference on Advances in Neural Information Processing Systems, volume 8, pages 1073–1079. The MIT Press.
Doya, K. (2000). Reinforcement learning in continuous time and space. Neural Computation, 12(1):219–245.
Doya, K., Samejima, K., Katagiri, K.-I., and Kawato, M. (2002). Multiple model-based reinforcement learning. Neural Computation, 14(6):1347–1369.
Santamaría, J. C., Sutton, R. S., and Ram, A. (1997). Experiments with reinforcement learning in problems with continuous state and action spaces. Adaptive Behavior, 6(2):163–217.
Silva, B. C., Basso, E. W., Bazzan, A. L., and Engel, P. M. (2006). Dealing with non-stationary environments using context detection. In Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), pages 217–224.
Published
20/07/2009
How to Cite
BASSO, Eduardo W.; ENGEL, Paulo M. Reinforcement Learning in Non-Stationary Continuous Time and Space Scenarios. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 7., 2009, Bento Gonçalves/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2009. p. 61-70. ISSN 2763-9061.
