Accelerating reinforcement learning by reusing abstract policies
Abstract
Reinforcement learning (RL) provides a general approach for developing intelligent agents that are able to optimize their behavior in stochastic environments. Unfortunately, most work in RL is based on propositional representations, making it difficult to apply to more complex real-world tasks in which states and actions are more naturally represented in relational form. Moreover, most work in RL does not take existing solutions to similar problems into account when learning a policy for a new problem, and consequently solves the new problem from scratch, which can be very time-consuming. In this article we explore the possibilities opened up by relational descriptions, which let us learn abstract policies and then reuse those policies to improve the initial performance of an RL learner on a similar new problem. The experiments we carried out attest to the effectiveness of our proposal.
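As a concrete illustration (a minimal sketch, not the authors' implementation), the code below shows one common way such reuse can be wired into tabular Q-learning: with some probability the agent follows the action suggested by a previously learned abstract policy, looked up through a relational state abstraction, and otherwise acts epsilon-greedily on its own value estimates. The env interface, the abstraction function, the abstract_policy mapping, and the reuse_prob parameter are all assumptions made for this example.

import random
from collections import defaultdict

def q_learning_with_reuse(env, abstract_policy, abstraction, episodes=500,
                          alpha=0.1, gamma=0.99, epsilon=0.1, reuse_prob=0.8):
    # Tabular Q-learning; Q maps (state, action) pairs to value estimates.
    # env, abstract_policy, and abstraction are hypothetical stand-ins for
    # the environment, the transferred abstract policy, and the relational
    # state abstraction, respectively.
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Consult the transferred abstract policy via the abstraction.
            suggested = abstract_policy.get(abstraction(state))
            if suggested is not None and random.random() < reuse_prob:
                action = suggested  # reuse the abstract policy's advice
            elif random.random() < epsilon:
                action = random.choice(env.actions(state))  # explore
            else:
                # Exploit the current value estimates.
                action = max(env.actions(state), key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(
                Q[(next_state, a)] for a in env.actions(next_state))
            # Standard one-step Q-learning update.
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q

In practice one would typically decay reuse_prob over episodes, so the learner leans on the transferred abstract policy early on (when its own estimates are poor) and gradually comes to trust its own Q-values.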
Published
19/07/2011
How to Cite
BERGAMO, Yannick Plaino; MATOS, Tiago; SILVA, Valdinei Freire da; COSTA, Anna Helena Reali. Accelerating reinforcement learning by reusing abstract policies. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 8., 2011, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2011. p. 596-606. ISSN 2763-9061.