DreamerRL: An RL Framework for Autonomous Development in Humanoid Robotics
Abstract
The acquisition of motor and cognitive skills in humanoid robots still relies heavily on task engineering and explicit instructions, restricting them to limited, predefined scenarios. Our work, the DreamerRL framework, overcomes these limitations by allowing the agent to learn skills fully autonomously, guided solely by its curiosity to explore and understand the environment. In experiments with the NAO humanoid robot, we show that DreamerRL advances the state of the art by enabling the spontaneous emergence of complex manipulation behaviors and essential cognitive skills, typically observed in children up to three years of age, without the need for intensive human engineering.
Published
13/10/2025
How to Cite
CORREIA, Alana de Santana; COSTA, Paula Dornhofer Paro; COLOMBINI, Esther Luna. DreamerRL: An RL Framework for Autonomous Development in Humanoid Robotics. In: CONCURSO DE TESES E DISSERTAÇÕES EM ROBÓTICA - CTDR (DOUTORADO) - SIMPÓSIO BRASILEIRO DE ROBÓTICA E SIMPÓSIO LATINO-AMERICANO DE ROBÓTICA (SBR/LARS), 16., 2025, Vitória/ES. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 73-84. DOI: https://doi.org/10.5753/sbrlars_estendido.2025.248267.
