Flexible Deliberation Costs via Inversely Proportional Decay in the Option-Critic Architecture
Abstract
We propose a flexible deliberation cost for the option-critic architecture, defined as a function that is inversely proportional to option duration. This adaptive cost reduces hyperparameter sensitivity, improves the specialization and stability of options, prevents degeneration into primitive actions, and promotes more coherent behavior.
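The abstract's core idea, a deliberation cost that decays inversely with how long an option has run, can be sketched in a few lines. The exact functional form, constant `c`, and how the cost enters the termination update are assumptions here (the paper's precise formulation is not reproduced on this page); the sketch follows the spirit of Harb et al. (2018), where a margin is added to the advantage in the termination gradient, but replaces the fixed margin with one that shrinks as the option persists.

```python
def deliberation_cost(duration: int, c: float = 1.0) -> float:
    """Deliberation cost that decays inversely with option duration.

    Young options (small duration) pay a large termination penalty,
    discouraging degenerate one-step, primitive-action-like options;
    long-running options pay almost nothing, so they remain free to
    terminate when genuinely finished.
    NOTE: the form c / duration and the constant c are illustrative
    assumptions, not necessarily the paper's exact formulation.
    """
    if duration < 1:
        raise ValueError("duration must be at least 1 step")
    return c / duration


def termination_advantage(q_omega: float, v: float,
                          duration: int, c: float = 1.0) -> float:
    """Advantage term used in the option-critic termination gradient,
    with the duration-dependent cost added as a margin (hypothetical
    stand-in for the fixed margin of Harb et al., 2018)."""
    return (q_omega - v) + deliberation_cost(duration, c)
```

Because the penalty vanishes as `duration` grows, a single cost scale `c` behaves reasonably across tasks with very different option lengths, which is the claimed source of reduced hyperparameter sensitivity.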
References
Bacon, P.-L., Harb, J., and Precup, D. (2017). The option-critic architecture. In Proceedings of the AAAI Conference on Artificial Intelligence, 31(1).
Barto, A. G., Sutton, R. S., and Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5):834–846.
Harb, J., Bacon, P.-L., Klissarov, M., and Precup, D. (2018). When waiting is not an option: Learning options with a deliberation cost. In Proceedings of the AAAI Conference on Artificial Intelligence, 32(1).
Harutyunyan, A., Dabney, W., Borsa, D., Heess, N., Munos, R., and Precup, D. (2019). The termination critic. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, pages 2231–2240.
Schulman, J., Moritz, P., Levine, S., Jordan, M., and Abbeel, P. (2018). High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press, Cambridge, 2nd edition.
Sutton, R. S., Precup, D., and Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1):181–211.
Szepesvári, C. (2010). Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Springer Cham, 1st edition.
Wawrzyński, P. (2009). A cat-like robot real-time learning to run. In Adaptive and Natural Computing Algorithms, pages 380–390, Berlin. Springer Berlin Heidelberg.
Published
12/11/2025
How to Cite
LEAL, Augusto Antônio Fontanive; MELCHIADES, Mateus Begnini; RAMOS, Gabriel de Oliveira. Flexible Deliberation Costs via Inversely Proportional Decay in the Option-Critic Architecture. In: ESCOLA REGIONAL DE APRENDIZADO DE MÁQUINA E INTELIGÊNCIA ARTIFICIAL DA REGIÃO SUL (ERAMIA-RS), 1., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 33-36. DOI: https://doi.org/10.5753/eramiars.2025.16624.