Towards the Integration of Reinforcement Learning into MASPY

Alexandre L. L. Mellado; André Pinz Borges; Rafael C. Cardoso; Gleifer Vaz Alves

doi:10.5753/wesaac.2025.37544

Alexandre L. L. Mellado UTFPR
André Pinz Borges UTFPR
Rafael C. Cardoso University of Aberdeen
Gleifer Vaz Alves UTFPR

Abstract

Learning in symbolic agent architectures remains a key challenge in the development of adaptive multi-agent systems. This paper introduces a learning module for MASPY, a Python-based framework inspired by the Belief-Desire-Intention (BDI) model. The module enables agents to learn optimal actions using tabular reinforcement learning algorithms such as Q-Learning and SARSA. To support this, we propose the SART methodology, which decomposes the learning environment into four structured components: States, Actions, Rewards, and Transitions. With this structure, MASPY agents perceive their environment through defined percepts, act through decorated functions, and adapt over time using discrete learning strategies. The learning module thus offers a unified Python-based architecture for symbolic reasoning agents that learn through reinforcement. We demonstrate it on a toy problem in which agents learn which actions to execute in a previously unknown environment.
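
To make the four SART components and the two tabular algorithms named above concrete, the following is a minimal, self-contained sketch in plain Python. It does not use MASPY and is not the module's actual API: the `SART` container, the `make_corridor` toy environment, and the `train` and `greedy_policy` helpers are hypothetical names introduced only for illustration, and the hyperparameters are arbitrary. The sketch assumes a deterministic, episodic environment with a single goal state.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = int
Action = str
QTable = Dict[Tuple[State, Action], float]


@dataclass
class SART:
    """Hypothetical container for the four SART components (not MASPY's API)."""
    states: List[State]
    actions: List[Action]
    reward: Callable[[State, Action, State], float]
    transition: Callable[[State, Action], State]


def make_corridor(length: int = 5) -> SART:
    """Illustrative toy environment: a 1-D corridor whose last cell is the goal."""
    goal = length - 1

    def transition(s: State, a: Action) -> State:
        step = 1 if a == "right" else -1
        return min(max(s + step, 0), goal)  # walls clamp the position

    def reward(s: State, a: Action, s2: State) -> float:
        return 1.0 if s2 == goal else -0.01  # small cost per non-goal step

    return SART(list(range(length)), ["left", "right"], reward, transition)


def choose(q: QTable, s: State, actions: List[Action],
           eps: float, rng: random.Random) -> Action:
    """Epsilon-greedy action selection over the Q-table."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(s, a)])


def train(env: SART, algorithm: str = "q_learning", episodes: int = 500,
          alpha: float = 0.1, gamma: float = 0.9, eps: float = 0.1,
          seed: int = 0) -> QTable:
    """Tabular Q-Learning (off-policy) or SARSA (on-policy) on a SART environment."""
    rng = random.Random(seed)
    q: QTable = {(s, a): 0.0 for s in env.states for a in env.actions}
    goal = env.states[-1]  # assumption: the last state is terminal
    for _ in range(episodes):
        s = env.states[0]
        a = choose(q, s, env.actions, eps, rng)
        for _ in range(100):  # step cap so every episode terminates
            s2 = env.transition(s, a)
            r = env.reward(s, a, s2)
            done = s2 == goal
            if algorithm == "sarsa":
                # SARSA bootstraps from the action it will actually take next.
                a2 = choose(q, s2, env.actions, eps, rng)
                bootstrap = q[(s2, a2)]
            else:
                # Q-Learning bootstraps from the greedy value of the next state.
                bootstrap = max(q[(s2, b)] for b in env.actions)
            target = r if done else r + gamma * bootstrap
            q[(s, a)] += alpha * (target - q[(s, a)])
            if done:
                break
            if algorithm != "sarsa":
                a2 = choose(q, s2, env.actions, eps, rng)
            s, a = s2, a2
    return q


def greedy_policy(env: SART, q: QTable) -> Dict[State, Action]:
    """Best known action for every non-terminal state."""
    return {s: max(env.actions, key=lambda a: q[(s, a)]) for s in env.states[:-1]}


if __name__ == "__main__":
    corridor = make_corridor(5)
    for algo in ("q_learning", "sarsa"):
        print(algo, greedy_policy(corridor, train(corridor, algorithm=algo)))
```

The only difference between the two algorithms in this sketch is the bootstrap term: SARSA uses the value of the next action it actually selects, while Q-Learning uses the maximum value over all next actions. In an agent framework, the `transition` and `reward` callables would presumably be replaced by percepts and action effects supplied by the environment rather than by pure functions.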
