Reinforcement Learning for Automated Investment in the Brazilian Stock Market: A Comparative Study of DQN, PPO, and Their Recurrent Versions

Paulo R. Sturion (USP); André Carlos P. de L. F. de Carvalho (USP)

DOI: https://doi.org/10.5753/eniac.2025.13791

Abstract

This work investigates the application of Reinforcement Learning to the development of automated investment agents for the Brazilian stock market (B3), focusing on short-term strategies (swing trading). The DQN and PPO algorithms were compared with their recurrent versions (R-DQN and R-PPO). The simulation environment used historical data from five Brazilian companies, with technical indicators as input features. To ensure a fair comparison, exposure to the data was controlled and a standardized reward function was adopted. The results showed that the DQN-based algorithms outperformed the PPO-based ones, and that the recurrent versions outperformed the traditional ones.
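To make the setup more concrete, the following is a minimal sketch of how technical indicators could serve as input features and how a single reward definition could be shared by every agent. The abstract does not specify the indicators, the action space, or the reward formula, so everything here is an illustrative assumption: the moving-average features, the window sizes, the flat-or-long position encoding, and the names `sma`, `observation`, and `step_reward` are hypothetical and are not the authors' implementation.

```python
import numpy as np


def sma(prices, window):
    """Simple moving average; the first window-1 entries are NaN."""
    prices = np.asarray(prices, dtype=float)
    out = np.full(prices.shape, np.nan)
    if len(prices) >= window:
        csum = np.cumsum(np.insert(prices, 0, 0.0))
        out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def observation(prices, t, short=3, long=5):
    """Hypothetical feature vector at step t: relative distance of the
    price from a short and a long moving average (assumed indicators)."""
    prices = np.asarray(prices, dtype=float)
    return np.array([
        prices[t] / sma(prices, short)[t] - 1.0,
        prices[t] / sma(prices, long)[t] - 1.0,
    ])


def step_reward(position, price_now, price_next, prev_position=0, cost=0.0):
    """Assumed standardized reward: position (0 = flat, 1 = long) times the
    simple return, minus a proportional cost when the position changes."""
    ret = (price_next - price_now) / price_now
    return position * ret - cost * abs(position - prev_position)


if __name__ == "__main__":
    closes = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]  # made-up closing prices
    print(observation(closes, t=4))
    print(step_reward(position=1, price_now=closes[4], price_next=closes[5]))
```

Using one reward function of this kind for DQN, PPO, and their recurrent variants is one way to keep the comparison dependent on the learning algorithm rather than on differences in reward shaping.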
