Toll-based Q-learning with non-cooperative agents

  • Timóteo Fonseca Santos, Universidade Federal do Amazonas
  • Moisés Gomes de Carvalho, Universidade Federal do Amazonas


Congestion is a recurring problem in cities that leads to productivity loss, pollution, and reduced quality of life. Existing traffic congestion resolution techniques are often ineffective or costly. Mathematical analysis and virtual simulation are useful tools to assess the cost-effectiveness of such approaches. Toll-based approaches offer a theoretical foundation for addressing this issue. However, the assumption that all drivers pay tolls may limit real-world efficiency due to non-compliance or economic constraints. This work explores the impacts of different levels of cooperation in toll systems, addressing these challenges. We adapt an existing toll-based approach to handle various scenarios and investigate the feasibility of gradual adoption. Our findings demonstrate that the toll system can be gradually implemented, yielding steady gains and avoiding chaotic behavior, even with non-cooperative agents.
Keywords: machine learning, mct, q-learning, tq-learning, traffic congestion


How to Cite

SANTOS, Timóteo Fonseca; DE CARVALHO, Moisés Gomes. Toll-based Q-learning with non-cooperative agents. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 11., 2023, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 57-64. ISSN 2763-8944. DOI: