An analysis of reward shaping for reinforcement learning in a multi-agent framework for combinatorial optimization

  • Jardell Fillipe da Silva, CEFET-MG / UFV
  • Maria Amélia Lopes Silva, CEFET-MG / UFV
  • Sérgio Ricardo de Souza, CEFET-MG

Abstract

This article evaluates reward shaping for the reinforcement learning algorithm used in a multi-agent framework for combinatorial optimization. The evaluation comprises six reward-shaping scenarios, each applied to a set of identical agents built on the framework, all implementing the Iterated Local Search (ILS) metaheuristic to solve the Vehicle Routing Problem with Time Windows (VRPTW). Computational tests on VRPTW instances from the literature show that the results obtained under the various reward shapes are comparable to those already reported in the literature with respect to the quality of the objective function values achieved, the execution time, and the learning speed. The conclusions show that it is possible to define a reward shaping that depends less on problem-specific knowledge while remaining efficient in terms of runtime and learning speed.
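To make the mechanism concrete, the sketch below shows one common way a shaped reward can enter a value update. It is an illustration only, not the paper's implementation: it assumes a tabular Q-learning agent that selects among local-search operators, and the potential function, learning rate, discount factor, and operator names are all hypothetical.

    # Minimal sketch (not the framework's code): tabular Q-learning with
    # potential-based reward shaping. States, actions, and the potential
    # function are illustrative placeholders.
    from collections import defaultdict

    ALPHA, GAMMA = 0.1, 0.9      # assumed learning rate and discount factor
    Q = defaultdict(float)       # Q[(state, action)] -> estimated value

    def potential(state):
        # Hypothetical potential: here a state is summarized by its solution
        # cost, so phi(s) = -cost(s) rewards progress toward cheaper routes.
        return -state

    def shaped_reward(base_reward, state, next_state):
        # Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s),
        # added on top of the base reward.
        return base_reward + GAMMA * potential(next_state) - potential(state)

    def q_update(state, action, base_reward, next_state, actions):
        # Standard Q-learning update, applied to the shaped reward.
        r = shaped_reward(base_reward, state, next_state)
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])

    # Example: an agent moves from a solution of cost 120 to one of cost 100
    # after applying a (hypothetical) "2-opt" local-search operator.
    q_update(state=120, action="2-opt", base_reward=0.0,
             next_state=100, actions=["2-opt", "relocate"])

A shaping term of this form is appealing in this setting because it rewards improvement in a generic solution-quality signal rather than encoding detailed knowledge of the addressed problem, which is consistent with the abstract's point about defining rewards with less problem-specific knowledge.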

Published

2022-11-28

SILVA, Jardell Fillipe da; SILVA, Maria Amélia Lopes; SOUZA, Sérgio Ricardo de. An analysis of reward shaping for reinforcement learning in a multi-agent framework for combinatorial optimization. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 19., 2022, Campinas/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 130-141. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2022.227627.