Online Selection of Heuristic Operators with Deep Q-Network: A Study on the HyFlex Framework

  • Augusto Dantas UFPR
  • Aurora Pozo UFPR

Abstract


General and adaptive strategies have been a highly pursued goal of the optimization community, because achieving high-quality solutions usually requires a domain-dependent set of configurations (operators and parameters). This work investigates a Deep Q-Network (DQN) selection strategy within an online selection Hyper-Heuristic algorithm and compares it with two state-of-the-art Multi-Armed Bandit (MAB) approaches. We conducted the experiments on all six problem domains of the HyFlex Framework. With our definition of the state representation and reward scheme, the DQN was able to quickly identify good and bad operators, which resulted in better performance than the MAB strategies on the problem instances where a more exploitative behavior proved advantageous.
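The abstract does not spell out the selection loop, so the following is only a rough sketch of what online operator selection looks like in general. It uses UCB1, a generic Multi-Armed Bandit rule chosen here as a stand-in: it is not the authors' DQN, and the abstract does not say which two MAB approaches were compared. The toy bit-string problem, the three operators (`flip_one`, `clear_one`, `shuffle`), the binary improvement reward, and the accept-if-not-worse rule are all hypothetical, and no HyFlex domain or HyFlex API is used.

```python
import math
import random

# Hypothetical toy problem (NOT a HyFlex domain): minimise the number of 1-bits.
def cost(solution):
    return sum(solution)

# Hypothetical low-level heuristics ("operators"); each returns a new candidate.
def flip_one(sol, rng):
    s = sol[:]
    s[rng.randrange(len(s))] ^= 1  # flip one random bit
    return s

def clear_one(sol, rng):
    s = sol[:]
    ones = [i for i, b in enumerate(s) if b]
    if ones:
        s[rng.choice(ones)] = 0  # always improves while any 1-bit remains
    return s

def shuffle(sol, rng):
    s = sol[:]
    rng.shuffle(s)  # never changes the cost: a deliberately "bad" operator
    return s

OPERATORS = [flip_one, clear_one, shuffle]

class UCB1Selector:
    """Generic UCB1 bandit over operator indices (stand-in for a MAB strategy)."""

    def __init__(self, n_arms, c=math.sqrt(2)):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.c = c
        self.t = 0

    def select(self):
        self.t += 1
        # Try every operator once before applying the UCB1 index.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        return max(
            range(len(self.counts)),
            key=lambda a: self.means[a]
            + self.c * math.sqrt(math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        # Incremental average of the rewards observed for this operator.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def run(iterations=500, n_bits=50, seed=0):
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    selector = UCB1Selector(len(OPERATORS))
    for _ in range(iterations):
        arm = selector.select()
        candidate = OPERATORS[arm](current, rng)
        # Hypothetical reward: 1 if the operator improved the cost, else 0.
        reward = 1.0 if cost(candidate) < cost(current) else 0.0
        selector.update(arm, reward)
        # Simple move acceptance: keep the candidate if it is not worse.
        if cost(candidate) <= cost(current):
            current = candidate
    return current, selector.counts
```

Calling `final, counts = run()` returns the final solution and how many times each operator was chosen. A learned selector such as the DQN described above would replace `select`/`update` with a policy conditioned on a state representation; this sketch has no state at all, which is exactly what distinguishes a plain bandit from a Q-learning approach.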
Keywords: Hyper-Heuristic, Reinforcement Learning, Combinatorial Optimization
Published
29/11/2021
DANTAS, Augusto; POZO, Aurora. Online Selection of Heuristic Operators with Deep Q-Network: A Study on the HyFlex Framework. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 10., 2021, Online. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. ISSN 2643-6264.