Online Selection of Heuristic Operators with Deep Q-Network: A Study on the HyFlex Framework

Augusto Dantas (UFPR); Aurora Pozo (UFPR)

Abstract

General and adaptive strategies have long been pursued by the optimization community, because achieving high-quality solutions usually requires a domain-dependent set of configurations (operators and parameters). This work investigates a Deep Q-Network (DQN) selection strategy within an online selection hyper-heuristic algorithm and compares it with two state-of-the-art Multi-Armed Bandit (MAB) approaches. We conducted the experiments on all six problem domains of the HyFlex framework. With our definition of state representation and reward scheme, the DQN was able to quickly identify the good and bad operators, which resulted in better performance than the MAB strategies on the problem instances where a more exploitative behavior proved advantageous.

Keywords: Hyper-Heuristic, Reinforcement Learning, Combinatorial Optimization

Published

29 November 2021

How to Cite

DANTAS, Augusto; POZO, Aurora. Online Selection of Heuristic Operators with Deep Q-Network: A Study on the HyFlex Framework. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 10., 2021, Online. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. ISSN 2643-6264.