A Comprehensive Study of Fitness Landscapes in AutoML: The Impact of Dataset Complexity and Search Space Neutrality

  • Thiago R. França UFPE
  • Ricardo B. C. Prudêncio UFPE
  • Péricles B. C. de Miranda UFRPE

Abstract


The search space in Automated Machine Learning (AutoML) consists of numerous combinations of machine learning pipelines. Understanding how optimal solutions are distributed within this space is key to improving search strategies, and Fitness Landscape Analysis (FLA) provides useful tools for this purpose. However, traditional FLA metrics often fail to capture the complexity of AutoML tasks because of the high dimensionality of the search space and the interdependencies among pipeline components. This study investigates the relationship between dataset complexity characteristics and FLA metrics in AutoML. While previous research has hinted at this connection, no comprehensive analysis has been presented. We focus on neutrality, the property of regions of the search space in which small changes to a pipeline configuration lead to minimal changes in fitness. This property plays a critical role in the effectiveness of search algorithms. Our results reveal statistically significant correlations between neutrality and specific dataset complexity measures, particularly those related to class overlap and feature difficulty. These insights enhance the understanding of AutoML search space dynamics and support the development of more informed and adaptive optimization strategies.
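To make the notion of neutrality concrete, the following is a minimal sketch, not the metric used in the paper. It assumes a pipeline configuration is a tuple of component names, that two configurations are neighbors when they differ in exactly one component, and that a pair is "neutral" when their fitness values differ by at most a tolerance `eps`. The function name `neutrality_ratio`, the tolerance, and the toy fitness values are all hypothetical and chosen only for illustration.

```python
from itertools import permutations


def neutrality_ratio(fitness, eps=1e-3):
    """Fraction of ordered neighbor pairs whose fitness differs by at most eps.

    `fitness` maps a configuration (a tuple of component names) to a score.
    Neighborhood definition (an assumption): same length, exactly one
    component differs.
    """
    neutral = 0
    total = 0
    for a, b in permutations(fitness, 2):
        is_neighbor = len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1
        if is_neighbor:
            total += 1
            if abs(fitness[a] - fitness[b]) <= eps:
                neutral += 1
    return neutral / total if total else 0.0


# Hypothetical toy landscape: (preprocessor, classifier) -> validation score.
toy_fitness = {
    ("none", "tree"): 0.80,
    ("scale", "tree"): 0.80,  # scaling does not change the tree's score
    ("none", "knn"): 0.70,
    ("scale", "knn"): 0.85,
}

print(neutrality_ratio(toy_fitness))
```

In this toy landscape only the change of preprocessor in front of the tree classifier leaves fitness unchanged, so a small share of neighbor pairs is neutral. A higher ratio would indicate wider plateaus, on which a local search receives little guidance from fitness differences.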
Published
29/09/2025
FRANÇA, Thiago R.; PRUDÊNCIO, Ricardo B. C.; MIRANDA, Péricles B. C. de. A Comprehensive Study of Fitness Landscapes in AutoML: The Impact of Dataset Complexity and Search Space Neutrality. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 18-33. ISSN 2643-6264.