Collecting Meta-Data from the OpenML Public Repository

  • Nathan F. Carvalho, Instituto Tecnológico de Aeronáutica
  • André A. Gonçalves, Instituto Tecnológico de Aeronáutica
  • Ana C. Lorena, Instituto Tecnológico de Aeronáutica


In Machine Learning (ML), selecting the most suitable algorithm for a given problem is a challenge. Meta-Learning (MtL) addresses this challenge by exploring the relationships between dataset characteristics and the performance of ML algorithms. Conducting an MtL study requires building a meta-dataset from datasets that vary in their characteristics and that challenge ML algorithms to different degrees. This study analyzes the information available for building such meta-datasets in the OpenML public repository, which provides a Python API for easy data import. An assessment of the content currently available on the platform shows that extensive meta-feature values are not yet available for all datasets, which limits how completely they can be characterized.

Keywords: Meta-Learning, OpenML, meta-features
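
As an illustration of the kind of assessment summarized above, the following is a minimal sketch, not the authors' actual procedure, of how meta-feature availability could be measured. It assumes the third-party `openml` Python package, its `openml.datasets.get_dataset` call, and the `qualities` attribute of the returned dataset object; exact arguments may differ between package versions. The helper names `fetch_qualities` and `quality_coverage` are hypothetical and introduced only for this example.

```python
import math


def quality_coverage(qualities_by_dataset):
    """Return, per meta-feature name, the fraction of datasets with a usable value.

    `qualities_by_dataset` maps a dataset id to a dict {meta-feature: value}.
    Missing keys, None, and NaN all count as "not available".
    """
    names = sorted({n for q in qualities_by_dataset.values() for n in q})
    total = len(qualities_by_dataset)
    coverage = {}
    for name in names:
        present = 0
        for q in qualities_by_dataset.values():
            value = q.get(name)
            is_nan = isinstance(value, float) and math.isnan(value)
            if value is not None and not is_nan:
                present += 1
        coverage[name] = present / total
    return coverage


def fetch_qualities(dataset_ids):
    """Collect the meta-features OpenML stores for each dataset id.

    Requires network access and the `openml` package (assumed installed).
    """
    import openml  # third-party dependency, imported lazily

    result = {}
    for did in dataset_ids:
        # download_data=False skips the data file; only metadata is needed here.
        dataset = openml.datasets.get_dataset(did, download_data=False)
        result[did] = dataset.qualities or {}
    return result


# Offline usage with made-up toy values (NOT real OpenML results):
toy = {
    1: {"NumberOfInstances": 150.0, "MeanSkewnessOfNumericAtts": float("nan")},
    2: {"NumberOfInstances": 1000.0},
}
toy_coverage = quality_coverage(toy)
```

With real data, `quality_coverage(fetch_qualities(ids))` would show which meta-features are populated for only part of a chosen set of dataset ids, which is the kind of gap the abstract reports.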


