Automating Machine Learning Pipeline Design via Metalearning

Edesio Alcobaça; André C. P. L. F. de Carvalho

doi:10.5753/ctd.2026.19538

Edesio Alcobaça USP
André C. P. L. F. de Carvalho USP

Abstract

Automated Machine Learning (AutoML) systems automate the design of Machine Learning (ML) pipelines, but they typically search over fixed, task-agnostic configuration spaces, which leads to high computational costs. This paper overviews a Ph.D. thesis that proposes a paradigm shift: using Metalearning (MtL) to dynamically build task-specific search spaces. Prior approaches either optimize within a fixed search space or recommend algorithms directly, without an optimization step. In contrast, this thesis introduces the Dynamic Pipeline CASH problem, which extends the Combined Algorithm Selection and Hyperparameter optimization (CASH) formulation so that a meta-model creates the pipeline search space for each task. The thesis contributes a systematic literature review that identifies meta-knowledge as the unifying thread across AutoML subfields, applied studies that reinforce the importance of algorithm selection and tuning, a large-scale benchmark of over one million pipeline configurations, and the pymfe package for reproducible meta-feature extraction. These building blocks converge into a novel MtL framework that dynamically reduces search spaces while maintaining competitive performance.
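
The idea of a meta-model that builds a task-specific search space before optimization can be illustrated with a small sketch. This is not the thesis framework: the search space, the toy meta-base, the two meta-features, and the function names `extract_meta_features`, `build_search_space`, and `random_search` are all hypothetical, and a nearest-neighbour lookup stands in for whatever meta-model the thesis actually uses.

```python
import math
import random

# Hypothetical full search space: algorithm name -> hyperparameter candidates.
FULL_SPACE = {
    "knn": {"n_neighbors": [1, 5, 15]},
    "decision_tree": {"max_depth": [2, 8, 32]},
    "svm": {"C": [0.1, 1.0, 10.0]},
    "naive_bayes": {"var_smoothing": [1e-9, 1e-6]},
}

# Toy meta-base with made-up values: meta-features of past datasets
# (log10 of instances, log10 of attributes) and the algorithms that did well.
META_BASE = [
    ((2.0, 1.0), {"knn", "svm"}),
    ((2.3, 0.7), {"svm"}),
    ((5.0, 1.3), {"decision_tree"}),
    ((4.7, 2.0), {"decision_tree", "naive_bayes"}),
]


def extract_meta_features(n_instances, n_attributes):
    """Describe a dataset by two simple meta-features (an assumption)."""
    return (math.log10(n_instances), math.log10(n_attributes))


def build_search_space(meta_features, k=2):
    """Keep only algorithms that did well on the k most similar past datasets."""
    nearest = sorted(META_BASE, key=lambda rec: math.dist(rec[0], meta_features))[:k]
    keep = set().union(*(algos for _, algos in nearest))
    return {name: grid for name, grid in FULL_SPACE.items() if name in keep}


def random_search(space, evaluate, n_trials=20, seed=0):
    """Plain random search restricted to the reduced space."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        algo = rng.choice(sorted(space))
        config = {hp: rng.choice(values) for hp, values in space[algo].items()}
        score = evaluate(algo, config)
        if best is None or score > best[0]:
            best = (score, algo, config)
    return best


if __name__ == "__main__":
    space = build_search_space(extract_meta_features(150, 4))
    print(sorted(space))  # algorithms left after meta-model-driven pruning
```

In this sketch the optimizer never sees the pruned algorithms, which is where the cost reduction would come from. In practice, `extract_meta_features` would be replaced by a dedicated meta-feature extractor such as pymfe, and `evaluate` by cross-validated pipeline training.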

References

Alcobaça, E. and de Carvalho, A. C. (2025a). Dynamic design of machine learning pipelines via metalearning. arXiv preprint arXiv:2508.13436.

Alcobaça, E. and de Carvalho, A. C. (2025b). Exploring one million machine learning pipelines: A benchmarking study. In International Conference on Automated Machine Learning, pages 22–1. PMLR.

Alcobaça, E. and de Carvalho, A. C. (2026). A literature review on automated machine learning. Artificial Intelligence Review, 59(1):1–39.

Alcobaça, E., Mantovani, R. G., Rossi, A. L., and De Carvalho, A. C. (2018). Dimensionality reduction for the algorithm recommendation problem. In 2018 7th Brazilian Conference on Intelligent Systems (BRACIS), pages 318–323. IEEE.

Alcobaça, E., Mastelini, S. M., Botari, T., Pimentel, B. A., Cassar, D. R., de Leon Ferreira, A. C. P., Zanotto, E. D., et al. (2020a). Explainable machine learning algorithms for predicting glass transition temperatures. Acta Materialia, 188:92–100.

Alcobaça, E., Siqueira, F., Rivolli, A., Garcia, L. P. F., Oliva, J. T., de Carvalho, A. C., et al. (2020b). MFE: Towards reproducible meta-feature extraction. Journal of Machine Learning Research, 21(111):1–5.

Borboudakis, G., Charonyktakis, P., Paraschakis, K., and Tsamardinos, I. (2023). A meta-level learning algorithm for sequential hyper-parameter space reduction in AutoML. arXiv preprint arXiv:2312.06305.

Brazdil, P., Van Rijn, J. N., Soares, C., and Vanschoren, J. (2022). Metalearning: Applications to automated machine learning and data mining. Springer Nature.

Cassar, D. R., Mastelini, S. M., Botari, T., Alcobaça, E., de Carvalho, A. C., and Zanotto, E. D. (2021). Predicting and interpreting oxide glass properties by machine learning using large datasets. Ceramics International, 47(17):23958–23972.

de Sá, A. G., Pinto, W. J. G., Oliveira, L. O. V., and Pappa, G. L. (2017). RECIPE: a grammar-based framework for automatically evolving classification pipelines. In European Conference on Genetic Programming, pages 246–261. Springer.

El Baz, A., Ullah, I., Alcobaça, E., Carvalho, A. C., Chen, H., Ferreira, F., Gouk, H., Guan, C., Guyon, I., Hospedales, T., et al. (2022). Lessons learned from the NeurIPS 2021 MetaDL challenge: Backbone fine-tuning without episodic meta-learning dominates for few-shot learning image classification. In NeurIPS 2021 Competitions and Demonstrations Track, pages 80–96. PMLR.

Fabris, F. and Freitas, A. A. (2019). Analysing the overfit of the auto-sklearn automated machine learning tool. In International Conference on Machine Learning, Optimization, and Data Science, pages 508–520. Springer.

Feurer, M., Eggensperger, K., Falkner, S., Lindauer, M., and Hutter, F. (2022). Auto-sklearn 2.0: Hands-free AutoML via meta-learning. Journal of Machine Learning Research, 23(261):1–61.

Feurer, M., Klein, A., Eggensperger, K., Springenberg, J. T., Blum, M., and Hutter, F. (2015a). Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 2962–2970.

Feurer, M., Springenberg, J. T., and Hutter, F. (2015b). Initializing Bayesian hyperparameter optimization via meta-learning. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, pages 1128–1135.

Garcia, L. P., Rivolli, A., Alcobaça, E., Lorena, A. C., and de Carvalho, A. C. (2020). Boosting meta-learning with simulated data complexity measures. Intelligent Data Analysis, 24(5):1011–1028.

Hutter, F., Kotthoff, L., and Vanschoren, J., editors (2019). Automated Machine Learning - Methods, Systems, Challenges. The Springer Series on Challenges in Machine Learning. Springer.

Kedziora, D. J., Nguyen, T.-D., Musial, K., and Gabrys, B. (2024). On taking advantage of opportunistic meta-knowledge to reduce configuration spaces for automated machine learning. Expert Systems with Applications, 239:122359.

Mantovani, R. G., Rossi, A. L., Alcobaça, E., Vanschoren, J., and de Carvalho, A. C. (2019). A meta-learning recommender system for hyperparameter tuning: Predicting when tuning improves SVM classifiers. Information Sciences, 501:193–221.

Mantovani, R. G., Rossi, A. L. D., Alcobaça, E., Gertrudes, J. C., Junior, S. B., and de Carvalho, A. C. P. d. L. F. (2020). Rethinking default values: A low cost and efficient strategy to define hyperparameters. arXiv preprint arXiv:2008.00025.

Mastelini, S. M., Cassar, D. R., Alcobaça, E., Botari, T., de Carvalho, A. C., and Zanotto, E. D. (2022). Machine learning unveils composition-property relationships in chalcogenide glasses. Acta Materialia, 240:118302.

Mohr, F., Wever, M., and Hüllermeier, E. (2018). ML-Plan: Automated machine learning via hierarchical planning. Machine Learning, 107(8-10):1495–1515.

Olson, R. S., Bartley, N., Urbanowicz, R. J., and Moore, J. H. (2016). Evaluation of a tree-based pipeline optimization tool for automating data science. In Friedrich, T., Neumann, F., and Sutton, A. M., editors, Proceedings of the 2016 Genetic and Evolutionary Computation Conference, pages 485–492. ACM.

Parmezan, A. R. S., Lee, H. D., and Wu, F. C. (2017). Metalearning for choosing feature selection algorithms in data mining: Proposal of a new framework. Expert Systems with Applications, 75:1–24.

Rice, J. R. (1976). The algorithm selection problem. Advances in Computers, 15:65–118.

Rivolli, A., Garcia, L. P., Soares, C., Vanschoren, J., and de Carvalho, A. C. (2022). Meta-features for meta-learning. Knowledge-Based Systems, 240:108101.

Thornton, C., Hutter, F., Hoos, H. H., and Leyton-Brown, K. (2013). Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In The 19th ACM SIGKDD, 2013, pages 847–855.

Xue, C., Hu, M., Huang, X., and Li, C.-G. (2022). Automated search space and search strategy selection for AutoML. Pattern Recognition, 124:108474.