Enhancing Auto-ML with Missing Value Imputation: A Case Study with TPOT2 Library and Industry 4.0

  • Joel Frank Huarayo Quispe UNIFESP
  • Didier A. Vega-Oliveros UNIFESP
  • Lilian Berton UNIFESP


Automated Machine Learning (AutoML) is increasingly important in industrial applications for democratizing the use of machine learning techniques, particularly in Industry 4.0, where robust model development is crucial. Addressing the challenge of missing data, we introduce a missing data imputation module integrated into the TPOT2 AutoML library—a rewrite of TPOT with additional features. This module incorporates SimpleImputer, IterativeImputer, and KNNImputer, enhancing TPOT2’s ability to handle datasets with missing values. We evaluate the module on three industrial datasets (Mercedes-Benz Greener Manufacturing, NASA Turbofan Jet Engine, Gearbox fault diagnosis) with classification and regression tasks, testing it with varying levels of missing data (5%, 10%, 15%). Our results demonstrate that the TPOT2 library, equipped with this imputation module, significantly improves predictive modeling accuracy in the presence of missing data, proving its practical utility and robustness in industrial contexts.
Keywords: Missing Value Imputation, Auto-ML, Industry 4.0
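The abstract names the three imputers the module incorporates but does not show the module's interface. The following is a minimal sketch that uses plain scikit-learn rather than TPOT2's actual API. It injects missing values at the three rates the study uses (5%, 10%, 15%) and compares SimpleImputer, IterativeImputer and KNNImputer inside a simple pipeline.

Several choices here are hypothetical and for illustration only, not the authors' experimental setup:
- the helper names `inject_mcar` and `compare_imputers`;
- the missing-completely-at-random (MCAR) mechanism;
- the synthetic dataset;
- the random-forest model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (exposes IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def inject_mcar(X, rate, seed=0):
    """Hypothetical helper: copy X and set a fraction `rate` of its entries
    to NaN uniformly at random (assumes an MCAR mechanism)."""
    rng = np.random.default_rng(seed)
    X_missing = X.astype(float).copy()
    mask = rng.random(X_missing.shape) < rate
    X_missing[mask] = np.nan
    return X_missing


def compare_imputers(X, y, rates=(0.05, 0.10, 0.15)):
    """Hypothetical helper: mean cross-validated R^2 for each (rate, imputer)."""
    imputers = {
        "simple_mean": SimpleImputer(strategy="mean"),
        "iterative": IterativeImputer(max_iter=10, random_state=0),
        "knn": KNNImputer(n_neighbors=5),
    }
    results = {}
    for rate in rates:
        X_missing = inject_mcar(X, rate)
        for name, imputer in imputers.items():
            # The imputer is fitted inside each CV fold (cross_val_score clones
            # the pipeline), so no statistics leak from the held-out fold.
            pipe = make_pipeline(
                imputer, RandomForestRegressor(n_estimators=50, random_state=0)
            )
            scores = cross_val_score(pipe, X_missing, y, cv=3, scoring="r2")
            results[(rate, name)] = scores.mean()
    return results


if __name__ == "__main__":
    X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
    for (rate, name), r2 in sorted(compare_imputers(X, y).items()):
        print(f"{rate:.0%} missing, {name}: mean R^2 = {r2:.3f}")
```

Wrapping each imputer in a pipeline with the model mirrors how an AutoML search can treat imputation as one more pipeline step to select. Whether TPOT2's module exposes it this way is not stated in the abstract.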


QUISPE, Joel Frank Huarayo; VEGA-OLIVEROS, Didier A.; BERTON, Lilian. Enhancing Auto-ML with Missing Value Imputation: A Case Study with TPOT2 Library and Industry 4.0. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 97-108. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245232.

