TY - JOUR
AU - Oliveira, Douglas Nunes de
AU - Merschmann, Luiz Henrique de Campos
PY - 2022
TI - An Auto-ML Approach Applied to Text Classification
JF - Proceedings of the Brazilian Symposium on Multimedia and the Web (WebMedia); 2022: Proceedings of the 28th Brazilian Symposium on Multimedia and the Web
KW -
N2 - Automated Machine Learning (AutoML) is a research area that aims to help humans solve Machine Learning (ML) problems by automatically discovering good model pipelines (algorithms and their hyperparameters for every stage of a machine learning process) for a given dataset. Since we have a combinatorial optimization problem for which it is impossible to evaluate all possible pipelines, most AutoML systems use Evolutionary Algorithm (EA) or Bayesian Optimization (BO) to find a good solution. As these systems usually evaluate the pipelines' performance using the k-fold cross-validation method, the chance of finding an overfitted solution increases with the number of pipelines evaluated. Therefore, to avoid the aforementioned issue, we propose an Auto-ML system, named Auto-ML System for Text Classification (ASTeC), that uses the Bootstrap Bias Corrected CV (BBC-CV) to evaluate the pipelines' performance. More specifically, the proposed system combines EA, BO, and BBC-CV to find a good model pipeline for the text classification task. We evaluate our proposal by comparing it against two state-of-the-art systems, the Tree-based Pipeline Optimization Tool (TPOT) and Google Cloud AutoML service. To do so, we use seven public datasets composed of written Brazilian Portuguese texts from the sentiment analysis domain. Statistical tests show that our system is equivalent to or better than both of them in all evaluated datasets.
UR - https://sol.sbc.org.br/index.php/webmedia/article/view/22085