An Auto-ML Approach Applied to Text Classification

  • Douglas Nunes de Oliveira UFLA
  • Luiz Henrique de Campos Merschmann UFLA


Automated Machine Learning (AutoML) is a research area that aims to help humans solve Machine Learning (ML) problems by automatically discovering good model pipelines (algorithms and their hyperparameters for every stage of a machine learning process) for a given dataset. Because this is a combinatorial optimization problem in which evaluating every possible pipeline is infeasible, most AutoML systems use Evolutionary Algorithms (EA) or Bayesian Optimization (BO) to find a good solution. As these systems usually evaluate pipeline performance using k-fold cross-validation, the chance of selecting an overfitted solution increases with the number of pipelines evaluated. To mitigate this issue, we propose an Auto-ML system, named Auto-ML System for Text Classification (ASTeC), that uses Bootstrap Bias Corrected CV (BBC-CV) to evaluate pipeline performance. More specifically, the proposed system combines EA, BO, and BBC-CV to find a good model pipeline for the text classification task. We evaluate our proposal by comparing it against two state-of-the-art systems, the Tree-based Pipeline Optimization Tool (TPOT) and the Google Cloud AutoML service. To do so, we use seven public sentiment analysis datasets composed of texts written in Brazilian Portuguese. Statistical tests show that our system is equivalent to or better than both of them on all evaluated datasets.
Keywords: AutoML, bias correction cross-validation, genetic algorithm, Bayesian optimization
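
The selection bias described in the abstract, and the general idea behind bootstrap bias correction, can be illustrated with a minimal sketch. This is not the ASTeC implementation: the function name `bbc_cv_accuracy`, its parameters, and the use of accuracy as the metric are assumptions made for illustration. It assumes the out-of-sample predictions of every candidate pipeline have already been pooled across the k folds into one matrix. Each bootstrap round selects the best pipeline on the in-bag rows and scores that pipeline on the out-of-bag rows, so the selection step is never scored on the data used to make it.

```python
import numpy as np


def bbc_cv_accuracy(oos_predictions, y_true, n_bootstraps=200, seed=0):
    """Bootstrap bias-corrected accuracy estimate (illustrative sketch).

    oos_predictions: array of shape (n_samples, n_configs) holding the pooled
        out-of-sample predicted labels of each candidate configuration.
    y_true: array of shape (n_samples,) with the true labels.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    # correct[i, c] is True when configuration c classified sample i correctly.
    correct = np.asarray(oos_predictions) == y_true[:, None]
    n_samples = correct.shape[0]

    scores = []
    for _ in range(n_bootstraps):
        # Sample rows with replacement (in-bag); unsampled rows are out-of-bag.
        in_bag = rng.integers(0, n_samples, size=n_samples)
        out_of_bag = np.ones(n_samples, dtype=bool)
        out_of_bag[in_bag] = False
        if not out_of_bag.any():
            continue  # extremely unlikely, but keeps the mean well defined
        # Select the winning configuration using in-bag rows only.
        best = int(np.argmax(correct[in_bag].mean(axis=0)))
        # Score the winner on rows it was not selected on.
        scores.append(correct[out_of_bag, best].mean())
    return float(np.mean(scores))


if __name__ == "__main__":
    # Hypothetical data: 100 configurations that all guess at random.
    rng = np.random.default_rng(42)
    y = rng.integers(0, 2, size=300)
    preds = rng.integers(0, 2, size=(300, 100))
    naive = (preds == y[:, None]).mean(axis=0).max()
    print("naive best CV accuracy:", naive)
    print("bias-corrected estimate:", bbc_cv_accuracy(preds, y))
```

In the demo, every configuration is a coin flip, so the naive "best CV score" is optimistic purely because many configurations were compared, while the corrected estimate is expected to fall closer to chance.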


How to Cite

OLIVEIRA, Douglas Nunes de; MERSCHMANN, Luiz Henrique de Campos. An Auto-ML Approach Applied to Text Classification. In: SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 28., 2022, Curitiba. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 115-123.