An Auto-ML Approach Applied to Text Classification
Abstract
Automated Machine Learning (AutoML) is a research area that aims to help humans solve Machine Learning (ML) problems by automatically discovering good model pipelines (the algorithms and hyperparameters for every stage of a machine learning process) for a given dataset. Because this is a combinatorial optimization problem in which evaluating every possible pipeline is infeasible, most AutoML systems use an Evolutionary Algorithm (EA) or Bayesian Optimization (BO) to find a good solution. These systems usually estimate pipeline performance with k-fold cross-validation, so the chance of selecting an overfitted solution increases with the number of pipelines evaluated. To mitigate this issue, we propose an Auto-ML system, named Auto-ML System for Text Classification (ASTeC), that uses Bootstrap Bias Corrected CV (BBC-CV) to evaluate pipeline performance. More specifically, the proposed system combines EA, BO, and BBC-CV to find a good model pipeline for the text classification task. We evaluate our proposal by comparing it against two state-of-the-art systems, the Tree-based Pipeline Optimization Tool (TPOT) and the Google Cloud AutoML service. To do so, we use seven public datasets of written Brazilian Portuguese texts from the sentiment analysis domain. Statistical tests show that our system is equivalent to or better than both of them on all evaluated datasets.
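To make the bias-correction idea concrete, the following is a minimal sketch of a BBC-CV style estimate, written from the general description of the method and not taken from ASTeC. It assumes the out-of-sample cross-validation predictions of every candidate pipeline have already been collected into a matrix of per-sample correctness. The function name `bbc_cv_estimate`, its parameters, and the synthetic data are hypothetical and used only for illustration.

```python
import numpy as np


def bbc_cv_estimate(oos_correct, n_bootstraps=200, seed=0):
    """Sketch of a bootstrap bias corrected accuracy estimate.

    oos_correct: array of shape (n_samples, n_configs); entry [i, j] is 1
    when configuration j classified sample i correctly in its out-of-sample
    cross-validation prediction, and 0 otherwise (assumed input format).
    """
    rng = np.random.default_rng(seed)
    oos_correct = np.asarray(oos_correct, dtype=float)
    n_samples = oos_correct.shape[0]
    scores = []
    for _ in range(n_bootstraps):
        # Draw a bootstrap sample of row indices (with replacement).
        in_bag = rng.integers(0, n_samples, size=n_samples)
        out_mask = np.ones(n_samples, dtype=bool)
        out_mask[in_bag] = False
        if not out_mask.any():
            continue  # no out-of-bag rows in this draw; skip it
        # Select the configuration that looks best on the in-bag rows...
        best = np.argmax(oos_correct[in_bag].mean(axis=0))
        # ...and score that configuration only on the out-of-bag rows.
        scores.append(oos_correct[out_mask, best].mean())
    return float(np.mean(scores))


# Synthetic illustration: 50 configurations that all guess at random.
demo_rng = np.random.default_rng(42)
noise = demo_rng.integers(0, 2, size=(200, 50))
naive_best = noise.mean(axis=0).max()   # optimistic "winner" accuracy
corrected = bbc_cv_estimate(noise)      # bias-corrected estimate
print(f"naive best: {naive_best:.3f}, corrected: {corrected:.3f}")
```

In this sketch the configuration is selected on the in-bag rows and scored only on the out-of-bag rows, so the selection step cannot inflate the reported score. No pipeline is retrained, which is why the correction is cheap compared with nested cross-validation.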
Keywords:
AutoML, bias-corrected cross-validation, genetic algorithm, Bayesian optimization
Published
November 7, 2022
How to Cite
OLIVEIRA, Douglas Nunes de; MERSCHMANN, Luiz Henrique de Campos. An Auto-ML Approach Applied to Text Classification. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 28., 2022, Curitiba. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 115-123.