Genetic Programming-based AutoML for EEG Signal Classification - A Comparative Study

I. M. Miranda; C. Aranha; A. P. L. de Carvalho; L. P. F. Garcia

doi:10.5753/kdmile.2022.227815

I. M. Miranda Universidade de Brasília
C. Aranha University of Tsukuba
A. P. L. de Carvalho Universidade de São Paulo
L. P. F. Garcia Universidade de Brasília


Abstract

End-to-end Machine Learning (ML) applications on complex data often require investigating several alternative data modeling pipelines before a good solution is found. This process is time-consuming and subjective, and it can benefit from automated solution design through Automated Machine Learning (AutoML). End-to-end AutoML automates data preparation, modeling, and evaluation of ML pipelines, increasing the chances of arriving at a good solution. AutoML can implement this optimization with different strategies. Among them, Genetic Programming (GP) stands out for its ability to create pipelines of arbitrary structure, allowing high interpretability and customization with information from the data context. This paper proposes and compares two end-to-end AutoML approaches optimized with GP for a time series classification problem: the classification of Electroencephalogram (EEG) signals. We selected this problem because of the signals' high complexity, spatial and temporal covariance, and non-stationarity. For the AutoML experiments, four different domain-based data characterization measures are evaluated. The analysis of these measures shows that using only spectral or only time-domain features does not lead to pipelines with good predictive performance. Our experimental results also show that AutoML can generate more accurate and interpretable solutions than the complex, ad hoc models in the literature. The proposed approach also makes it easier to analyze dimensionality reduction through fitness convergence, tree depth, and extracted features.

Keywords: AutoML, Classification, EEG, End-to-end Machine Learning, Genetic Programming, Sleep Spindles
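
To make the idea of a GP-built pipeline more concrete, the following is a minimal sketch, not the authors' implementation. It represents a candidate solution as a small expression tree that combines time-domain and spectral features of a signal window. The primitive names (`std`, `ptp`, `sigma`, `delta`), the band limits (11-16 Hz and 0.5-4 Hz), the operator set, and the helper functions are all assumptions chosen for illustration; the paper's actual primitive set and data characterization measures are not given in this abstract.

```python
import numpy as np

def band_power(x, fs, lo, hi):
    """Mean spectral power of x in the [lo, hi] Hz band, using the real FFT."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    mask = (freqs >= lo) & (freqs <= hi)
    return float(power[mask].mean())

# Hypothetical terminals: each maps a 1-D signal window to one scalar feature.
PRIMITIVES = {
    "std": lambda x, fs: float(np.std(x)),                 # time domain
    "ptp": lambda x, fs: float(np.ptp(x)),                 # time domain
    "sigma": lambda x, fs: band_power(x, fs, 11.0, 16.0),  # spectral (assumed band)
    "delta": lambda x, fs: band_power(x, fs, 0.5, 4.0),    # spectral (assumed band)
}
# Hypothetical internal nodes: binary arithmetic operators.
OPERATORS = {"add": np.add, "sub": np.subtract, "mul": np.multiply}

def evaluate(tree, x, fs):
    """Evaluate a tree: a primitive name, or a tuple (operator, left, right)."""
    if isinstance(tree, str):
        return PRIMITIVES[tree](x, fs)
    op, left, right = tree
    return float(OPERATORS[op](evaluate(left, x, fs), evaluate(right, x, fs)))

def tree_depth(tree):
    """Depth of a tree; a single terminal has depth 0."""
    if isinstance(tree, str):
        return 0
    return 1 + max(tree_depth(tree[1]), tree_depth(tree[2]))

def random_tree(rng, max_depth):
    """Grow a random tree no deeper than max_depth."""
    if max_depth == 0 or rng.random() < 0.3:
        return str(rng.choice(list(PRIMITIVES)))
    op = str(rng.choice(list(OPERATORS)))
    return (op, random_tree(rng, max_depth - 1), random_tree(rng, max_depth - 1))

# Synthetic 2-second windows sampled at 100 Hz (illustrative data only).
fs = 100
t = np.arange(2 * fs) / fs
spindle_like = np.sin(2 * np.pi * 13 * t)  # energy inside the assumed sigma band
slow_wave = np.sin(2 * np.pi * 2 * t)      # energy inside the assumed delta band

# A hand-written individual: sigma-band power minus delta-band power.
individual = ("sub", "sigma", "delta")
score_spindle = evaluate(individual, spindle_like, fs)
score_slow = evaluate(individual, slow_wave, fs)

# One randomly grown individual, as an initial GP population might contain.
candidate = random_tree(np.random.default_rng(0), max_depth=3)
```

In a full GP loop, trees like `candidate` would be scored by a fitness function (for example, the predictive performance of a classifier trained on the tree's output), then recombined and mutated. Because each individual is an explicit tree of named features, its depth and the features it uses can be inspected directly, which is the kind of interpretability the abstract refers to.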
