Quantile Symbolic Aggregate approXimation: A guaranteed equiprobable SAX


Time series are ubiquitous in science and industry. In some scenarios, it is useful to classify series in order to gain knowledge about a specific range of values. In such cases, a symbolic representation is often used: it reduces data dimensionality by mapping values to representative symbols, which makes the data discrete and allows specialized algorithms to be applied. One of the most prominent methods of this kind is the Symbolic Aggregate approXimation (SAX), which both generates the symbolic sequence and reduces the dimension of the data. However, SAX guarantees balanced symbols only by assuming that the data are normally distributed. When the data follow some other distribution, this assumption fails and the symbols become unevenly populated, which causes the class imbalance problem. We propose a seamless approach that guarantees balance among the classes, which may lead to better performance in classification algorithms.
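The imbalance described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the balanced breakpoints are computed, so the sketch assumes they are the empirical quantiles of the aggregated values, and the function names, the synthetic exponential series, the segment count, and the alphabet size are all hypothetical choices made for illustration.

```python
import numpy as np
from statistics import NormalDist


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each of n_segments chunks."""
    chunks = np.array_split(np.asarray(series, dtype=float), n_segments)
    return np.array([chunk.mean() for chunk in chunks])


def gaussian_breakpoints(alphabet_size):
    """Classic SAX cut points: equiprobable regions under a standard normal."""
    normal = NormalDist()
    return np.array(
        [normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    )


def quantile_breakpoints(values, alphabet_size):
    """Assumed variant: cut points are empirical quantiles of the data itself."""
    probs = np.arange(1, alphabet_size) / alphabet_size
    return np.quantile(values, probs)


def symbolize(values, breakpoints):
    """Map each value to the index of the interval it falls into."""
    return np.searchsorted(breakpoints, values, side="right")


# Synthetic, clearly non-normal series (an illustrative assumption).
rng = np.random.default_rng(0)
series = rng.exponential(scale=1.0, size=4000)

# z-normalize, then reduce the dimension with PAA, as SAX does.
z = (series - series.mean()) / series.std()
segments = paa(z, 400)

alphabet_size = 4
sax_counts = np.bincount(
    symbolize(segments, gaussian_breakpoints(alphabet_size)),
    minlength=alphabet_size,
)
quantile_counts = np.bincount(
    symbolize(segments, quantile_breakpoints(segments, alphabet_size)),
    minlength=alphabet_size,
)

print("symbol counts, Gaussian breakpoints:", sax_counts)
print("symbol counts, quantile breakpoints:", quantile_counts)
```

With the Gaussian breakpoints, the symbol counts depend on how closely the aggregated values follow a standard normal distribution. With the quantile breakpoints, each symbol receives about the same number of segments by construction, regardless of the underlying distribution.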

Keywords: Time Series, Class Imbalance, Dimensionality Reduction


SILVEIRA, Eduardo; ASSUNÇÃO, Joaquim; EMMENDORFER, Leonardo. Quantile Symbolic Aggregate approXimation: A guaranteed equiprobable SAX. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 38., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 396-401. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2023.232421.