Large-Scale Similarity-Based Time Series Mining

  • Diego F. Silva USP
  • Gustavo E. A. P. A. Batista USP
  • Eamonn Keogh University of California

Abstract


Measuring the (dis)similarity between time series is the core procedure of many algorithms for mining this kind of data, which is ubiquitous in everyday life. Although similarity-based methods provide satisfactory results, they usually suffer from high time complexity. This work summarizes a thesis on developing algorithms that enable similarity-based mining of temporal data at large scale. The contributions of the thesis have implications for several data mining tasks, such as classification, clustering, and motif discovery, as well as for applications in music data science.
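
To make the time-complexity concern concrete, the sketch below shows a plain nearest-neighbor search under dynamic time warping (DTW), a widely used similarity measure for time series. It is only an illustration under stated assumptions, not the thesis's algorithms: the function names `dtw_distance` and `nearest_neighbor` and the `window` parameter are hypothetical choices made for this example. Each comparison fills a table with up to len(a) × len(b) cells, and the search repeats that for every series in the collection, which is the cost that motivates large-scale methods.

```python
import math

def dtw_distance(a, b, window=None):
    """DTW distance between two numeric sequences (squared-difference cost).

    `window` is an assumed warping constraint: the maximum allowed |i - j|.
    None means unconstrained warping.
    """
    n, m = len(a), len(b)
    # The window must be at least |n - m| so the last cell stays reachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return math.sqrt(prev[m])

def nearest_neighbor(query, dataset, window=None):
    """Linear scan: one full DTW computation per series in the dataset."""
    best_index, best_dist = -1, math.inf
    for idx, series in enumerate(dataset):
        d = dtw_distance(query, series, window)
        if d < best_dist:
            best_index, best_dist = idx, d
    return best_index, best_dist
```

With `window=0` and equal-length inputs the measure reduces to the Euclidean distance. Larger windows allow more warping at a higher cost per comparison.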

Published
26/07/2018
SILVA, Diego F.; BATISTA, Gustavo E. A. P. A.; KEOGH, Eamonn. Large-Scale Similarity-Based Time Series Mining. In: CONCURSO DE TESES E DISSERTAÇÕES (CTD), 31., 2018, Natal. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 85-90. ISSN 2763-8820. DOI: https://doi.org/10.5753/ctd.2018.3656.