An experimental analysis of similarity metrics in time series classification
Abstract
In this work, we evaluate alternatives for the time series classification problem. To increase the accuracy of the nearest-neighbor classifier, several metrics have been proposed as alternatives to the Euclidean distance. We evaluate some of these options and, based on the Wilcoxon test for paired data, produce a list of those for which there is evidence of improved classifier accuracy.
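The evaluation outlined in the abstract can be sketched roughly as follows: a nearest-neighbor classifier that accepts any distance function, plus the Wilcoxon signed-rank statistic over paired error rates. This is only an illustrative sketch, not the authors' implementation. All function names are hypothetical, the toy series are invented for the example, and dynamic time warping stands in for "an alternative metric" only because the abstract does not say which metrics were evaluated. The sketch computes the rank sums only; a p-value would still have to be obtained from a table or a statistics library.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Dynamic time warping distance with absolute-difference local cost.

    Used here only as an example of an alternative to the Euclidean distance.
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the cumulative cost matrix
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            curr.append(cost + min(prev[j], prev[j - 1], curr[j - 1]))
        prev = curr
    return prev[-1]


def predict_1nn(train, query, distance):
    """Return the label of the training series closest to `query`.

    `train` is a list of (label, series) pairs.
    """
    label, _ = min(train, key=lambda item: distance(item[1], query))
    return label


def error_rate(train, test, distance):
    """Fraction of test series misclassified by 1-NN under `distance`."""
    wrong = sum(predict_1nn(train, s, distance) != lab for lab, s in test)
    return wrong / len(test)


def wilcoxon_signed_rank(xs, ys):
    """Return (W_plus, W_minus) for paired samples.

    Zero differences are dropped and tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Toy data (invented): the query has the same peak as the "peak" series, shifted in time.
TRAIN = [("peak", [0, 0, 1, 0, 0, 0]), ("flat", [0, 0, 0, 0, 0, 0])]
QUERY = [0, 0, 0, 0, 1, 0]
```

In a comparison of this kind, `error_rate` would be computed once per data set for each metric, and the two resulting lists of error rates (one entry per data set) would be passed to `wilcoxon_signed_rank`. On the toy data, `predict_1nn(TRAIN, QUERY, euclidean)` picks the flat series while `predict_1nn(TRAIN, QUERY, dtw)` picks the shifted peak, which is the kind of difference such a test is meant to assess across many data sets.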
Published
30/06/2007
How to Cite
DRAGO, Idilio; VAREJÃO, Flávio Miguel. Uma análise experimental de métricas de similaridade na classificação de séries temporais. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 6., 2007, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2007. p. 1092-1101. ISSN 2763-9061.
