Challenges on Classifying Data Streams with Concept Drift

  • Eduardo Victor Lima Barboza (Universidade Federal do Paraná)
  • Paulo Ricardo Lisboa de Almeida (Universidade Federal do Paraná)

Abstract

Concept drift is a common problem in machine learning. It refers to a change in the underlying concept over time, which can degrade a model's accuracy. A recurring difficulty in concept drift research is finding datasets that reflect real-world scenarios. In this work, we present some datasets in which concept drift is known to occur, and we propose changes to an existing method (Dynse): making it able to handle data streams rather than batches, and adding a trigger so that its window becomes adaptive, driven by concept drift detection.
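
To make the idea of a drift-triggered adaptive window concrete, the following is a minimal sketch and not the Dynse implementation. It assumes a simplified trigger in the style of the Drift Detection Method (DDM) of Gama et al., fed with a synthetic stream of 0/1 prediction errors. The names `SimpleDriftTrigger`, `run_stream`, `max_window`, and `keep_after_drift` are hypothetical and chosen only for illustration.

```python
import math
from collections import deque


class SimpleDriftTrigger:
    """Simplified DDM-style trigger (illustrative assumption, not Dynse)."""

    def __init__(self, min_samples=30, drift_level=3.0):
        self.min_samples = min_samples
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Feed 1 for a misclassification, 0 otherwise; return True on drift."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n                # running error rate
        s = math.sqrt(p * (1 - p) / self.n)     # its standard deviation
        if self.n < self.min_samples:
            return False
        if p + s < self.p_min + self.s_min:     # remember the best point so far
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return True
        return False


def run_stream(error_stream, max_window=200, keep_after_drift=20):
    """Grow a window instance by instance; shrink it when the trigger fires."""
    trigger = SimpleDriftTrigger()
    window = deque(maxlen=max_window)
    drift_points = []
    for i, err in enumerate(error_stream):
        window.append(i)                        # stand-in for a labeled instance
        if trigger.update(err):
            drift_points.append(i)
            recent = list(window)[-keep_after_drift:]
            window.clear()
            window.extend(recent)               # keep only the newest instances
    return drift_points, len(window)


# Synthetic error stream: about 10% errors, then about 50% errors after index 100.
stream = [1 if i % 10 == 9 else 0 for i in range(100)] + [i % 2 for i in range(100)]
drift_points, window_len = run_stream(stream)
print(drift_points, window_len)
```

In this sketch the window grows while the error rate is stable and is cut back to the most recent instances once the trigger fires, so that outdated instances stop influencing the model. How the trigger would actually be integrated with Dynse's window is not specified in the abstract.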

Keywords: Concept Drift, Data Streams, Datasets, Machine Learning

Published
19 September 2022
BARBOZA, Eduardo Victor Lima; DE ALMEIDA, Paulo Ricardo Lisboa. Challenges on Classifying Data Streams with Concept Drift. In: WORKSHOP DE TESES E DISSERTAÇÕES (WTDBD) - SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 37., 2022, Búzios. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 126-132. DOI: https://doi.org/10.5753/sbbd_estendido.2022.21854.