Drift Detection Methods on Machine Learning Systems: a Discussion over Discrete Live Data
Abstract
Context: Machine learning has become an essential tool for addressing complex problems in information systems, encompassing industrial, commercial, and residential applications.
Problem: Machine learning systems that are not retrained frequently are prone to data and concept drift, which compromises predictive accuracy. This issue is particularly critical in scenarios where retraining is infeasible due to high computational costs or data unavailability.
Solution: This study evaluates the performance of drift detection methods on discrete time series with controlled changes in mean and standard deviation, using synthetic Gaussian signals.
IS Theory: General Systems Theory underpins the study by emphasizing how the interplay between drift detection and adaptive systems contributes to maintaining stability and efficiency in dynamic environments.
Method: Experiments were conducted with variations in the mean, in the standard deviation, and in both parameters simultaneously, to identify qualitative patterns in the detectors' behavior. The detectors ADWIN, KSWIN, and Page-Hinkley were tested under these scenarios.
Summary of Results: The findings reveal that ADWIN and Page-Hinkley exhibited greater precision and robustness, while KSWIN showed excessive sensitivity, leading to a high number of false positives.
Contributions to the IS Field: This research offers a comprehensive analysis of drift detectors’ performance, specifically in scenarios involving changes in mean and standard deviation, providing a useful reference for designing resilient machine learning-based forecasting systems.
Impacts on the IS Field: The study advances the development of information systems that can adapt to dynamic data environments characterized by shifts in mean and standard deviation, with direct applications in industrial contexts and energy management.
Keywords:
ADWIN, Concept Drift, Data Drift, KSWIN, Machine Learning, Page-Hinkley
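As an illustration of the kind of experiment the abstract describes, the following is a minimal sketch of a one-sided Page-Hinkley test run on a synthetic Gaussian signal whose mean changes at a known point. It is not the authors' code: the class, the helper functions, the parameter values (delta, threshold, min_samples), and the segment lengths are all assumptions chosen for illustration. It covers only an upward mean shift; changes in standard deviation and the ADWIN and KSWIN detectors are not reproduced here.

```python
import random


class PageHinkley:
    """Minimal one-sided Page-Hinkley test for an upward shift in the mean.

    Illustrative sketch only; parameter defaults are assumptions.
    """

    def __init__(self, delta=0.5, threshold=50.0, min_samples=30):
        self.delta = delta              # tolerated deviation from the running mean
        self.threshold = threshold      # alarm level for the test statistic
        self.min_samples = min_samples  # warm-up length before alarms are allowed
        self.reset()

    def reset(self):
        self.n = 0          # samples seen since the last reset
        self.mean = 0.0     # running mean of the samples
        self.cum = 0.0      # cumulative sum of (x - mean - delta)
        self.cum_min = 0.0  # smallest cumulative sum seen so far

    def update(self, x):
        """Consume one sample; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.n >= self.min_samples and self.cum - self.cum_min > self.threshold:
            self.reset()  # start over on the post-change regime
            return True
        return False


def gaussian_stream(segments, seed=0):
    """Yield samples from consecutive (length, mean, std) Gaussian segments."""
    rng = random.Random(seed)
    for length, mu, sigma in segments:
        for _ in range(length):
            yield rng.gauss(mu, sigma)


def detect(stream, detector):
    """Return the 0-based indices at which the detector raises an alarm."""
    return [i for i, x in enumerate(stream) if detector.update(x)]


# Hypothetical scenario: the mean jumps from 0 to 2 at index 1000, std stays at 1.
segments = [(1000, 0.0, 1.0), (1000, 2.0, 1.0)]
alarms = detect(gaussian_stream(segments), PageHinkley())
print(alarms)
```

With these settings the detector should raise an alarm shortly after index 1000 and stay silent elsewhere. A smaller threshold or delta shortens the detection delay at the cost of more false alarms, which is the precision-versus-sensitivity trade-off the abstract refers to.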
Published
19/05/2025
How to Cite
ROCHA, A. C.; OLIVEIRA, M. B. P.; SAITO, L. A. M.; LIMA, B. L. S.; SILVA, L. A. Drift Detection Methods on Machine Learning Systems: a Discussion over Discrete Live Data. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 21., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 808-817. DOI: https://doi.org/10.5753/sbsi.2025.246657.