DOI: 10.1145/3615366.3615423
research-article
Open Access

Leveraging Time Series Autocorrelation Through Numerical Differentiation for Improving Failure Prediction

Published: 17 October 2023

ABSTRACT

Given the complexity of modern software systems, it is no longer possible to detect every fault before deployment. Such faults can eventually lead to failures at runtime, compromising the business process and causing significant risk or losses. Online Failure Prediction (OFP) is a complementary fault-tolerance technique that tries to predict failures in the near future using past data and the current state of the system. However, modern systems comprise many components, so a proper characterization of their state requires hundreds of system metrics. As the system evolves through time, these data can be seen as multivariate time series, where the value of a system metric at a given time is related to its previous value. Although various techniques exist for leveraging this autocorrelation, they are often either simplistic (e.g., sliding windows) or too complex (e.g., Long Short-Term Memory (LSTM) networks). In this paper, we propose the use of numerical differentiation, computing the first and second derivatives, as a means to extract information about the underlying function of each system metric and so support the development of predictive models for OFP. We conduct a comprehensive case study using a Linux failure dataset that was generated through fault injection. The results suggest that numerical differentiation can be a promising approach to improve the performance of Machine Learning (ML) models for dependability-related problems with similar sequential characteristics (e.g., intrusion detection).
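To make the idea of derivative features concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which finite-difference scheme the paper uses. This sketch assumes backward differences, which only look at past samples and so could be computed at runtime, uniformly spaced samples with step `dt`, and zero padding for the first samples. The function names `backward_derivatives` and `augment` are hypothetical.

```python
def backward_derivatives(values, dt=1.0):
    """Approximate the first and second derivatives of one metric.

    Assumption: backward finite differences on uniformly spaced samples.
    Samples without enough history are padded with 0.0.
    """
    first, second = [], []
    for t, x in enumerate(values):
        # f'(t) ~ (x_t - x_{t-1}) / dt
        d1 = (x - values[t - 1]) / dt if t >= 1 else 0.0
        # f''(t) ~ (x_t - 2*x_{t-1} + x_{t-2}) / dt^2
        d2 = (x - 2 * values[t - 1] + values[t - 2]) / dt ** 2 if t >= 2 else 0.0
        first.append(d1)
        second.append(d2)
    return first, second


def augment(rows, dt=1.0):
    """Append derivative features to each sample of a multivariate series.

    rows: list of samples, each a list with one value per system metric.
    Returns rows of the form [metrics..., first derivatives..., second derivatives...].
    """
    columns = [list(col) for col in zip(*rows)]  # one series per metric
    derived = [backward_derivatives(col, dt) for col in columns]
    out = []
    for t, row in enumerate(rows):
        d1 = [first[t] for first, _ in derived]
        d2 = [second[t] for _, second in derived]
        out.append(list(row) + d1 + d2)
    return out


if __name__ == "__main__":
    # Two hypothetical metrics sampled at three time steps.
    samples = [[1.0, 10.0], [4.0, 10.0], [9.0, 10.0]]
    for row in augment(samples):
        print(row)
```

Each augmented row can then be passed to a conventional classifier in place of the raw sample, so the model sees the rate and curvature of each metric without needing a sequence model.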

Published in

LADC '23: Proceedings of the 12th Latin-American Symposium on Dependable and Secure Computing
October 2023
242 pages
ISBN: 9798400708442
DOI: 10.1145/3615366

Copyright © 2023 Owner/Author

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited