research-article

Open Access

Leveraging Time Series Autocorrelation Through Numerical Differentiation for Improving Failure Prediction

Authors:

João R. Campos (ORCID: 0000-0002-4623-764X)
University of Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Portugal

Rodrigo Machado (ORCID: 0000-0002-5409-6242)
University of Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Portugal

Marco Vieira (ORCID: 0000-0001-5103-8541)
University of Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Portugal

LADC '23: Proceedings of the 12th Latin-American Symposium on Dependable and Secure Computing, October 2023, Pages 70–79. https://doi.org/10.1145/3615366.3615423

Published: 17 October 2023

ABSTRACT

Given the complexity of modern software systems, it is no longer possible to detect every fault before deployment. Such faults can eventually lead to failures at runtime, compromising the business process and causing significant risk or losses. Online Failure Prediction (OFP) is a complementary fault-tolerance technique that tries to predict failures in the near future by using past data and the current state of the system. However, modern systems comprise many components, and a proper characterization of their state therefore requires hundreds of system metrics. As the system evolves through time, these data can be seen as multivariate time series, where the value of a system metric at a given time is related to its previous values. Although various techniques exist for leveraging this autocorrelation, they are often either too simplistic (e.g., sliding window) or too complex (e.g., Long Short-Term Memory (LSTM)). In this paper we propose the use of numerical differentiation, computing the first and second derivatives, as a means to extract information concerning the underlying function of each system metric to support the development of predictive models for OFP. We conduct a comprehensive case study using a Linux failure dataset that was generated through fault injection. Results suggest that numerical differentiation can be a promising approach to improve the performance of Machine Learning (ML) models for dependability-related problems with similar sequential characteristics (e.g., intrusion detection).
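As a rough illustration of the idea, the sketch below augments a single system metric with approximations of its first and second derivatives. It is not the authors' implementation: the abstract does not say which finite-difference scheme is used, so the backward differences here (which rely only on past samples, as an online predictor would), the zero padding of the first samples, the function name `backward_derivatives`, and the sampling interval `dt` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def backward_derivatives(
    values: Sequence[float], dt: float = 1.0
) -> List[Tuple[float, float, float]]:
    """Return (value, first derivative, second derivative) per sample.

    Assumptions (not taken from the paper): samples are evenly spaced
    by dt, backward finite differences are used so that only past
    samples are needed, and derivatives that cannot be computed yet
    (the first one or two samples) are padded with 0.0.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")

    rows: List[Tuple[float, float, float]] = []
    for t, x in enumerate(values):
        # First derivative: (x[t] - x[t-1]) / dt
        d1 = (x - values[t - 1]) / dt if t >= 1 else 0.0
        # Second derivative: (x[t] - 2*x[t-1] + x[t-2]) / dt^2
        d2 = (
            (x - 2.0 * values[t - 1] + values[t - 2]) / (dt * dt)
            if t >= 2
            else 0.0
        )
        rows.append((float(x), d1, d2))
    return rows


# Hypothetical metric samples (a quadratic trend), one per time step.
metric = [1, 4, 9, 16, 25]
for row in backward_derivatives(metric):
    print(row)
```

Each row could then be fed to a classifier in place of the raw value alone, giving the model a view of how fast the metric is changing and whether that change is accelerating, without the model having to process a window of past samples itself.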


Published in

LADC '23: Proceedings of the 12th Latin-American Symposium on Dependable and Secure Computing
October 2023
242 pages
ISBN: 9798400708442
DOI: 10.1145/3615366

Copyright © 2023 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Dependability
Machine Learning
Numerical Differentiation
Online Failure Prediction
Qualifiers
- research-article
- Research
- Refereed limited