An Empirical Exploratory Analysis of Failure Sequences in a Commodity Operating System

  • Caio Augusto Rodrigues dos Santos UFU
  • Rivalino Matias Jr. UFU
  • Kishor Trivedi Duke University

Abstract


A fundamental aspect of software reliability engineering is comprehending how software systems fail, which means understanding the dynamics that govern the different types of failure manifestations. In this paper, we present an exploratory study of multiple-event failures, looking for systematic patterns of failure sequences in the logs of a commodity operating system. The study is based on real failure data collected from hundreds of computers. The major contribution of this work is the proposed method for discovering patterns of failure sequences and their attributes, which is generic enough to be applied to other software systems with minor changes. The empirical findings include 153 distinct patterns of OS failure sequences, along with statistical analyses of their properties.

Keywords: Fault Tolerance and Dependability

Published
19/11/2019
How to Cite
DOS SANTOS, Caio Augusto Rodrigues; MATIAS JR., Rivalino; TRIVEDI, Kishor. An Empirical Exploratory Analysis of Failure Sequences in a Commodity Operating System. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SISTEMAS COMPUTACIONAIS (SBESC), 9., 2019, Natal. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 185-192. ISSN 2237-5430.