Reliability Assessment of Commercial Off-the-shelf Operating System Software: An Empirical Study

Caio Augusto Rodrigues dos Santos (UFU)
Marcela Antunes (UFU)
Rivalino Matias Jr (UFU)
Lucas Miranda Assunção (UFU)
Vinicius Maciel (UFU)

Abstract

According to the literature, software defects are the main cause of failures in computer systems. Surveying the research on software reliability, we observe that studies on the reliability of operating system (OS) software are scarce. Yet a computer system with highly reliable hardware and applications may not be dependable enough if its OS does not show an equivalent level of reliability. This paper presents an empirical study on the reliability of a commercial off-the-shelf OS. We analyzed 5,351 records of real OS failures, collected from different computers and workplace environments. Based on the most frequent OS failures observed, we created stochastic reliability models to assess the sensitivity of the OS reliability with respect to these failures. The empirical evidence and analytical results show that the OS software update service had the highest reliability importance, i.e., among the OS components investigated, it is the one whose improvement should be prioritized to obtain the greatest gain in the reliability of the OS as a whole.

Keywords: Operating systems, failures, reliability modeling, empirical study
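
The notion of reliability importance used in the abstract can be illustrated with a small sketch. The abstract does not state which importance measure or model structure the study adopts, so the code below assumes Birnbaum's measure and a simple series structure, in which the importance of a component is the partial derivative of the system reliability with respect to that component's reliability. The component names and reliability values are hypothetical and are not taken from the study.

```python
from math import prod

# Hypothetical component reliabilities (illustrative values only,
# not results from the study).
components = {
    "software_update": 0.90,
    "service_a": 0.97,
    "service_b": 0.99,
}


def series_reliability(rel):
    """Reliability of a series system: every component must work."""
    return prod(rel.values())


def birnbaum_importance(rel, name):
    """Birnbaum importance of one component in a series system.

    For a series system R = r_1 * r_2 * ... * r_n, the partial derivative
    dR/dr_i equals the product of all the other component reliabilities.
    """
    return prod(r for other, r in rel.items() if other != name)


# Rank components from most to least important.
ranking = sorted(
    components,
    key=lambda name: birnbaum_importance(components, name),
    reverse=True,
)

if __name__ == "__main__":
    print(f"system reliability: {series_reliability(components):.4f}")
    for name in ranking:
        print(f"{name}: {birnbaum_importance(components, name):.4f}")
```

Under these assumptions the least reliable component ranks first, since in a series structure improving the weakest component raises the system reliability the most. A different structure or importance measure could produce a different ranking.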
