Availability Evaluation and Maintenance Policy of Data Center Infrastructure

Kádna Camboim; Carlos Melo; Jean Araujo; Fernanda Alencar

doi:10.5753/sbesc_estendido.2020.13113

Kádna Camboim UFPE/UFAPE
Carlos Melo UFPE
Jean Araujo UFAPE
Fernanda Alencar UFPE

Abstract

The convergence of communication networks with the demand for capacity to store and process large amounts of information, especially in recent years, has driven the adoption of everything-as-a-service and is creating a growing demand for new data center construction. However, to meet dependability attributes, the design of these infrastructures must account, at a minimum, for the availability the system is expected to achieve. In this paper, we evaluate the availability of a Tier 1 data center infrastructure that uses blade systems. We use modeling techniques based on reliability block diagrams (RBDs) and stochastic Petri nets (SPNs) to simulate a maintenance policy under different service-level agreements (SLAs). The results report dependability metrics, focusing on the availability and maintenance of these networks. We highlight how difficult it is to achieve high availability when components have no redundancy and the intervals between maintenance actions are long.
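
The RBD part of such an evaluation rests on standard composition rules: a repairable block has steady-state availability MTTF / (MTTF + MTTR), blocks in series multiply their availabilities, and redundant blocks in parallel are down only when all of them are down. The Python sketch below illustrates why a Tier 1 layout, which has no redundant components and therefore forms a pure series chain, is hard to push toward high availability. It is a minimal sketch under stated assumptions: independent failures and repairs, steady-state behaviour, and hypothetical component names (edge_router, aggregation_switch, blade_chassis) and MTTF/MTTR values that are not taken from the paper. It does not reproduce the authors' SPN model of the maintenance policy.

```python
from math import prod

HOURS_PER_YEAR = 8760.0


def steady_state_availability(mttf: float, mttr: float) -> float:
    """Steady-state availability of one repairable block: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def series(*availabilities: float) -> float:
    """Series RBD: the system is up only if every block is up."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Parallel RBD: the system is up if at least one block is up.

    Assumes each block fails and is repaired independently.
    """
    return 1.0 - prod(1.0 - a for a in availabilities)


def annual_downtime_hours(availability: float) -> float:
    """Expected downtime per year implied by a steady-state availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical (MTTF, MTTR) values in hours, for illustration only;
# they are NOT the parameters used in the paper.
COMPONENTS = {
    "edge_router": (100_000.0, 8.0),
    "aggregation_switch": (200_000.0, 8.0),
    "blade_chassis": (60_000.0, 4.0),
}

blocks = {name: steady_state_availability(*p) for name, p in COMPONENTS.items()}

# Tier 1-style chain: no redundancy, so every block sits in series.
tier1 = series(*blocks.values())

# Same chain with a hypothetical duplicated aggregation switch.
with_redundant_switch = series(
    blocks["edge_router"],
    parallel(blocks["aggregation_switch"], blocks["aggregation_switch"]),
    blocks["blade_chassis"],
)

for label, a in [("series only", tier1), ("redundant switch", with_redundant_switch)]:
    print(f"{label}: A = {a:.6f}, downtime = {annual_downtime_hours(a):.2f} h/year")
```

The series chain is always less available than its weakest block, while duplicating even one block raises the overall figure. Effects such as long intervals between maintenance actions, which the abstract identifies as a key difficulty, fall outside this sketch and call for a state-based model such as the SPNs the authors describe.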

Keywords: Data center, availability evaluation, maintenance policy, blade server, service-level agreements (SLAs)
