Byzantine Fault Tolerance in the Partitioned Synchronous System Model

Wellington Lacerda; Marco Antonio Ramos; Raimundo Jose de Araujo  Macedo

Wellington Lacerda União Metropolitana de Educação e Cultura
Marco Antonio Ramos Universidade Estadual do Sudoeste da Bahia
Raimundo Jose de Araujo Macedo UFBA

Resumo

Byzantine fault tolerance was introduced to implement systems capable of tolerating arbitrary component failures, where n - f replicated state machines maintain their state consistent despite the action of up to f arbitrarily or Byzantine faulty state machines, for n ≥ 3f + 1. This notion was extended later for tolerating malicious attacks or intrusions when some of the systems components get compromised by a malicious intruder or attacker. Due to its high costs in terms of minimum needed redundancy (minimum 3f + 1 replicas), and related latency, several authors have turned their attention to alternative approaches where faulty processes can be excluded from the system adapting the current quorum of operational processes. In this paper, we explore the Partitioned Synchronous distributed system model, which suits existing real systems, such as large computational grids and distributed industrial plants, to propose a new Byzantine failure detector and related consensus algorithm for such a model. From our simulations we show that our approach indeed improves resilience when compared to static quorum approaches, with the continuous detection of faulty processes.

Referências

L. Lamport, M. Pease, and R. Shostak, “The Byzantine Generals Problem,” ACM Transactions on Programming Languages and Systems, vol. 4, no. 3, pp. 382–401, 1982.

Q. Nguyen and A. Sood, “A comparison of intrusion-tolerant system architectures,” IEEE Security Privacy, vol. 9, no. 4, pp. 24–31, July 2011.

S. A. P. Kumar, B. Bhargava, R. Macêdo, and G. Mani, “Securing iot-based cyber-physical human systems against collaborative attacks,” in 2017 IEEE International Congress on Internet of Things (ICIOT), June 2017, pp. 9–16.

F. B. Schneider, “Implementing fault-tolerant services using the state machine approach: a tutorial,” ACM Comput. Surv., vol. 22, no. 4, 1990.

A. S. de Sá, A. E. Silva Freitas, and R. J. de Araújo Macêdo, “Adaptive request batching for byzantine replication,” ACM SIGOPS Operating Systems Review, vol. 47, no. 1, p. 35, 2013.

R. Guerraoui, N. Knezevic, V. Quéma, M. Vukolic, P.-l. Aublin, and I. Lyon, “The Next 700 BFT Protocols,” Proceedings of the 5th European Conference on Computer Systems, vol. 32, no. 4, pp. 363–376, 2010.

M. J. Fischer, N. A. Lynch, and M. S. Paterson, “Impossibility of Distributed Consensus with One Faulty Process,” Journal of the ACM, vol. 32, no. 2, pp. 374–382, 1985.

C. Dwork, N. Lynch, and L. Stockmeyer, “Consensus in the Presence of Partial Synchrony,” J. ACM, vol. 35, no. 2, pp. 288–323, 1988.

P. Sousa, A. Bessani, M. Correia, N. Neves, and P. Verissimo, “Highly available intrusion-tolerant services with proactive-reactive recovery,” IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 4, pp. 452 – 465, 2010.

M. O. Rabin, “Randomized byzantine generals,” in Proceedings of the 24th Annual Symposium on Foundations of Computer Science, ser. SFCS ’83. Washington, DC, USA: IEEE Computer Society, 1983, pp. 403– 409.

R. J. d. A. Macêdo and S. Gorender, “Detectores Perfeitos em Sistemas Distribuídos Não Síncronos,” in Anais do IX Workshop de Testes e Tolerância a Falhas, 2008, pp. 127–140.

R. J. d. A. Macêdo and S. S. Gorender, “Perfect failure detection in the partitioned synchronous distributed system model,” Proceedings - International Conference on Availability, Reliability and Security, ARES 2009, pp. 273–280, 2009.

R. J. d. A. Macêdo and S. Gorender, “Exploiting Partitioned Synchrony to Implement Accurate Failure Detectors,” International Journal of Critical Computer-Based Systems (IJCCBS) - To appear, vol. 3, no. 1988, pp. 1–12, 2010.

S. Gorender and R. J. d. A. Macêdo, “Consenso Distribuído Eficiente no Modelo Síncrono Particionado,” Anais do XXIX Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos, pp. 587–600, 2011.

K.P. Kihlstrom, L.E. Moser, and P.M. Melliar-Smith, “Byzantine Fault Detectors for Solving Consensus.” Comput. J., vol. 46, no. 1, pp. 16–35, 2003.

A. Doudou and A. Schiper, “Muteness detectors for consensus with Byzantine processes,” EPFL/Departament D’Informatique, Tech. Rep., 1997.

A. Haeberlen, P. Kouznetsov, and P. Druschel, “The Case for Byzantine Fault Detection,” in Proc. of the 2nd Conference on Hot Topics in System Dependability. USENIX., vol. 2, United States, 2006.

S. W. da, R. M.A., and R. d. A. Macêdo, “Uma Experiência de Adaptação de Quóruns para Detecção de Falhas em Consenso Byzantino no Modelo Síncrono Particionado,” in Extended Abstract in Proc. of the VII Workshop on Autonomic Distributed System (WoSIDA 2017), SBRC 2017, vol. 1, Belém, Pará, Brazil, 2017.

P. Druschel, A. Haeberlen, and P. Kouznetsov, “Abstracting out Byzantine Behavior.” in From Security to Dependability, ser. Dagstuhl Seminar Proceedings, C. Cachin, F. C. Freiling, and J.-H. Hoepman, Eds., vol. 06371. Internationales Begegnungs- und Forschungszentrum fuer Informatik (IBFI), Schloss Dagstuhl, Germany, 2006, pp. 1–12.

C. Martn and M. Larrea, “Eventual leader election in the crash-recovery failure model,” in Proceedings of the 14th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2008, 12 2008, pp. 208–215.

H. Kopetz and G. Bauer, “The Time-Triggered Architecture,” in PROCEEDINGS OF THE IEEE, 2003, pp. 112–126.

A. Doudou, B. Garbinato, and R. Guerraoui, “Encapsulating Failure Detection: from Crash to Byzantine Failures,” in Proc. International Conference on Reliable Software Technologies. Springer-Verlag, 2002, pp. 24–50.

T. D. Chandra and S. Toueg, “Unreliable Failure Detectors for Reliable Distributed Systems,” Journal of the ACM, vol. 43, no. 2, pp. 225–267, 1996.

A. E. F. Silva and R. J. d. A. Macêdo, “Um Ambiente para Testes e Simulações de Protocolos Confiáveis em Sistemas Distribuídos Híbridos e Dinâmicos,” in 27o Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos. SBC, 2009, pp. 826–839.