Suportando Execuções Paralelas no Bft-SmaRt

Eduardo Adilio Pelinson Alchieri

doi:10.5753/wtf.2015.22934

Eduardo Adilio Pelinson Alchieri UnB

DOI: https://doi.org/10.5753/wtf.2015.22934

Resumo

A replicação Máquina de Estados é uma das abordagens mais usadas na implementação de sistemas tolerantes a falhas, tanto por parada quanto bizantinas. Esta abordagem consiste em replicar os servidores e coordenar as interações entre os clientes e as réplicas dos servidores, com o intuito de que as várias réplicas apresentem a mesma evolução em seus estados. Para isso, as requisições dos clientes devem ser ordenadas e executadas seguindo esta ordem em todas as replicas. Este requisito fez com que a maioria dos trabalhos utilizassem uma única thread de execução em cada réplica. Com o objetivo de melhorar o desempenho do sistema, novas abordagens foram introduzidas para suportar varias threads de execução por réplica. Dando seguimento a estes trabalhos, este artigo descreve como um protocolo que possibilita o emprego de varias threads de execução nas réplicas foi adaptado e implementado no BFT-SMART, além de analisar uma série de experimentos realizados.

Referências

Abd-El-Malek, M., Ganger, G., Goodson, G., Reiter, M., and Wylie, J. (2005). Fault-scalable Byzantine fault-tolerant services. In ACM Symposium on Operating Systems Principles.

Alchieri, E. A. P., Bessani, A. N., and da Silva Fraga, J. (2013). Replicação máquina de estados dinâmica. In Anais do XIV Workshop de Teste e Tolerância a Falhas- WTF 2013.

Bessani, A., Santos, M., Felix, J. a., Neves, N., and Correia, M. (2013). On the efficiency of durable state machine replication. In Proceedings of the 2013 USENIX Conference on Annual Technical Conference, USENIX ATC’13, pages 169–180. USENIX Association.

Bessani, A., Sousa, J., and Alchieri, E. (2014). State machine replication for the masses with BFT-SMaRt. In Proc. of the International Conference on Dependable Systems and Networks.

Bessani, A. N., Alchieri, E. A. P., Correia, M., and da Silva Fraga, J. (2008). Depspace: a byzantine fault-tolerant coordination service. SIGOPS Oper. Syst. Rev., 42(4):163–176.

Castro, M. and Liskov, B. (2002). Practical Byzantine fault-tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4):398–461.

Cowling, J., Myers, D., Liskov, B., Rodrigues, R., and Shrira, L. (2006). HQ-Replication: A hybrid quorum protocol for Byzantine fault tolerance. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation.

Gelernter, D. (1985). Generative communication in Linda. ACM Transactions on Programing Languages and Systems, 7(1):80–112.

Guerraoui, R., Knežević, N., Quéma, V., and Vukolić, M. (2010). The next 700 BFT protocols. In Proceedings of the ACM SIGOPS/EuroSys European Systems Conference.

Hadzilacos, V. and Toueg, S. (1994). A modular approach to the specification and implementation of fault-tolerant broadcasts. Technical report, Department of Computer Science, Cornell.

Kotla, R., Alvisi, L., Dahlin, M., Clement, A., and Wong, E. (2009). Zyzzyva: Speculative Byzantine fault tolerance. ACM Transactions on Computer Systems, 27(4):7:1–7:39.

Kotla, R. and Dahlin, M. (2004). High throughput byzantine fault tolerance. In Proceedings of the International Conference on Dependable Systems and Networks.

Lamport, L. (2006). Fast paxos. Distributed Computing, 19(2).

Liskov, B. and Cowling, J. (2012). Viewstamped replication revisited. Technical Report MIT-CSAIL-TR-2012-021, MIT.

Marandi, P. J., Bezerra, C. E., and Pedone, F. (2014). Rethinking state-machine replication for parallelism. In Proceedings of the 34th Int. Conference on Distributed Computing Systems.

Marandi, P. J. and Pedone, F. (2014). Optimistic parallel state-machine replication. In Proceedings of the 33rd International Symposium on Reliable Distributed Systems.

Marandi, P. J., Primi, M., Schiper, N., and Pedone, F. (2010). Ring paxos: A high-throughput atomic broadcast protocol. In Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks.

Mendizabal, O., Marandi, P. J., Dotti, F., and Pedone, F. (2014). Recovery in parallel state-machine replication. In Proc. of Int. Conference on Principles of Distributed Systems.

Oki, B. and Liskov, B. (1988). Viewstamped replication: A new primary copy method to support highly-available distributed systems. In Proceedings of the 7th Annual ACM Symposium on Principles of Distributed Computing, pages 8–17.

Schneider, F. B. (1990). Implementing fault-tolerant service using the state machine aproach: A tutorial. ACM Computing Surveys, 22(4):299–319.

White, B., Lepreau, J., Stoller, L., Ricci, R., Guruprasad, S., Newbold, M., Hibler, M., Barb, C., and Joglekar, A. (2002). An Integrated Experimental Environment for Distributed Systems and Networks. In Proc. of 5th Symp. on Operating Systems Design and Implementations.

Zbierski, M. (2015). Parallel byzantine fault tolerance. In Wilinski, A., Fray, I. E., and Pejas, J., editors, Soft Computing in Computer and Information Science, volume 342 of Advances in Intelligent Systems and Computing, pages 321–333. Springer International Publishing.