Replicação Máquina de Estados Paralelas com Escalonamento Híbrido

  • Aldênio Burgos UnB
  • Eduardo Alchieri UnB
  • Fernando Dotti PUCRS
  • Fernando Pedone Università della Svitzzera italiana

Abstract


State Machine Replication (SMR) is an approach used to implement fault-tolerant systems. In this approach, servers are replicated and client requests are deterministically executed in the same order across replicas. To improve its performance, parallel SMR uses a scheduler to allow parallel execution of some requests. The late scheduler schedules requests for execution after they are ordered, while in the early scheduling part of the scheduling decisions are made before requests are ordered and must be respected during execution. This work proposes a protocol for hybrid scheduling that leverage from the advantages of each of the previous approaches. Experiments show that hybrid scheduler outperforms previous approaches in many scenarios.

References

Alchieri, E., Dotti, F., Mendizabal, O. M., and Pedone, F. (2017). Reconfiguring parallel state machine replication. In SRDS.

Alchieri, E., Dotti, F., and Pedone, F. (2018). Early scheduling in parallel state machine replica. In ACM SoCC.

Bessani, A., Sousa, J., and Alchieri, E. (2014). State machine replication for the masses with bft-smart. In DSN.

Castro, M. and Liskov, B. (2002). Practical byzantine fault-tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4):398–461.

Défago, X., Schiper, A., and Urbán, P. (2004). Total order broadcast and multicast algorithms: Taxonomy and survey. ACM Computing Surveys, 36(4):372–421.

Escobar, I. A., Alchieri, E., Dotti, F. L., and Pedone, F. (2019). Boosting concurrency in parallel state machine replication. In Proc. 20th International Middleware Conference.

Fischer, M. J., Lynch, N. A., and Paterson, M. S. (1985). Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374–382.

Hadzilacos, V. and Toueg, S. (1993). Fault-tolerant broadcasts and related problems. In Mullender, S., editor, Distributed Systems, pages 97–145. ACM Press/AddisonWesley, New York, NY, USA.

Herlihy, M. and Wing, J. M. (1990). Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programing Languages and Systems, 12(3):463– 492.

Kotla, R. and Dahlin, M. (2004). High throughput byzantine fault tolerance. In IEEE/IFIP Int. Conference on Dependable Systems and Networks.

Lamport, L. (1998). The part-time parliament. ACM Transactions on Computer Systems, 16(2):133–169.

Maffione, V., Lettieri, G., and Rizzo, L. (2018). Cache-aware design of general-purpose single-producer-single-consumer queues. Software: Practice and Experience, 49.

Marandi, P. J., Bezerra, C. E., and Pedone, F. (2014). Rethinking state machine replication for parallelism. In ICDCS.

Mendizabal, O. M., Moura, R. T. S., Dotti, F. L., and Pedone, F. (2017). Efficient and deterministic scheduling for parallel state machine replication. In IPDPS.

Schneider, F. B. (1990). Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys, 22(4):299–319.
Published
2021-08-16
BURGOS, Aldênio; ALCHIERI, Eduardo; DOTTI, Fernando; PEDONE, Fernando. Replicação Máquina de Estados Paralelas com Escalonamento Híbrido. In: BRAZILIAN SYMPOSIUM ON COMPUTER NETWORKS AND DISTRIBUTED SYSTEMS (SBRC), 39. , 2021, Uberlândia. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021 . p. 476-489. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2021.16741.

Most read articles by the same author(s)

1 2 > >>