Replicação Máquina de Estados Paralelas com Escalonamento Híbrido
Abstract
State Machine Replication (SMR) is an approach used to implement fault-tolerant systems. In this approach, servers are replicated and client requests are deterministically executed in the same order across replicas. To improve its performance, parallel SMR uses a scheduler to allow parallel execution of some requests. The late scheduler schedules requests for execution after they are ordered, while in the early scheduling part of the scheduling decisions are made before requests are ordered and must be respected during execution. This work proposes a protocol for hybrid scheduling that leverage from the advantages of each of the previous approaches. Experiments show that hybrid scheduler outperforms previous approaches in many scenarios.
References
Alchieri, E., Dotti, F., and Pedone, F. (2018). Early scheduling in parallel state machine replica. In ACM SoCC.
Bessani, A., Sousa, J., and Alchieri, E. (2014). State machine replication for the masses with bft-smart. In DSN.
Castro, M. and Liskov, B. (2002). Practical byzantine fault-tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4):398–461.
Défago, X., Schiper, A., and Urbán, P. (2004). Total order broadcast and multicast algorithms: Taxonomy and survey. ACM Computing Surveys, 36(4):372–421.
Escobar, I. A., Alchieri, E., Dotti, F. L., and Pedone, F. (2019). Boosting concurrency in parallel state machine replication. In Proc. 20th International Middleware Conference.
Fischer, M. J., Lynch, N. A., and Paterson, M. S. (1985). Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374–382.
Hadzilacos, V. and Toueg, S. (1993). Fault-tolerant broadcasts and related problems. In Mullender, S., editor, Distributed Systems, pages 97–145. ACM Press/AddisonWesley, New York, NY, USA.
Herlihy, M. and Wing, J. M. (1990). Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programing Languages and Systems, 12(3):463– 492.
Kotla, R. and Dahlin, M. (2004). High throughput byzantine fault tolerance. In IEEE/IFIP Int. Conference on Dependable Systems and Networks.
Lamport, L. (1998). The part-time parliament. ACM Transactions on Computer Systems, 16(2):133–169.
Maffione, V., Lettieri, G., and Rizzo, L. (2018). Cache-aware design of general-purpose single-producer-single-consumer queues. Software: Practice and Experience, 49.
Marandi, P. J., Bezerra, C. E., and Pedone, F. (2014). Rethinking state machine replication for parallelism. In ICDCS.
Mendizabal, O. M., Moura, R. T. S., Dotti, F. L., and Pedone, F. (2017). Efficient and deterministic scheduling for parallel state machine replication. In IPDPS.
Schneider, F. B. (1990). Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys, 22(4):299–319.
