Parallel State Machine Replication with Hybrid Scheduling

  • Aldênio Burgos UnB
  • Eduardo Alchieri UnB

Abstract


State Machine Replication (SMR) is an approach used to implement fault-tolerant systems. In this approach, servers are replicated and client requests are deterministically executed in the same order across replicas. Parallel SMR uses a scheduler to allow parallel execution of some requests, improving performance. The late scheduler schedules requests for execution after they are ordered, while in the early scheduling part of the scheduling decisions are made before requests are ordered and must be respected during execution. This work proposes a protocol for hybrid scheduling that leverage from the advantages of each of the previous approaches. Experiments show that hybrid scheduler outperforms previous approaches in many scenarios. Moreover, through the implementation of a payment system, this work shows that the proposed approach for parallel SMRs circumvents previously described performance limitations in the implementation of blockchains that uses an SMR as an underline building block.
Keywords: Parallel State Machine Replication, Scheduling, Dependability

References

Abraham, I., Malkhi, D., Nayak, K., Ren, L., and Spiegelman, A. (2017). Solida: A Blockchain Protocol Based on Reconfigurable Byzantine Consensus. In Proceedings of the 21st International Conference on Principles of Distributed Systems.

Alchieri, E., Dotti, F., Mendizabal, O. M., and Pedone, F. (2017). Reconfiguring parallel state machine replication. In SRDS.

Alchieri, E., Dotti, F., and Pedone, F. (2018). Early scheduling in parallel state machine replica. In ACM SoCC.

Bessani, A., Alchieri, E., Sousa, J., Oliveira, A., and Pedone, F. (2020). From byzantine replication to blockchain: Consensus is only the beginning. In 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

Bessani, A., Sousa, J. a., and Alchieri, E. E. P. (2014). State machine replication for the masses with bft-smart. In International Conf. on Dependable Systems and Networks.

Burgos, A., Alchieri, E., and Dotti, F. (2021). On the performance of using parallel state machine replication to implement blockchains. In 2021 10th Latin-American Symposium on Dependable Computing (LADC), pages 1–6.

Burgos, A., Alchieri, E., Dotti, F., and Pedone, F. (2022). Exploiting concurrency in sharded parallel state machine replication. IEEE Transactions on Parallel and Distributed Systems, 33(9):2133–2147.

Cachin, C. and Vukolic, M. (2017). Blockchain consensus protocol in the wild (invited paper). In Proceedings of the 31th International Symposium on Distributed Computing.

Castro, M. and Liskov, B. (1999). Practical Byzantine fault tolerance. In 3rd Symposium on Operating Systems Design and Implementation. USENIX Association.

Castro, M. and Liskov, B. (2002). Practical byzantine fault-tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4):398–461.

Cui, H., Gu, R., Liu, C., Chen, T., and Yang, J. (2015). Paxos made transparent. In 25th Symposium on Operating Systems Principles.

Dwork, C., Lynch, N. A., and Stockmeyer, L. (1988). Consensus in the presence of partial synchrony. Journal of ACM, 35(2):288–322.

Escobar, I. A., Alchieri, E., Dotti, F. L., and Pedone, F. (2019). Boosting concurrency in parallel state machine replication. In Proc. 20th International Middleware Conference.

Guo, Z., Hong, C., Yang, M., Zhou, D., Zhou, L., and Zhuang, L. (2014). Rex: Replication at the speed of multi-core. In European Conference on Computer Systems.

Habiger, G., Hauck, F. J., Reiser, H. P., and Köstler, J. (2020). Self-optimising applicationagnostic multithreading for replicated state machines. In SRDS.

Herlihy, M. and Wing, J. M. (1990). Linearizability: A correctness condition for concurrent objects. ACM Trans. on Programing Languages and Systems, 12(3):463–492.

Kapritsos, M., Wang, Y., Quema, V., Clement, A., Alvisi, L., and Dahlin, M. (2012). All about eve: execute-verify replication for multi-core servers. In OSDI.

Kotla, R. and Dahlin, M. (2004a). High throughput byzantine fault tolerance. In IEEE/IFIP Int. Conference on Dependable Systems and Networks.

Kotla, R. and Dahlin, M. (2004b). High throughput byzantine fault tolerance. In DSN.

Lamport, L. (1998). The part-time parliament. ACM Transactions on Computer Systems, 16(2):133–169.

Marandi, P. J., Bezerra, C. E., and Pedone, F. (2014a). Rethinking state machine replication for parallelism. In ICDCS.

Marandi, P. J., Bezerra, C. E. B., and Pedone, F. (2014b). Rethinking state-machine replication for parallelism. In ICDCS.

Marandi, P. J. and Pedone, F. (2014). Optimistic parallel state-machine replication. In SRDS.

Mendizabal, O. M., Moura, R. T. S., Dotti, F. L., and Pedone, F. (2017). Efficient and deterministic scheduling for parallel state machine replication. In IPDPS.

Pass, R. and Shi, E. (2017). Hybrid Consensus: Efficient Consensus in the Permissionless Model. In Proceedings of the 31st International Symposium on Distributed Computing.

Schneider, F. B. (1990). Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys, 22(4):299–319.
Published
2022-07-31
BURGOS, Aldênio; ALCHIERI, Eduardo. Parallel State Machine Replication with Hybrid Scheduling. In: THESIS AND DISSERTATION CONTEST (CTD), 35. , 2022, Niterói. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022 . p. 71-80. ISSN 2763-8820. DOI: https://doi.org/10.5753/ctd.2022.222821.