Benchmark TPC-C Aplicado em Replicação Máquina de Estados

Kayel L. Serafim; Eduardo Alchieri; Fernando Dotti

doi:10.5753/wtf.2023.803

Kayel L. Serafim PUCRS
Eduardo Alchieri UnB
Fernando Dotti PUCRS

DOI: https://doi.org/10.5753/wtf.2023.803

Resumo

A Replicação Máquina de Estados (RME) é uma abordagem importante para prover serviços de alta disponibilidade. No entanto, o aumento da vazão em RME é um desafio devido ao seu modelo determinístico. Uma lacuna importante é a falta de cargas de trabalho que permitam avaliar diferentes mecanismos para RME sob os mesmos referenciais, representativos de aplicações de interesse. Identificaram-se aspectos comuns relevantes no contexto de transações online, e propõe-se o uso do benchmark TPC-C do Transaction Processing Performance Comittee para avaliar RMEs. A arquitetura é discutida e implementada em uma plataforma de replicação, no contexto de RME. Os resultados obtidos foram reportados utilizando o modelo clássico de RME. Além disso, foi discutida e implementada uma abordagem para RME paralela com essa carga de trabalho, e os resultados também foram reportados.

Referências

Alchieri, E., Dotti, F., and Pedone, F. (2018). Early scheduling in parallel state machine replica. In ACM SoCC.

Attiya, H. and Welch, J. (2004). Distributed Computing: Fundamentals, Simulations, and Advanced Topics. Wiley-Interscience.

Batista, E., Alchieri, E., Dotti, F., and Pedone, F. (2022). Early scheduling on steroids: Boosting parallel state machine replication. Journal of Parallel and Distributed Computing, 163:269–282.

Bessani, A., Sousa, J. a., and Alchieri, E. E. P. (2014). State machine replication for the masses with bft-smart. In Proceedings of the 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN ’14, pages 355–362, Washington, DC, USA. IEEE Computer Society.

Bezerra, C. E., Pedone, F., and Renesse, R. V. (2014). Scalable state-machine replication. In DSN, pages 331–342.

Burgos, A., Alchieri, E., Dotti, F., and Pedone, F. (2022). Exploiting concurrency in sharded parallel state machine replication. IEEE Transactions on Parallel and Distributed Systems, 33(9):2133–2147.

Burrows, M. (2006). The chubby lock service for loosely-coupled distributed systems. In OSDI.

Castro, M. and Liskov, B. (2002). Practical byzantine fault-tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4):398–461.

Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. (2008). Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2):1–26.

Cowling, J. and Liskov, B. (2012). Granola: Low-Overhead distributed transaction coordination. In USENIX Annual Technical Conference. USENIX Association.

Escobar, I., Alchieri, E., Dotti, F., and Pedone, F. (2019). Boosting concurrency in parallel state machine replication. In Middleware.

Hadzilacos, V. and Toueg, S. (1994). A modular approach to the specification and implementation of fault-tolerant broadcasts. Technical report, Department of Computer Science, Cornell.

Herlihy, M. P. and Wing, J. M. (1990). Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS), 12(3):463–492.

Hunt, P., Konar, M., Junqueira, F. P., and Reed, B. (2010). Zookeeper: wait-free coordination for internet-scale systems. In ATC, volume 8.

J. C. Corbett, J. D. and et al, M. E. (2012). Spanner: Google’s globally distributed database. In OSDI.

Kapritsos, M., Wang, Y., Quema, V., Clement, A., Alvisi, L., and Dahlin, M. (2012). All about eve: execute-verify replication for multi-core servers. In OSDI.

Kotla, R. and Dahlin, M. (2004). High throughput byzantine fault tolerance. In DSN.

Lakshman, A. and Malik, P. (2010). Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2):35–40.

Lamport, L. (1978). Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565.

Le, L. H. (2020). Scaling State Machine Replication. PhD thesis, Università della Svizzera italiana.

Li, Z., Van Roy, P., and Romano, P. (2017). Enhancing throughput of partially replicated state machines via multi-partition operation scheduling. In 2017 IEEE 16th International Symposium on Network Computing and Applications (NCA), pages 1–10.

Marandi, P. J., Bezerra, C. E. B., and Pedone, F. (2014). Rethinking state-machine replication for parallelism. In ICDCS.

Marandi, P. J. and Pedone, F. (2014). Optimistic parallel state-machine replication. In SRDS.

Mendizabal, O. M., Moura, R. T. S., Dotti, F. L., and Pedone, F. (2017). Efficient and deterministic scheduling for parallel state machine replication. In IPDPS.

Pedone, F., Alchieri, E., Dotti, F., Marandi, P., and Mendizabal, O. (2018). Boosting state machine replication with parallel execution. In Anais do VIII Latin-American Symposium on Dependable Computing, pages 77–86, Porto Alegre, RS, Brasil. SBC.

Schiper, N., Sutra, P., and Pedone, F. (2010). P-store: Genuine partial replication in wide area networks. In Symposium on Reliable Distributed Systems (SRDS).

Schneider, F. B. (1990). Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys, 22(4):299–319.

Shvachko, K., Kuang, H., Radia, S., and Chansler, R. (2010). The hadoop distributed file system. In MSST.

Thomson, A., Diamond, T., Weng, S.-C., Ren, K., Shao, P., and Abadi, D. J. (2012). Calvin: Fast distributed transactions for partitioned database systems. In Proceedings of ACM SIGMOD International Conference on Management of Data, SIGMOD ’12.

Verma, A., Pedrosa, L., Korupolu, M., Oppenheimer, D., Tune, E., and Wilkes, J. (2015). Large-scale cluster management at google with borg. In EuroSys.