An Experimental Approach to Evaluating the Consistency Levels of the NoSQL Database Cassandra

  • Saulo Ferreira UFRPE
  • Ermeson Andrade UFRPE
  • Júlio Mendonça IFAL

Abstract


Distributed computing lets multiple computers communicate and, among other things, distribute data across them. This technology also raises architectural problems, data consistency among them. Consistency mechanisms for data replicated across different servers aim to ensure that every running node sees the same data. Ensuring consistency can affect performance, however, and each consistency level has its own advantages and disadvantages. This work therefore evaluates the impact of consistency levels on the performance of the NoSQL (Not Only SQL) database Cassandra, considering different scenarios and workloads to study the trade-offs these levels introduce. We adopt an experimental approach to measure and analyze the system response time under those different consistency levels and workloads. The results show that both a high load of simultaneous users and a larger amount of data involved in the requests increase the disparity between the response times of the levels.
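
The abstract does not name the consistency levels that were evaluated. As a hedged illustration of why the levels trade consistency against response time, the sketch below assumes Cassandra's commonly used ONE, QUORUM, and ALL levels and computes how many replica acknowledgements a coordinator waits for at a given replication factor. The function names `required_acks` and `overlaps` are hypothetical helpers introduced here, not part of any Cassandra API or of the paper's method.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Replica acknowledgements a coordinator waits for (illustrative helper)."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {level}")


def overlaps(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when read and write replica sets must intersect (R + W > RF)."""
    r = required_acks(read_level, replication_factor)
    w = required_acks(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor, chosen only for illustration
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, required_acks(level, rf))
    print("QUORUM/QUORUM overlap:", overlaps("QUORUM", "QUORUM", rf))
    print("ONE/ONE overlap:", overlaps("ONE", "ONE", rf))
```

Waiting for more acknowledgements means waiting for the slowest of more replicas, which is one plausible reason response times diverge between levels as load and request size grow.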

Published
2021-10-26
FERREIRA, Saulo; ANDRADE, Ermeson; MENDONÇA, Júlio. Uma Abordagem Experimental para Avaliar os Níveis de Consistência do Banco de Dados NoSQL Cassandra. In: BRAZILIAN SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS (SSCAD), 22., 2021, Belo Horizonte. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 156-167. DOI: https://doi.org/10.5753/wscad.2021.18520.