ABSTRACT
Cloud-native database systems provide highly available and scalable services as part of cloud platforms by transparently replicating and partitioning data across automatically managed resources. Some systems, such as Google Spanner, are designed and implemented from scratch. Others, such as Amazon Aurora, derive from traditional database systems for better compatibility but disaggregate storage to cloud services. Unfortunately, because these derived systems follow an open-box approach and fork the original code base, they are difficult to implement and maintain.
We address this problem with Loom, a replicated and partitioned database system built on top of PostgreSQL that delegates durable storage to a distributed log native to the cloud. Unlike previous disaggregation proposals, Loom takes a closed-box approach: it uses the original server through existing interfaces, which simplifies implementation and improves robustness and maintainability. In our experimental evaluation, Loom achieves 6× higher throughput and 5× lower response time than standard replication and is competitive with the state of the art on cloud and HPC hardware.