Algorithms for Cold Data Identification in In-Memory Databases

  • Alessandra V. Santos, Universidade de Fortaleza (UNIFOR)
  • Vládia Pinheiro, Universidade de Fortaleza (UNIFOR)
  • José Maria Monteiro, Federal University of Ceará (UFC)

Abstract


Growth in main-memory capacity has fueled the development of main-memory database systems, and many OLTP databases can now be stored entirely in main memory. However, because data volumes keep growing, these systems must still handle data that overflows the available memory. OLTP workloads often exhibit skewed access patterns: some records are hot (frequently accessed), while many are cold (rarely or never accessed). It is therefore more economical to store the coldest records on secondary storage such as flash or hard disk. Many recent works have addressed the data overflow problem by developing approaches to identify hot and cold data. In this paper, we present two new algorithms, 2QCold and ARCold, which adapt the classic 2Q and ARC cache algorithms to identify cold data. We implemented our algorithms in Seal-DB and compared them with the classic LRU, Forward, and Belady algorithms using the TPC-C benchmark. The results show that both 2QCold and ARCold reduce response time and increase hit ratio, outperforming related work.
Keywords: Management, Cold Data, In-Memory Databases
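
To make the idea of reusing a cache replacement policy for cold data identification concrete, the following is a minimal sketch based on the simplified 2Q scheme (a FIFO queue for first-time accesses plus an LRU queue for re-accessed records). It is not the 2QCold algorithm from the paper: the class name, the `capacity` and `a1_limit` parameters, and the rule that an evicted record is labeled cold are all assumptions made for illustration.

```python
from collections import OrderedDict


class TwoQueueColdClassifier:
    """Illustrative simplified-2Q tracker; hypothetical, not the paper's 2QCold."""

    def __init__(self, capacity, a1_limit):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity  # assumed: max records tracked as memory-resident
        self.a1_limit = a1_limit  # assumed: FIFO size at which the FIFO is drained first
        self.a1 = OrderedDict()   # records seen once, oldest first (FIFO)
        self.am = OrderedDict()   # re-accessed records, least recently used first
        self.cold = set()         # records evicted from both queues

    def access(self, key):
        """Record one access to `key` and update the hot/cold bookkeeping."""
        self.cold.discard(key)            # an accessed record is no longer cold
        if key in self.am:
            self.am.move_to_end(key)      # refresh recency in the LRU queue
        elif key in self.a1:
            del self.a1[key]              # second access: promote to the LRU queue
            self.am[key] = True
        else:
            if len(self.a1) + len(self.am) >= self.capacity:
                self._evict()
            self.a1[key] = True           # first access: enter the FIFO queue

    def _evict(self):
        """Pick a victim the way simplified 2Q does and label it cold."""
        if self.a1 and (len(self.a1) >= self.a1_limit or not self.am):
            victim, _ = self.a1.popitem(last=False)   # oldest once-seen record
        else:
            victim, _ = self.am.popitem(last=False)   # least recently used record
        self.cold.add(victim)


classifier = TwoQueueColdClassifier(capacity=3, a1_limit=2)
for record_id in [1, 1, 2, 3, 4, 1, 5]:
    classifier.access(record_id)
print(sorted(classifier.cold))  # prints [2, 3]
```

In this trace, record 1 is accessed repeatedly and stays in the LRU queue, while records 2 and 3 are seen only once and are pushed out of the FIFO queue, so they end up labeled cold. A real system would also need a policy for when and how cold records are migrated to secondary storage, which this sketch leaves out.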

Published
2021-10-04
SANTOS, Alessandra V.; PINHEIRO, Vládia; MONTEIRO, José Maria. Algorithms for Cold Data Identification in In-Memory Databases. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 36., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 241-252. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2021.17881.