ViViD Cuckoo Hash: Fast Cuckoo Table Building in SIMD
Resumo
Hash Tables play a lead role in modern databases systems during the execution of joins, grouping, indexing, removal of duplicates, and accelerating point queries. In this paper, we focus on Cuckoo Hash, a technique to deal with collisions guaranteeing that data is retrieved with at most two memory access in the worst case. However, building the Cuckoo Table with the current scalar methods is inefficient when treating the eviction of the colliding keys. We propose a Vertically Vectorized data-dependent method to build Cuckoo Tables - ViViD Cuckoo Hash. Our method exploits data parallelism with AVX-512 SIMD instructions and transforms control dependencies into data dependencies to make the build process faster with an overall reduction in response time by 90% compared to the scalar Cuckoo Hash.
Referências
Cebrián, J. M., Natvig, L., and Meyer, J. C. (2014). Performance and energy impact of parallelization and vectorization techniques in modern microprocessors. Computing, 96(12).
Celis, P., Larson, P.-A., and Munro, J. I. (1985). Robin hood hashing. In SFCS.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. (2009). Introduction to algorithms. MIT press.
Council, T. P. P. (2008). Tpc-h benchmark specification.
Estébanez, C., Saez, Y., Recio, G., and Isasi, P. (2014). Performance of the most common non-cryptographic hash functions. Software: Practice and Experience, 44(6).
Fowler, G., Noll, L. C., Vo, K.-P., Eastlake, D., and Hansen, T. (2011). The fnv noncryptographic hash algorithm. Ietf-draft.
Herlihy, M., Shavit, N., and Tzafrir, M. (2008). Hopscotch hashing. In DISC.
Intel® (2011). Intel® 64 and ia-32 architectures software developer’s manual. Intel®.
Intel® (2019). Intel® 64 and IA-32 Architectures Optimization Reference Manual. Intel®.
Li, X., Andersen, D. G., Kaminsky, M., and Freedman, M. J. (2014). Algorithmic improvements for fast concurrent cuckoo hashing. In ACM EuroSys, page 27.
Nguyen, N. and Tsigas, P. (2014). Lock-free cuckoo hashing. In ICDCS, pages 627–636.
Pagh, R. and Rodler, F. F. (2004). Cuckoo hashing. Journal of Algorithms, 51(2):122–144.
Polychroniou, O., Raghavan, A., and Ross, K. A. (2015). Rethinking simd vectorization for in-memory databases. In ACM SIGMOD, pages 1493–1508.
Polychroniou, O. and Ross, K. A. (2014). Vectorized bloom filters for advanced simd processors. In ACM Damon, page 6.
Ramakrishan, R. and Gehrke, J. (2003). Database Management Systems. McGraw-Hill, 8th edition.
Reinders, J. (2017). Intel® avx-512 instructions.
Richter, S., Alvarez, V., and Dittrich, J. (2015). A seven-dimensional analysis of hashing methods and its implications on query processing. PVLDB, 9(3).
Ross, K. A. (2007). Efficient hash probes on modern processors. In IEEE ICDE, pages 1297–1301.
Silberschartz, A., Korth, H. F., and Surdashan, S. (2006). Database System Concepts. McGraw-Hill, 5th edition.
Treibig, J., Hager, G., and Wellein, G. (2010). Likwid: A lightweight performanceoriented tool suite for x86 multicore environments. In IEEE ICPPW.
Zukowski, M., Héman, S., and Boncz, P. (2006). Architecture-conscious hashing. In ACM Damon, page 6.