Near-Optimal Space Perfect Hashing Algorithms

Fabiano C. Botelho; Nivio Ziviani

Fabiano C. Botelho UFMG / CEFET-MG
Nivio Ziviani CEFET-MG

Resumo

A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values. Since no collisions occur, each key can be retrieved from a hash table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the hash table size is exactly the number of keys in S. Differently from other hashing schemes, MPHFs completely avoid the problem of wasted space and wasted time to deal with collisions. The study of perfect hash functions started in the early 80s, when it was proved that the theoretic information lower bound to describe a minimal perfect hash function was approximately 1.44 bits per key. Although the proof indicates that it would be possible to build an algorithm capable of generating optimal functions, no one was able to obtain a practical algorithm that could be used in real applications. Thus, there was a gap between theory and practice. The main result of the thesis filled this gap, lowering the space complexity to represent MPHFs that are useful in practice from O(n log n) to O(n) bits. This allows the use of perfect hashing in applications to which it was not considered a good option. This explicit construction of PHFs is something that the data structures and algorithms community has been looking for since the 1980s.

Referências

Alon, N., Dietzfelbinger, M., Miltersen, P. B., Petrank, E., and Tardos, G. (1999). Linear hash functions. Journal of the ACM, 46(5):667–683.

Botelho, F. C. (2008). Near-Optimal Space Perfect Hashing Algorithms. PhD thesis, Federal University of Minas Gerais. Supervised by Nivio Ziviani, [link].

Botelho, F. C., Galinkin, D., Meira-Jr., W., and Ziviani, N. (2008a). Distributed perfect hashing for very large key sets. In Proceedings of the 3rd International ICST Conference on Scalable Information Systems (InfoScale’08). ACM Press. One non self-citations from scientific articles in Google Scholar.

Botelho, F. C., Kohayakawa, Y., and Ziviani, N. (2005). A practical minimal perfect hashing method. In Proceedings of the 4th International Workshop on Efficient and Experimental Algorithms (WEA’05), pages 488–500. Springer LNCS vol. 3503. Seven non self-citations from scientific articles in Google Scholar.

Botelho, F. C., Lacerda, A., Menezes, G. V., and Ziviani, N. (2009a). Minimal perfect hashing: A competitive method for indexing internal memory. Information Sciences. Submitted.

Botelho, F. C., Langbehn, H. R., Menezes, G. V., and Ziviani, N. (2008b). Indexing internal memory with minimal perfect hash functions. In Proceedings of the 23rd Brazilian Symposium on Database (SBBD’08), pages 16–30.

Botelho, F. C., Pagh, R., and Ziviani, N. (2007). Simple and space-efficient minimal perfect hash functions. In Proceedings of the 10th Workshop on Algorithms and Data Structures (WADs’07), pages 139–150. Springer LNCS vol. 4619. Ten non self-citations from scientific articles in Google Scholar.

Botelho, F. C., Pagh, R., and Ziviani, N. (2009b). A scalable and near-optimal space perfect hashing algorithm. Transactions on Algorithms. Submitted.

Botelho, F. C., Reis, D., and Ziviani, N. (2006). CMPH: C minimal perfect hashing library. Free Software Library. More than three thousands downloads by February 2009.

Botelho, F. C. and Ziviani, N. (2007). External perfect hashing for very large key sets. In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM’07), pages 653–662. ACM Press. Four non self-citations from scientific articles in Google Scholar.

Edelkamp, S. and Sulewski, D. (2008). Flash-efficient ltl model checking with minimal counterexamples. In Proceedings of the 2008 Sixth IEEE International Conference on Software Engineering and Formal Methods (SEFM’08), pages 73–82, Washington, DC, USA. IEEE Computer Society.

Fox, E. A., Chen, Q. F., and Heath, L. S. (1992). A faster algorithm for constructing minimal perfect hash functions. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’92), pages 266–273.

Hagerup, T. and Tholey, T. (2001). Efficient minimal perfect hashing in nearly minimal space. In Proceedings of the 18th Symposium on Theoretical Aspects of Computer Science (STACS’01), pages 317–326. Springer LNCS vol. 2010.

Jenkins, B. (1997). Algorithm alley: Hash functions. Dr. Dobb’s Journal of Software Tools, 22(9).

Majewski, B. S., Wormald, N. C., Havas, G., and Czech, Z. J. (1996). A family of perfect hashing methods. The Computer Journal, 39(6):547–554.

Mehlhorn, K. (1984). Data Structures and Algorithms 1: Sorting and Searching. Springer-Verlag.

Pagh, R. (1999). Hash and displace: Efficient evaluation of minimal perfect hash functions. In Workshop on Algorithms and Data Structures (WADS’99), pages 49–54.

Seltzer, M. (2005). Beyond relational databases. ACM Queue, 3(3).

Ziviani, N. and Botelho, F. C. (2006). Projeto de Algoritmos com Implementações em Java e C++. Thomson Learning.