Today, the core Java APIs lack high-quality hash functions, and third-party implementations often deliver sub-optimal performance. As non-cryptographic hash functions are important building blocks of software, this is a major bummer for developers.
Generally, there are plenty of hash functions to choose from, and many new ones with very good hashing properties have emerged in the last decade. Surprisingly, the core Java API still offers only Adler32 and CRC32, which were designed as checksums many years ago. Of course, there are many hash implementations available outside of the core Java API. However, unlike in the C world, only a few comparisons are available.
Hashing algorithms have very different performance characteristics when they run inside a Java VM. Today’s fastest hashes are highly optimized for CPU hardware and can process several GB/s. The VM layer imposed by Java can get in the way here, and implementation details matter greatly. For example, Murmur3A can outperform CRC32 by an order of magnitude when implemented in C. Nevertheless, the same Murmur3A implemented in Java can be several times slower than Java’s CRC32 class. These were the results when evaluating Murmur3A from Guava, which is one of the most popular and respected Java libraries available.
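For reference, this is roughly how the two core Java checksums are used through the java.util.zip.Checksum interface. It is a minimal sketch; the class name CoreChecksums and the helper checksum are made up for illustration and are not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Illustrative helper class (hypothetical name), using only core Java APIs.
final class CoreChecksums {

    // Feeds the whole array into the given checksum and returns its value.
    static long checksum(Checksum c, byte[] data) {
        c.reset();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // Both classes return their 32 bit result in the lower half of a long.
        System.out.printf("CRC32   = %08x%n", checksum(new CRC32(), data));   // cbf43926
        System.out.printf("Adler32 = %08x%n", checksum(new Adler32(), data)); // 091e01de
    }
}
```

Because both classes implement Checksum, code written against that interface can swap in any other compatible hash without changes.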
Adler32 and CRC32 provide 32 bit hashes. This is unfortunate because 64 bit hashes are a perfect match for today’s 64 bit CPUs and produce far fewer collisions than their 32 bit counterparts. Compared to cryptographic hash functions, non-cryptographic ones are much faster to compute and usually produce smaller hashes that are easy to handle.
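To get a feeling for the difference, here is a rough back-of-the-envelope estimate. It assumes an ideal hash that spreads values uniformly, and it uses the usual birthday approximation of n(n-1)/2 pairs, each colliding with probability 1/2^bits. Real hash functions may do worse; the class and method names are just for illustration.

```java
// Rough estimate of expected colliding pairs for an idealized, uniform hash.
final class CollisionEstimate {

    // Expected number of colliding pairs among n random values of the given bit width.
    static double expectedCollisions(long n, int bits) {
        return (double) n * (n - 1) / 2.0 / Math.pow(2.0, bits);
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.printf("32 bit: %.3g expected collisions%n", expectedCollisions(n, 32));
        System.out.printf("64 bit: %.3g expected collisions%n", expectedCollisions(n, 64));
    }
}
```

Under these assumptions, hashing one million keys yields on the order of a hundred colliding pairs at 32 bits, while at 64 bits a collision is extremely unlikely.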
So, how do we address these issues? With greenrobot-essentials hash functions, which provide:
- Progressive hash creation
- Convenient updateInt/Long/Array/… methods to update the hash
- Compatibility with java.util.zip.Checksum interface
- Highly efficient Murmur3A (32 bit) and Murmur3F (128 bit) implementations
- FNV-1a implementations for 32 and 64 bit hashes
- Super-fast custom FNVJ hash functions for 32 and 64 bit hashes
- Comprehensive test suite
- Easy-to-use test classes to measure performance and quality of hash functions
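To illustrate what progressive hashing behind the Checksum interface can look like, here is a minimal sketch of the standard 64 bit FNV-1a algorithm. This is not the greenrobot-essentials code: the class name Fnv1a64Sketch is made up, and the updateLong helper with its little-endian byte order is an assumption for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Checksum;

// Sketch of 64 bit FNV-1a as a Checksum (hypothetical class, not the library's implementation).
final class Fnv1a64Sketch implements Checksum {
    private static final long OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long PRIME = 0x100000001b3L;

    private long hash = OFFSET_BASIS;

    // FNV-1a step: XOR the lowest byte into the hash, then multiply by the prime.
    @Override
    public void update(int b) {
        hash ^= (b & 0xff);
        hash *= PRIME;
    }

    @Override
    public void update(byte[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            update(buf[i]);
        }
    }

    // Assumed convenience method: feeds the 8 bytes of the value in little-endian order.
    public void updateLong(long value) {
        for (int i = 0; i < 8; i++) {
            update((int) (value >>> (8 * i)));
        }
    }

    @Override
    public long getValue() {
        return hash;
    }

    @Override
    public void reset() {
        hash = OFFSET_BASIS;
    }

    public static void main(String[] args) {
        Fnv1a64Sketch h = new Fnv1a64Sketch();
        byte[] foo = "foo".getBytes(StandardCharsets.US_ASCII);
        byte[] bar = "bar".getBytes(StandardCharsets.US_ASCII);
        // Progressive use: the hash is built up over several update calls.
        h.update(foo, 0, foo.length);
        h.update(bar, 0, bar.length);
        System.out.printf("%016x%n", h.getValue());
    }
}
```

Feeding "foo" and then "bar" gives the same value as hashing "foobar" in one call, which is the point of progressive hash creation: data can be hashed as it arrives without buffering it first.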
For a full feature review, an overview of Java hash functions, and performance benchmarks, please go here.