Hashing is used by developers to quickly identify a specific, unique object from a group of similar objects. For example, your driver license number is a hash, which can be used to pull your driver's record when you're pulled over for going a wee bit over the speed limit. In computing, where there are say tens of thousands of John Smiths on Facebook, anything you can do to help speed up finding the John Smith who's your buddy is a good thing. Now, Facebook has open-sourced F14, a 14-way probing hash table within Folly, its open-source C++ library.
People have been working on perfecting hashing since computing's early days. The result has been an almost endless number of hashing methods and tables. Facebook has faced this problem as well. Within its offices, Facebook developers have created numerous hash table implementations each with its own strengths and weaknesses. With F14, Facebook developers Nathan Bronson and Xiao Shi claim, "The F14 hash tables outdo our previous specialized implementations while avoiding their pathologies. F14 is a good default -- usually a great choice and never a bad one, regardless of the use case."
That's not easy. Among the issues developers using hash tables face are:
- Do you keep long-lived references or pointers to the entries?
- Do you care more about CPU or memory?
- How big are your keys?
- How big are your tables?
- What is the operation mix between insertions, searches, and iteration?
- Are the keys strings?
- How often do you erase?
Facebook nevertheless states:
Clearly, it's hard to make the best choice when there are so many factors to consider. With F14, we have condensed this list to one simple choice: If you don't keep long-lived references to entries, start with folly::F14FastMap/Set. Otherwise, start with folly::F14NodeMap/Set.
The F14 resolves collisions (multiple keys that map to the same array index) by double hashing. Up to 14 keys are stored in a chunk at a single hash table position. High-speed vector instruction sets -- x86_64 SSE2 and aarch64 NEON -- are used to filter within a chunk. By filtering on up to 14 keys at a time, the hash table can be operated at a high maximum load factor while still keeping probe chains very short.
This new code also reduces memory waste by allowing more of a program's data to fit in cache. It does this for multiple use cases because it supports multiple memory layouts for different scenarios. This speeds up both the hash table and its surrounding code.
This isn't just a theory. Facebook uses F14 ever day because it works well. F14 makes a good hashing choice because it delivers CPU and RAM efficiency that is robust across many use cases.
Can F14 bring your hash-dependant programs faster processing with far fewer collisions? There's only one way to find out and that's to try it for yourself. The code is out there. Good luck.