A framework to create scripts to precompute a near-perfect hashtable of small number of objects that can be stored and used in future.
Why not a CLI tool? Because usually a hashtable needs to be matched against objects. So you would still need a bit of own logic.
The hashtable has the following construction:
- objects are converted into a byte strings
- a tweakable hash function needing a nonce is applied to objects to get 4-byte values. The function should map all the strings into unique 4-byte integers. We currently use
blake2s
because it is in python standard library. - the 4 byte valuea are converted into 2-byte control value
- the 4 byte values are reduced into a single byte value using one of the reducer finctions.
- the single byte values are deoffsetted by subtracting the minimum value across the whole dataset.
- the single byte values are passed through a perfect hash generator (
perfection
) to remap them into a nearly-continious range of indices. - the indices are used to populate an array.
pow
dir contains C++ program brute-forcing nonces (so it is kinda a proof-of-work) in order to minimize the "span" (the distance between maximum and minimum reduced hash) of reduced values (only the nonces mapping keys to distinct values are considered). strings.cpp
should be populated with the byte strings.
After the optimal nonce and reducer have been chosen, it can be used to produce a python source file with the hashtable and the lookup function.