哈希函数的逆向算法

Inverse of a hash function

I’ve used Thomas Wang’s integer hash functions for years for various purposes. Using techniques invented by Bob Jenkins for general hashing (e.g., hashes of strings), Wang derived several hash specialized for fixed size integer input. His 64-bit version is


uint64_t hash(uint64_t key) {
  key = (~key) + (key << 21); // key = (key << 21) - key - 1;
  key = key ^ (key >> 24);
  key = (key + (key << 3)) + (key << 8); // key * 265
  key = key ^ (key >> 14);
  key = (key + (key << 2)) + (key << 4); // key * 21
  key = key ^ (key >> 28);
  key = key + (key << 31);
  return key;
}

Key properties include avalanche (changing any input bit changes about half of the output bits) and invertibility. Recently I wanted to make explicit use of the inverse, in order to verify that zero would never arise as the hash of a given set of inputs. This property would allow me to initialize the hash table in question (which takes up several gigabytes) with zeros and avoid an explicit occupied bit on each entry. Thus, I needed inverse_hash(0).


Our function is the composition of the functions on each line, so we need to invert each one. Multiplication by 21 and 265 is easy; both numbers are odd, and therefore have multiplicative inverses mod 264. The rest of the lines are invertible because they’re Feistel functions; they break the key into two pieces, leave one piece alone, run the second function through an invertible function that depends on the first. For example, the line key = key ^ (key >> 24) leaves the top 24 bits alone. Once you know the top 24 bits, you can reconstruct the next 24 bits with an xor, and one more round gives the remaining bits. The full inverse is

uint64_t inverse_hash(uint64_t key) {
  uint64_t tmp;

  // Invert key = key + (key << 31)
  tmp = key-(key<<31);
  key = key-(tmp<<31);

  // Invert key = key ^ (key >> 28)
  tmp = key^key>>28;
  key = key^tmp>>28;

  // Invert key *= 21
  key *= 14933078535860113213u;

  // Invert key = key ^ (key >> 14)
  tmp = key^key>>14;
  tmp = key^tmp>>14;
  tmp = key^tmp>>14;
  key = key^tmp>>14;

  // Invert key *= 265
  key *= 15244667743933553977u;

  // Invert key = key ^ (key >> 24)
  tmp = key^key>>24;
  key = key^tmp>>24;

  // Invert key = (~key) + (key << 21)
  tmp = ~key;
  tmp = ~(key-(tmp<<21));
  tmp = ~(key-(tmp<<21));
  key = ~(key-(tmp<<21));

  return key;
}

The cleverness of the original hash function is that each invertible step is also extremely fast. The inverse is slower, but only moderately.


Finally, I did indeed luck out:

inverse_hash(0) = 0x7ffffbffffdfffff

This isn’t a valid pentago board in my packed representation, so zero initialization works. This turned out to be obvious in hindsight: all but the first step in the hash function leaves zero alone, and the last step is complement on the lower 21 bits, which is enough to know that inverse_hash(0) can’t be a valid pentago board. It’s still cool to have the full inverse, though.


I tried to email Thomas Wang in case he hadn’t had the occasion to write down the inverse explicitly, but unfortunately his HP email bounced. Impermanent email addresses make me sad.

你可能感兴趣的:(哈希函数的逆向算法)