Jenkins hash

最早,Bob Jenkins提出了多个基于字符串通用Hash算法(搜Jenkins Hash就知道了),而Thomas Wang在Jenkins的基础上,针对固定整数输入做了相应的Hash算法。其64位版本的 Hash算法如下:
uint64_t hash(uint64_t key) {
 key = (~key) + (key << 21); // key = (key << 21) - key - 1;
 key = key ^ (key >> 24);
 key = (key + (key << 3)) + (key << 8); // key * 265
 key = key ^ (key >> 14);
 key = (key + (key << 2)) + (key << 4); // key * 21
 key = key ^ (key >> 28);
 key = key + (key << 31);
 return key;
}

其关键特性是:
1.雪崩性(更改输入参数的任何一位,就将引起输出有一半以上的位发生变化)
2.可逆性

逆Hash函数为:
uint64_t inverse_hash(uint64_t key) {
 uint64_t tmp;

 // Invert key = key + (key << 31)
 tmp = key-(key<<31);
 key = key-(tmp<<31);

 // Invert key = key ^ (key >> 28)
 tmp = key^key>>28;
 key = key^tmp>>28;

 // Invert key *= 21
 key *= 14933078535860113213u;

 // Invert key = key ^ (key >> 14)
 tmp = key^key>>14;
 tmp = key^tmp>>14;
 tmp = key^tmp>>14;
 key = key^tmp>>14;

 // Invert key *= 265
 key *= 15244667743933553977u;

 // Invert key = key ^ (key >> 24)
 tmp = key^key>>24;
 key = key^tmp>>24;

 // Invert key = (~key) + (key << 21)
 tmp = ~key;
 tmp = ~(key-(tmp<<21));
 tmp = ~(key-(tmp<<21));
 key = ~(key-(tmp<<21));

 return key;
}
由上述的算法实现可知,原始的hash算法过程是非常快的,而其逆Hash算法则比较慢一些。

特征:0.

参考:
1.jenkins 32位Hash算法: http://burtleburtle.net/bob/hash/integer.html
2. Geoffrey Irving's blog:  http://naml.us/blog/tag/thomas-wang

你可能感兴趣的:(hadoop)