How to Design a Hash Function

WeakHashMap and HashMap use different hash functions.

WeakHashMap

/**
 * Retrieve object hash code and applies a supplemental hash function to the
 * result hash, which defends against poor quality hash functions.  This is
 * critical because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits.
 */
final int hash(Object k) {
    int h = k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

 HashMap

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
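The Float example mentioned in the comment above can be reproduced with a small sketch. The class name SpreadDemo and the helper distinctBuckets are made up for illustration, and the table length of 256 is an arbitrary choice; only the `h ^ (h >>> 16)` step comes from the HashMap source.

```java
import java.util.HashSet;
import java.util.Set;

public class SpreadDemo {
    // Same spreading step as HashMap.hash above (null handling omitted).
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Hypothetical helper: counts how many distinct buckets the Float keys
    // 1.0f .. count occupy in a table of the given power-of-two length.
    static int distinctBuckets(int count, int length, boolean useSpread) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 1; i <= count; i++) {
            int h = Float.hashCode((float) i); // the IEEE 754 bit pattern
            if (useSpread) {
                h = spread(h);
            }
            buckets.add(h & (length - 1));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println("buckets without spreading: " + distinctBuckets(16, 256, false));
        System.out.println("buckets with spreading:    " + distinctBuckets(16, 256, true));
    }
}
```

Small whole-number floats keep all of their varying bits in the upper half of the 32-bit pattern, so without spreading every key lands in the same bucket. After XORing in the upper 16 bits, the keys spread over several buckets, although not perfectly.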

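How to design a hash

After searching for a long time, I found that there is no silver bullet: a hash function has to balance dispersion, performance, security, and other concerns. This article gave me a lot of inspiration: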

http://ticki.github.io/blog/designing-a-good-non-cryptographic-hash-function/

The plots in it are especially useful: they show that you can check how well a hash function disperses its output by plotting it.
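Without a plotting tool, a rough numeric stand-in for such a plot is a bucket histogram. The following is a minimal sketch, not the method used in the linked article: the class DispersionCheck, its helpers, and the test input (hash codes that differ only in their upper 16 bits) are all assumptions chosen for illustration.

```java
import java.util.function.IntUnaryOperator;

public class DispersionCheck {
    // Fills a power-of-two table and returns the number of keys per bucket.
    static int[] histogram(int[] hashCodes, int length, IntUnaryOperator mixer) {
        int[] counts = new int[length];
        for (int h : hashCodes) {
            counts[mixer.applyAsInt(h) & (length - 1)]++;
        }
        return counts;
    }

    // The fullest bucket; smaller means better dispersion for a fixed input.
    static int maxLoad(int[] counts) {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return max;
    }

    // Hypothetical worst-case input: values that differ only in the upper bits.
    static int[] highBitInput(int n) {
        int[] hashCodes = new int[n];
        for (int i = 0; i < n; i++) {
            hashCodes[i] = i << 16;
        }
        return hashCodes;
    }

    public static void main(String[] args) {
        int[] input = highBitInput(1024);
        int[] raw = histogram(input, 64, IntUnaryOperator.identity());
        int[] mixed = histogram(input, 64, h -> h ^ (h >>> 16));
        System.out.println("max load, no mixing:  " + maxLoad(raw));
        System.out.println("max load, h^(h>>>16): " + maxLoad(mixed));
    }
}
```

Swapping in a different mixer lambda or a different input generator makes it easy to compare candidate hash functions on the data you actually expect.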


Why XOR

The hash functions in common map implementations almost always involve shift and XOR operations. Why is XOR preferred?

Assuming uniformly random (1-bit) inputs, the AND function output probability distribution is 75% 0 and 25% 1. Conversely, OR is 25% 0 and 75% 1.

The XOR function is 50% 0 and 50% 1, therefore it is good for combining uniform probability distributions.

This can be seen by writing out truth tables:

 a | b | a AND b
---+---+--------
 0 | 0 |    0
 0 | 1 |    0
 1 | 0 |    0
 1 | 1 |    1

 a | b | a OR b
---+---+--------
 0 | 0 |    0
 0 | 1 |    1
 1 | 0 |    1
 1 | 1 |    1

 a | b | a XOR b
---+---+--------
 0 | 0 |    0
 0 | 1 |    1
 1 | 0 |    1
 1 | 1 |    0
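The same percentages can be checked mechanically by enumerating the four input pairs. This is a small sketch; the class BitCombineDemo and the helper onesCount are names invented for this example.

```java
import java.util.function.IntBinaryOperator;

public class BitCombineDemo {
    // Returns how many of the four (a, b) input pairs produce the output 1.
    static int onesCount(IntBinaryOperator op) {
        int ones = 0;
        for (int a = 0; a <= 1; a++) {
            for (int b = 0; b <= 1; b++) {
                ones += op.applyAsInt(a, b);
            }
        }
        return ones;
    }

    public static void main(String[] args) {
        System.out.println("AND ones out of 4: " + onesCount((a, b) -> a & b));
        System.out.println("OR  ones out of 4: " + onesCount((a, b) -> a | b));
        System.out.println("XOR ones out of 4: " + onesCount((a, b) -> a ^ b));
    }
}
```

Only XOR yields a 1 for exactly half of the inputs, which is why combining two uniformly distributed bits with XOR keeps the result uniform.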

Exercise: How many logical functions of two 1-bit inputs a and b have this uniform output distribution? Why is XOR the most suitable for the purpose stated in your question?

 

Why the table length is a power of 2

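A map is usually backed by an array. Why must the array length be a power of 2? When the length is a power of 2, length - 1 is a binary number whose low bits are all 1. ANDing it with the hash value gives the element's position in the array, which is the same result as a non-negative modulo by the length but costs only a single bitwise operation.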

/**
 * Returns index for hash code h.
 */
private static int indexFor(int h, int length) {
    return h & (length-1);
}
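To see why the power-of-two requirement matters, the mask can be compared with a true modulo. This is a minimal sketch: the class IndexForDemo and the helper reachableBuckets are illustrative names, and the length 12 is just one example of a length that is not a power of 2.

```java
import java.util.HashSet;
import java.util.Set;

public class IndexForDemo {
    // Same masking step as indexFor above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Hypothetical helper: counts the buckets that hashes 0..1023 can reach.
    static int reachableBuckets(int length) {
        Set<Integer> seen = new HashSet<>();
        for (int h = 0; h < 1024; h++) {
            seen.add(indexFor(h, length));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // For a power-of-two length the mask agrees with floorMod,
        // including for negative hash codes.
        int[] samples = {-17, 0, 5, 12345, Integer.MIN_VALUE};
        for (int h : samples) {
            System.out.println(h + " -> " + indexFor(h, 16)
                    + " (floorMod: " + Math.floorMod(h, 16) + ")");
        }
        // For length 12 the mask is 1011 in binary, so some buckets are never used.
        System.out.println("reachable buckets, length 16: " + reachableBuckets(16));
        System.out.println("reachable buckets, length 12: " + reachableBuckets(12));
    }
}
```

With a length of 12, bit 2 of the mask is 0, so buckets 4 to 7 can never be selected and a third of the table is wasted.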
