Comments in the source
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.
The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
The comment above says that two factors affect a HashMap's performance: the initial capacity and the load factor. The initial capacity is the number of buckets at creation time, and the load factor measures how full the table may get. Their product determines when the table is resized and rehashed. By default, the initial capacity is 16 and the load factor is 0.75.
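As a quick sanity check of those numbers, the resize threshold is simply capacity × load factor, truncated to an int. The sketch below is illustrative only; the class ThresholdDemo is mine, and its constants merely mirror the JDK's default values:

```java
// Sketch: how capacity and load factor determine the resize threshold.
// ThresholdDemo is an illustrative class, not part of the JDK.
public class ThresholdDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 16;   // mirrors HashMap's default
    static final float DEFAULT_LOAD_FACTOR = 0.75f;   // mirrors HashMap's default

    // threshold = capacity * loadFactor; exceeding it triggers a resize
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the defaults, the 13th insertion pushes size past 16 * 0.75 = 12
        System.out.println(threshold(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR)); // 12
        // After doubling to 32 buckets, the threshold doubles as well
        System.out.println(threshold(32, DEFAULT_LOAD_FACTOR)); // 24
    }
}
```

So with defaults, the map resizes from 16 to 32 buckets when the 13th entry is added.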
* If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method. This is best done at creation time, to prevent accidental
* unsynchronized access to the map:
* Map m = Collections.synchronizedMap(new HashMap(...));
The comment above points out that HashMap is not thread-safe. If it will definitely be used from multiple threads, it should be wrapped with Collections.synchronizedMap() at creation time, which synchronizes the whole map, much like Hashtable. Alternatively, ConcurrentHashMap can be used; for the differences between the two, see https://blog.csdn.net/lanxiangru/article/details/53495854
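A minimal usage sketch of the wrapping approach (the class SyncMapDemo is mine; note that, per the synchronizedMap Javadoc, iteration still requires manually locking on the wrapper):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of wrapping a HashMap at creation time.
public class SyncMapDemo {
    static int demo() {
        // Wrap immediately so no unsynchronized reference to the map escapes
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<>());
        m.put("a", 1);
        m.put("b", 2);
        // Iteration is NOT atomic: synchronize on the wrapper while iterating
        int sum = 0;
        synchronized (m) {
            for (int v : m.values()) sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3
    }
}
```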
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
static final int TREEIFY_THRESHOLD = 8;
The above describes when a bucket's linked list is converted to a red-black tree: when the bucket accumulates at least 8 nodes (TREEIFY_THRESHOLD). The reasoning is in the comment below: with an ideal hashCode, the size k of each bucket follows a Poisson distribution, and the probabilities for k listed there show that, ideally, a bucket's list essentially never reaches 8 nodes. So, in my view, the red-black tree mainly exists to compensate for "non-ideal" hashCode implementations.
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
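The table above can be reproduced directly from the stated formula exp(-0.5) * pow(0.5, k) / factorial(k); the small program below is just that check (the class name is illustrative):

```java
// Reproduces the bin-size probability table from the HashMap comment.
// PoissonCheck is an illustrative class name, not part of the JDK.
public class PoissonCheck {
    // P(k) = exp(-0.5) * 0.5^k / k!  — expected frequency of bins of size k
    static double p(int k) {
        double fact = 1;
        for (int i = 2; i <= k; i++) fact *= i;   // k!
        return Math.exp(-0.5) * Math.pow(0.5, k) / fact;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++)
            System.out.printf("%d: %.8f%n", k, p(k)); // matches the table above
    }
}
```

P(8) comes out to roughly 6e-8, i.e. "less than 1 in ten million", which is why 8 is a safe threshold.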
Methods
-
equals method
public final boolean equals(Object o) {
    if (o == this)
        return true;
    if (o instanceof Map.Entry) {
        Map.Entry<?,?> e = (Map.Entry<?,?>)o;
        if (Objects.equals(key, e.getKey()) &&
            Objects.equals(value, e.getValue()))
            return true;
    }
    return false;
}
-
hash method
As the code makes clear, the hash value is the hashCode XORed with itself shifted right by 16 bits. Why do this? As the author notes in the comment, because the table bounds mask off the high bits, the high bits of the hashCode would otherwise never participate in index calculation, making hash collisions more likely.
as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
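To see concretely why the high bits matter, here is a small sketch (the class and method names are mine, not JDK internals) that reproduces the spreading step and the (n - 1) & hash indexing used throughout HashMap:

```java
// Illustrative sketch: HashSpreadDemo is not a JDK class.
public class HashSpreadDemo {
    // Same spreading step as HashMap.hash(): fold the high 16 bits into the low 16
    static int spread(int h) { return h ^ (h >>> 16); }

    // Bucket index: for power-of-two n, (n - 1) & h equals h mod n
    static int indexFor(int h, int n) { return (n - 1) & h; }

    public static void main(String[] args) {
        int n = 16; // only the low 4 bits select the bucket
        // Masking is the power-of-two modulo
        System.out.println(indexFor(100, n) + " == " + (100 % n)); // 4 == 4
        // Two hashes differing only in the high bits collide without spreading...
        int h1 = 0x00010000, h2 = 0x00030000;
        System.out.println(indexFor(h1, n) + " " + indexFor(h2, n)); // 0 0
        // ...but land in different buckets once the high bits are folded in
        System.out.println(indexFor(spread(h1), n) + " " + indexFor(spread(h2), n)); // 1 3
    }
}
```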
-
get method
Note that

(n - 1) & hash

is effectively a modulo operation (n is always a power of two). The lookup finds the first node in the target bucket; if it is the node we want, return it directly. Otherwise check its type: if it is a TreeNode, delegate to the tree lookup method; if not, keep walking the list until the next node is null.

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

/**
 * Implements Map.get and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
-
put method
The source here is quite clear on its own. The main flow is: start from the first node in the bucket; if the key is not found, insert a new node; if it is found, update that node's value. As with get, when the first node is not the target, check whether the next node is a tree node and, if so, call the tree version instead. Also, whenever a new node is appended, the bucket's size is checked against the treeify threshold of 8; once reached, the list is converted to a tree.
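Before diving into the source, the externally observable contract described above — insert if absent, otherwise replace and return the previous value — can be verified with a small sketch (the class name is mine):

```java
import java.util.HashMap;

// Illustrative check of put's return-value contract.
public class PutDemo {
    static Integer demo() {
        HashMap<String, Integer> m = new HashMap<>();
        // First insertion: no existing mapping, so put returns null
        Integer first = m.put("k", 1);
        // Same key again: the value is replaced and the old value (1) is returned
        Integer prev = m.put("k", 2);
        System.out.println(first + " " + prev + " " + m.get("k")); // null 1 2
        return prev;
    }

    public static void main(String[] args) {
        demo();
    }
}
```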
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // empty in HashMap: a stub left for LinkedHashMap to override
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    // same purpose as afterNodeAccess(): a LinkedHashMap hook
    afterNodeInsertion(evict);
    return null;
}
-
resize method
What deserves attention here is how indexes are recomputed after resizing, i.e. the code following

if ((e.hash & oldCap) == 0)

in the source. Here lo is short for "low" and hi for "high". Briefly: suppose oldCap is 16; then hashes 1 and 17 both land at index 1. After growing to 32, 1 stays at index 1 while 17 moves to index 17, because 1 & 16 == 0 while 17 & 16 == 16 != 0. One way to think about it: 1 is already smaller than 16, so growing the table doesn't affect it; 17 only shared a bucket with 1 because the old table was too small, and once the capacity exceeds 17 they naturally separate. The destination j + oldCap
is exactly hash & (newCap - 1) for those nodes.
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
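The lo/hi split can be condensed into a tiny check (the names below are mine, not from the JDK): during a doubling resize, a node either stays at index j or moves to j + oldCap, and in both cases the result equals hash & (newCap - 1):

```java
// Illustrative check of HashMap's resize index split; SplitDemo is not a JDK class.
public class SplitDemo {
    // A node at old index j = hash & (oldCap - 1) stays at j when
    // hash & oldCap == 0, otherwise moves to j + oldCap.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // 1 and 17 shared bucket 1 at capacity 16; after doubling, 17 moves to 17
        for (int hash : new int[]{1, 17, 33, 49}) {
            int split = newIndex(hash, oldCap);
            // The split result agrees with a full re-index against the new table
            System.out.println(hash + " -> " + split + " == " + (hash & (newCap - 1)));
        }
    }
}
```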
- The remove method is similar, so it is not shown here. The remaining conversions between linked list and red-black tree are rather involved and are also omitted.