秀强

A Brief Look at the Java Collections Framework: HashMap

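Table of contents

Note:

This article is based on JDK 1.8
Capacity vs. size

Data structure

What is a hash collision
What is a linked list

Singly linked list
The Node class

What is a red-black tree

The TreeNode class

Reading the source

Inheritance and implemented interfaces
Class fields
Key methods

Constructors

tableSizeFor()

The hash function

XOR ^
Logical right shift >>>
Modulo %
Computing the bucket index

put
get
resize

References: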

Note:

This article is based on JDK 1.8

XiuQiang:~ XiuQiang$ java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

Capacity vs. size

Capacity: the length of the bucket array, i.e. table.length.
Size: the total number of key-value mappings (nodes) stored in the map.

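Data structure

Under the hood, HashMap is an array plus linked lists plus red-black trees (in JDK 1.7 it was just an array plus linked lists).
In what follows, each slot of the array that can hold elements is called a bucket, and each node of the linked list or red-black tree inside a bucket is called a bin. (The term "bin" comes from the comments in the source code; I did not just make it up.)

Figure from 郑加威's blog: link
When a bucket holds a linked list, HashMap stores key-value pairs in an array of Node objects. Each key-value pair is wrapped in one Node, i.e. a bin. Node has a next reference pointing to the following Node, and that is how hash collisions are resolved.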

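What is a hash collision

HashMap uses the hash of the key to decide where a bin is stored, i.e. which bucket it goes into. When two keys that are not equal end up with the same bucket index (the obvious case being two unequal keys with the same hash), they land in the same bucket, and a linked list is used to resolve the collision. Once the list in a bucket grows beyond 8 nodes, it is converted into a red-black tree. The conversion is conditional, though: if the table has too few buckets (fewer than 64), HashMap resizes instead. More on that later, so stay tuned, hahaha.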
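To make the collision case concrete, here is a minimal sketch. It relies on the fact that the strings "Aa" and "BB" both hash to 2112 under String.hashCode(); the class name CollisionDemo and the helper sameHash are made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class CollisionDemo {
    // Hypothetical helper: true when two strings share a hashCode.
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' = 2112 and 'B' * 31 + 'B' = 2112
        System.out.println(sameHash("Aa", "BB")); // true

        // Unequal keys with the same hash share a bucket,
        // yet both mappings remain retrievable.
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa") + " " + map.get("BB")); // 1 2
    }
}
```

Both keys land in the same bucket, and equals() is what tells them apart during lookup.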

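What is a linked list

A linked list is a storage structure made of nodes that need not be contiguous in memory. Roughly speaking, linked lists are either singly or doubly linked, and each of those can be circular or non-circular. HashMap uses a singly linked list, so that is the only kind described below. If the other kinds interest you, feel free to look them up.

Singly linked list

In a singly linked list every node holds a reference to the next node, which chains the nodes together. The next reference of the last node is null.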
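As a rough illustration of that structure, the sketch below builds a three-node singly linked list and walks it until next is null. ListNode and length are names invented for this example; they are not part of HashMap.

```java
final class SinglyLinkedListDemo {
    // Hypothetical node type: a value plus a reference to the next node.
    static final class ListNode<T> {
        final T value;
        ListNode<T> next;

        ListNode(T value, ListNode<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    // Walks the chain from head until the null that marks the end.
    static <T> int length(ListNode<T> head) {
        int count = 0;
        for (ListNode<T> n = head; n != null; n = n.next) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ListNode<String> c = new ListNode<>("c", null); // tail: next is null
        ListNode<String> b = new ListNode<>("b", c);
        ListNode<String> a = new ListNode<>("a", b);    // head
        System.out.println(length(a)); // 3
    }
}
```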

The Node class

 /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

What is a red-black tree

A red-black tree is a self-balancing binary search tree. On top of the ordinary binary search tree rules it adds the following requirements:

Every node is either red or black.
The root is black.
Every leaf (NIL) is black.
If a node is red, then both its children are black.
For each node, all simple paths from the node to descendant leaves contain the same number of black nodes.

If you are interested, have a look at the article below; I will not go into more detail here.
漫画算法：什么是红黑树？ (Comic Algorithms: What Is a Red-Black Tree?)

The TreeNode class

    /**
     * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
     * extends Node) so can be used as extension of either regular or
     * linked node.
     */
    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }
        // ... the rest of the class is omitted here

Reading the source

Inheritance and implemented interfaces

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

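Class fields

// Default initial capacity (16)
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Maximum capacity (2^30)
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor, 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// A bucket's list is converted to a red-black tree once it grows beyond 8 nodes
static final int TREEIFY_THRESHOLD = 8;
// During a resize, a tree bin left with 6 nodes or fewer is converted back to a list
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum table capacity for treeifying a bucket; below this, resize() is called instead
static final int MIN_TREEIFY_CAPACITY = 64;
// Allocated lazily, on first use
transient Node<K,V>[] table;
// Threshold, determined by capacity and load factor: capacity * loadFactor, 16 * 0.75 = 12 by default
// When the number of mappings exceeds this value, the table is resized
int threshold;

Food for thought:
If you know the map will hold exactly 100 elements, what is the best value to pass to new HashMap(?) (with the default load factor of 0.75), and why?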
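One way to reason about that question is sketched below, under the assumption that "best" means "never grows the table while the 100 entries are inserted". The helpers initialCapacityFor and roundUpToPowerOfTwo are hypothetical names written for this example; the second one only imitates the power-of-two rounding that HashMap applies to its constructor argument, for values up to $2^{30}$.

```java
final class InitialCapacityDemo {
    // Hypothetical helper: smallest constructor argument whose
    // threshold (capacity * loadFactor) is at least expectedSize.
    static int initialCapacityFor(int expectedSize, float loadFactor) {
        return (int) Math.ceil(expectedSize / (double) loadFactor);
    }

    // Hypothetical helper: smallest power of two >= cap (for cap <= 2^30).
    static int roundUpToPowerOfTwo(int cap) {
        return (cap <= 1) ? 1 : Integer.highestOneBit(cap - 1) << 1;
    }

    public static void main(String[] args) {
        int expected = 100;
        float loadFactor = 0.75f;
        int[] candidates = {100, initialCapacityFor(expected, loadFactor)};
        for (int arg : candidates) {
            int table = roundUpToPowerOfTwo(arg);
            int threshold = (int) (table * loadFactor);
            // The table grows once size exceeds the threshold.
            boolean grows = expected > threshold;
            System.out.println("new HashMap(" + arg + "): table=" + table
                    + ", threshold=" + threshold + ", grows=" + grows);
        }
        // new HashMap(100): table=128, threshold=96, grows=true
        // new HashMap(134): table=256, threshold=192, grows=false
    }
}
```

Under that assumption, passing 100 still triggers one resize at the 97th entry, while 134 (100 / 0.75 rounded up) avoids it at the cost of a 256-slot table. Whether the memory trade-off is worth it depends on the workload.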

Key methods

Constructors

// No-arg constructor
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
// With an initial capacity
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
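// Copy constructor: builds the map from an existing Map
public HashMap(Map<? extends K, ? extends V> m)
// With an initial capacity and a load factor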
public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);                                       
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
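        // threshold temporarily holds the rounded-up initial capacity; resize() uses it when it allocates the table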
        this.threshold = tableSizeFor(initialCapacity);
}
// Returns the smallest power of two that is greater than or equal to cap, e.g. 128 for cap = 100.
static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

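tableSizeFor()

A quick word on tableSizeFor(). The algorithm is quite clever: five rounds of >>> and | set every bit below the highest 1 bit to 1, and then n + 1 is returned. For example:
cap = 100, which is 0110 0100 in binary (leading zeros omitted)
int n = cap - 1 = 0110 0011
n |= n >>> 1;
    0110 0011
|   0011 0001
_________
    0111 0011
Likewise for n |= n >>> 2; n |= n >>> 4; ...
In the end n = 0111 1111 = 127
n + 1 = 128 = $2^7$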
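The walk-through above can be checked with a small standalone program. The method body is copied from the tableSizeFor() listing earlier in this article; the wrapper class TableSizeForDemo and the sample values are only for illustration.

```java
final class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit trick as the tableSizeFor() listing above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that an exact power of two maps to itself
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // every bit below the highest 1 bit is now 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 16, 17, 100, 128, 129};
        for (int cap : samples) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
        // 0 -> 1, 1 -> 1, 16 -> 16, 17 -> 32, 100 -> 128, 128 -> 128, 129 -> 256
    }
}
```

The initial cap - 1 is what keeps 16 and 128 from being doubled.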

As the constructors above show, none of them initializes table directly. The table is allocated in putVal(), which calls resize() on the first insertion.

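The hash function

We covered hash collisions earlier. A good hash function spreads elements more evenly and so reduces collisions. Before looking at it, let's brush up on XOR (^), logical right shift (>>>) and modulo (%).

XOR ^

Rule: the result bit is 1 when the two bits differ
For example:
3 ^ 4 = 7:
     0011
     0100
^ ______
     0111

Logical right shift >>>

Rule: low bits fall off the end, high bits are filled with 0
For example:
10 >>> 1 = 5:
1010 >>> 1 = 0101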
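The "high bits are filled with 0" part only becomes visible with negative numbers, where >>> and the arithmetic shift >> disagree. A small sketch follows; the class ShiftDemo and its two helper methods are made up for this example.

```java
final class ShiftDemo {
    // Logical shift: vacated high bits are filled with 0.
    static int logicalShift(int x, int bits) {
        return x >>> bits;
    }

    // Arithmetic shift: vacated high bits copy the sign bit.
    static int arithmeticShift(int x, int bits) {
        return x >> bits;
    }

    public static void main(String[] args) {
        System.out.println(logicalShift(10, 1));     // 5
        System.out.println(arithmeticShift(10, 1));  // 5, same for non-negative values
        System.out.println(arithmeticShift(-8, 1));  // -4, sign preserved
        System.out.println(logicalShift(-8, 1));     // 2147483644, sign bit replaced by 0
    }
}
```

Because hashCode() can be negative, HashMap uses >>> so that the shifted copy has zeros in its top 16 bits.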

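Modulo %

Rule: modulo is the remainder from primary-school division (only positive integers are considered here)
For example:
5 % 3 = 2
5 divided by 3 is 1 remainder 2

Computing the bucket index

With the groundwork done, let's see how HashMap uses its hash function to reduce collisions and pick the bucket. Let's go!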

public native int hashCode();
// Compute the hash of the key
static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
// The bucket index is hash & (tab.length - 1), which acts as hash modulo tab.length because the length is a power of two
int n = tab.length;
int index = (n - 1) & hash;

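We can split the process into two steps:

(h = key.hashCode()) ^ (h >>> 16)
(n - 1) & hash

At this point many readers will wonder: why shift the key's hashCode right by 16 bits, XOR it with the original hashCode, and then AND the result with the array length minus 1, instead of simply taking the modulo with %?
After all, once you have the hash and table.length, hash % table.length is enough to pick a bucket, like this: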

int length = table.length;   // length is a field of the array, not a method
int hash = key.hashCode();
int index = hash % length;   // note: negative when hash is negative

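First, why & is used in place of %:

On typical hardware a bitwise & is much cheaper than %, which needs an integer division.
The number of buckets in a HashMap is always $2^n$ (see the constructors above).
When length = $2^n$ and X is non-negative, X % length = X & (length - 1). (length - 1 is then a run of n one bits, so the & keeps exactly the low n bits of X, which is the remainder of dividing X by $2^n$.)

From these three facts it follows easily why & replaces the modulo.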
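For readers who would rather test the identity than derive it, here is a brute-force check. It is only a sketch: the class MaskVsModDemo and the method agreesForNonNegative are hypothetical names, and the check covers a finite range of non-negative values.

```java
final class MaskVsModDemo {
    // True when x % length == x & (length - 1) for every x in [0, upTo).
    static boolean agreesForNonNegative(int length, int upTo) {
        for (int x = 0; x < upTo; x++) {
            if (x % length != (x & (length - 1))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreesForNonNegative(16, 100_000)); // true: 16 is a power of two
        System.out.println(agreesForNonNegative(12, 100_000)); // false: 12 is not

        // For a negative x the two differ even with a power-of-two length:
        System.out.println(-7 % 16);  // -7, unusable as an array index
        System.out.println(-7 & 15);  // 9, always within [0, 15]
    }
}
```

The last two lines hint at a second benefit of the mask: it yields a valid index even when the hash is negative.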
Next, why the right shift in (h = key.hashCode()) ^ (h >>> 16):
In short, it lets the high 16 bits of the hash take part in choosing the bucket. That may sound abstract, so here is an example, first without the shift:
Suppose int n = table.length = $2^4$ = 16
h:       1111 1111 1111 1111 1111 0000 1110 1010
n-1:     0000 0000 0000 0000 0000 0000 0000 1111

h&(n-1): 0000 0000 0000 0000 0000 0000 0000 1010
$\Downarrow$
1010 = 10
Now shrink h to 10 and run it again:
h:       0000 0000 0000 0000 0000 0000 0000 1010
n-1:     0000 0000 0000 0000 0000 0000 0000 1111

h&(n-1): 0000 0000 0000 0000 0000 0000 0000 1010

$\Downarrow$
1010 = 10

Hahaha, see it? Every hash whose low four bits are 1010 ends up in bucket 10, whatever its high bits are.
Now add the shift:
int hash = h ^ (h >>> 16);
h:            1111 1111 1111 1111 1111 0000 1110 1010
h >>> 16:     0000 0000 0000 0000 1111 1111 1111 1111

hash:         1111 1111 1111 1111 0000 1111 0001 0101
n-1:          0000 0000 0000 0000 0000 0000 0000 1111

hash & (n-1): 0000 0000 0000 0000 0000 0000 0000 0101
$\Downarrow$
0101 = 5

In summary, even when the array is small, the high 16 bits still take part in choosing the bucket, and the extra cost is small.
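The two worked examples can be replayed in code. In this sketch, spread mirrors the expression h ^ (h >>> 16) and index mirrors (n - 1) & hash; both method names and the class SpreadDemo are chosen here for illustration.

```java
final class SpreadDemo {
    // Mixes the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    static int index(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int big = 0xFFFFF0EA; // 1111 1111 1111 1111 1111 0000 1110 1010
        int small = 10;       // same low four bits, different high bits

        // Without spreading, both collide in bucket 10.
        System.out.println(index(big, 16));           // 10
        System.out.println(index(small, 16));         // 10

        // With spreading, the high bits separate them.
        System.out.println(index(spread(big), 16));   // 5
        System.out.println(index(spread(small), 16)); // 10
    }
}
```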

put

Figure from 夜香's blog: link

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // Lazily allocate the table on first use
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // i = (n - 1) & hash is the bucket index;
        // p is the head node of that bucket
        if ((p = tab[i = (n - 1) & hash]) == null)
            // The bucket is empty: create the first node
            tab[i] = newNode(hash, key, value, null);
        else {
            // The bucket already holds at least one node
            Node<K,V> e; K k;
            // Does the head node p have the same hash and an equal key?
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                // Yes: remember it in e
                e = p;
            // Is p a red-black tree node?
            else if (p instanceof TreeNode)
                // Tree version of putVal
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else { // The bucket holds a linked list
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        // Reached the tail: append a new node
                        p.next = newNode(hash, key, value, null);
                        // Check how long the list has become
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            // More than 8 nodes now: treeify the bucket
                            // (treeifyBin resizes instead if the table has fewer than 64 buckets)
                            treeifyBin(tab, hash);
                        break;
                    }
                    // e has the same hash and an equal key: it is the target node, stop searching
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                // Overwrite the value (unless onlyIfAbsent is set and the old value is non-null) and return the old value
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // Resize once size exceeds the threshold
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
}
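From the caller's side, the two exit paths of putVal show up as the return value of put: null when a new node was added, the old value when an existing mapping was found. The sketch below also uses putIfAbsent, which in JDK 1.8 calls putVal with onlyIfAbsent set to true. The class PutDemo and the helper returnValues are names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class PutDemo {
    // Hypothetical helper: runs three insertions and collects what each returned.
    static Integer[] returnValues(Map<String, Integer> map) {
        Integer first = map.put("a", 1);          // new key: nothing to return
        Integer second = map.put("a", 2);         // existing key: value replaced
        Integer third = map.putIfAbsent("a", 3);  // existing key: value kept
        return new Integer[] {first, second, third};
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        Integer[] results = returnValues(map);
        System.out.println(results[0]);   // null
        System.out.println(results[1]);   // 1
        System.out.println(results[2]);   // 2
        System.out.println(map.get("a")); // 2, putIfAbsent did not overwrite
        System.out.println(map.size());   // 1
    }
}
```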

get

The logic is much the same as in put.

public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    /**
     * Implements Map.get and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // Is the table allocated, and is the target bucket non-empty?
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // Is the head node of the bucket the one we are looking for?
            if (first.hash == hash &&                 // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                // Yes: return the head node directly
                return first;
            if ((e = first.next) != null) {
                // Is the bucket a red-black tree?
                if (first instanceof TreeNode)
                    // Tree bin: delegate to the tree lookup
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    // Walk the list looking for the target node
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        // Not found: return null
        return null;
    }
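One consequence of get() returning e.value or null is that a null result is ambiguous: the key may be absent, or it may be mapped to null. A short sketch of how containsKey and getOrDefault tell the two apart; the class GetDemo and the helper sample are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class GetDemo {
    // Hypothetical helper: a map with one key explicitly mapped to null.
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("k", null);
        map.put("x", "1");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        System.out.println(map.get("k"));                    // null: key present, value is null
        System.out.println(map.get("missing"));              // null: no node found
        System.out.println(map.containsKey("k"));            // true
        System.out.println(map.containsKey("missing"));      // false
        System.out.println(map.getOrDefault("missing", "d")); // d
    }
}
```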

resize