ConcurrentHashmap源码分析

此文部分内容来自 https://javadoop.com/post/hashmap

Hashmap

多线程死循环
主要是多线程同时put时，如果同时触发了rehash操作，会导致HashMap中的链表中出现循环节点，进而使得后面get的时候，会死循环。

ConcurrentHashmap JDK7

ConcurrentHashMap允许多个修改(写)操作并发进行，其关键在于使用了锁分段技术，它使用了不同的锁来控制对哈希表的不同部分进行的修改(写)，而 ConcurrentHashMap 内部使用段(Segment)来表示这些不同的部分。

segmentMask; // 用于定位段，大小等于segments数组的大小减 1，是不可变的
segmentShift; // 用于定位段，大小等于32(hash值的位数)减去对segments的大小取以2为底的对数值，是不可变的

1 Segment
Segment 类继承于 ReentrantLock 类，从而使得 Segment 对象能充当锁的角色。每个 Segment 对象用来守护它的成员对象 table 中包含的若干个桶。table 是一个由 HashEntry 对象组成的链表数组，table 数组的每一个数组成员就是一个桶。
segment中的字段

volatile int count 是用于计数每个segment管理的所有 table 数组包含的 HashEntry 对象的个数，count 不是concurrenthashmap的全局变量的原因是当需要更新count的时候不需要锁住整个CHM。

volatile HashEntry[] table 链表数组

2 HashEntry
==CHM中不需要对读加锁的原因== 其一是HashEntry内 key hash next这些字段都时final的，所以hashentry对象基本是不可变的另一个原因则是 value域也被volatile修饰所以可以确保读线程读到最后一次更新后的值。

next是final意味着不能从hash链的中间和尾部添加删除节点，所以所有节点的修改只能从头部开始，对于put操作也是头插法，而remove()删除节点的时候也是需要重新new这个节点前面的所有节点并删除原来的节点再把前部的最后一个节点的next指向删除节点的后一个节点。

3 concurrentHashMap的构造函数
主要有三个参数初始容量、负载因子和并发级别，默认分别为16，0.15,16 依据这三个来初始化segments数组(长度为2的n次幂)，段偏移量 segmentShift(32-n)，段掩码segmentMask(2^n-1 二进制为n个1)，每个segment。

4 concurrentHashMap的数据结构

image

5 定位操作

final Segment segmentFor(int hash) {
    return segments[(hash >>> segmentShift) & segmentMask];
}

让key的高n位和段掩码做与操作。就可以具体定位到桶位。因为 segmentshift是 32-n

并发存取

线程对映射表做读操作的时候不需要加锁，而对容器做结构型的修改操作(put remove)则需要加锁

put操作

==ConcurrentHashMap不同于HashMap，它既不允许key值为null，也不允许value值为null。==
当我们向ConcurrentHashMap中put一个Key/Value对时，首先会获得Key的哈希值并对其再次哈希，然后根据最终的hash值定位到这条记录所应该插入的段。(见上面的第5点)

JDK7 put源码

//主流程
public V put(K key, V value) {
    Segment s;
    if (value == null)
        throw new NullPointerException();
    // 1. 计算 key 的 hash 值
    int hash = hash(key);
    // 2. 根据 hash 值找到 Segment 数组中的位置 j
    //    hash 是 32 位，无符号右移 segmentShift(28) 位，剩下低 4 位，
    //    然后和 segmentMask(15) 做一次与操作，也就是说 j 是 hash 值的最后 4 位，也就是槽的数组下标
    int j = (hash >>> segmentShift) & segmentMask;
    // 刚刚说了，初始化的时候初始化了 segment[0]，但是其他位置还是 null，
    // ensureSegment(j) 对 segment[j] 进行初始化
    if ((s = (Segment)UNSAFE.getObject          // nonvolatile; recheck
         (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
        s = ensureSegment(j);
    // 3. 插入新值到 槽 s 中
    return s.put(key, hash, value, false);
}

final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    // 在往该 segment 写入前，需要先获取该 segment 的独占锁
    HashEntry node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        // 这个是 segment 内部的数组
        HashEntry[] tab = table;
        // 再利用 hash 值，求应该放置的数组下标
        int index = (tab.length - 1) & hash;
        // first 是数组该位置处的链表的表头
        HashEntry first = entryAt(tab, index);
 
        // 下面这串 for 循环虽然很长，不过也很好理解，想想该位置没有任何元素和已经存在一个链表这两种情况
        for (HashEntry e = first;;) {
            if (e != null) {
                K k;
                if ((k = e.key) == key ||
                    (e.hash == hash && key.equals(k))) {
                    oldValue = e.value;
                    if (!onlyIfAbsent) {
                        // 覆盖旧值
                        e.value = value;
                        ++modCount;
                    }
                    break;
                }
                // 继续顺着链表走
                e = e.next;
            }
            else {
                // node 到底是不是 null，这个要看获取锁的过程，不过和这里都没有关系。
                // 如果不为 null，那就直接将它设置为链表表头；如果是null，初始化并设置为链表表头。
                if (node != null)
                    node.setNext(first);
                else
                    node = new HashEntry(hash, key, value, first);
 
                int c = count + 1;
                // 如果超过了该 segment 的阈值，这个 segment 需要扩容
                if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                    rehash(node); // 扩容后面也会具体分析
                else
                    // 没有达到阈值，将 node 放到数组 tab 的 index 位置，
                    // 其实就是将新的节点设置成原链表的表头
                    setEntryAt(tab, index, node);
                ++modCount;
                count = c;
                oldValue = null;
                break;
            }
        }
    } finally {
        // 解锁
        unlock();
    }
    return oldValue;

获取写入锁

node = tryLock() ? null : scanAndLockForPut(key, hash, value)根据tryLock()快速获取这个segment的独占锁，如果获取失败就进入到 scanAndLockForPut(key,hash,value)来获取锁该方法的源码如下

private HashEntry scanAndLockForPut(K key, int hash, V value) {
    HashEntry first = entryForHash(this, hash);
    HashEntry e = first;
    HashEntry node = null;
    int retries = -1; // negative while locating node
 
    // 循环获取锁
    while (!tryLock()) {
        HashEntry f; // to recheck first below
        if (retries < 0) {
            if (e == null) {
                if (node == null) // speculatively create node
                    // 进到这里说明数组该位置的链表是空的，没有任何元素
                    // 当然，进到这里的另一个原因是 tryLock() 失败，所以该槽存在并发，不一定是该位置
                    node = new HashEntry(hash, key, value, null);
                retries = 0;
            }
            else if (key.equals(e.key))
                retries = 0;
            else
                // 顺着链表往下走
                e = e.next;
        }
        // 重试次数如果超过 MAX_SCAN_RETRIES（单核1多核64），那么不抢了，进入到阻塞队列等待锁
        //    lock() 是阻塞方法，直到获取锁后返回
        else if (++retries > MAX_SCAN_RETRIES) {
            lock();
            break;
        }
        else if ((retries & 1) == 0 &&
                 // 这个时候是有大问题了，那就是有新的元素进到了链表，成为了新的表头
                 //     所以这边的策略是，相当于重新进入 scanAndLockForPut 方法
                 (f = entryForHash(this, hash)) != first) {
            e = first = f; // re-traverse if entry changed
            retries = -1;
        }
    }
    return node;
}

这个方法有两个出口，一个是 tryLock() 成功了，循环终止，另一个就是重试次数超过了 MAX_SCAN_RETRIES，进到 lock() 方法，此方法会阻塞等待，直到成功拿到独占锁。
实质上这个方法的功能是获取segment的独占锁，然后实例化node

rehash
segment数组不能扩容，扩容是针对于HashEntry[]


/*
     * Reclassify nodes in each list to new table.  Because we
     * are using power-of-two expansion, the elements from
     * each bin must either stay at same index, or move with a
     * power of two offset. We eliminate unnecessary node
     * creation by catching cases where old nodes can be
     * reused because their next fields won't change.
     * Statistically, at the default threshold, only about
     * one-sixth of them need cloning when a table
     * doubles. The nodes they replace will be garbage
     * collectable as soon as they are no longer referenced by
     * any reader thread that may be in the midst of
     * concurrently traversing table. Entry accesses use plain
     * array indexing because they are followed by volatile
     * table write.
     */
// 方法参数上的 node 是这次扩容后，需要添加到新的数组中的数据。
private void rehash(HashEntry node) {
    HashEntry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // 2 倍
    int newCapacity = oldCapacity << 1;
    threshold = (int)(newCapacity * loadFactor);
    // 创建新数组
    HashEntry[] newTable =
        (HashEntry[]) new HashEntry[newCapacity];
    // 新的掩码，如从 16 扩容到 32，那么 sizeMask 为 31，对应二进制 ‘000...00011111’
    int sizeMask = newCapacity - 1;
 
    // 遍历原数组，老套路，将原数组位置 i 处的链表拆分到 新数组位置 i 和 i+oldCap 两个位置
    for (int i = 0; i < oldCapacity ; i++) {
        // e 是链表的第一个元素
        HashEntry e = oldTable[i];
        if (e != null) {
            HashEntry next = e.next;
            // 计算应该放置在新数组中的位置，
            // 假设原数组长度为 16，e 在 oldTable[3] 处，那么 idx 只可能是 3 或者是 3 + 16 = 19
            int idx = e.hash & sizeMask;
            if (next == null)   // 该位置处只有一个元素，那比较好办
                newTable[idx] = e;
            else { // Reuse consecutive sequence at same slot
                // e 是链表表头
                HashEntry lastRun = e;
                // idx 是当前链表的头结点 e 的新位置
                int lastIdx = idx;
 
                // 下面这个 for 循环会找到一个 lastRun 节点，这个节点之后的所有元素是将要放到一起的 
                // 寻找k值相同的子链，该子链尾节点与父链的尾节点必须是同一个
                for (HashEntry last = next;
                     last != null;
                     last = last.next) {
                    int k = last.hash & sizeMask;
                    if (k != lastIdx) {
                        lastIdx = k;
                        lastRun = last;
                    }
                }
                // 将 lastRun 及其之后的所有节点组成的这个链表放到 lastIdx 这个位置
                newTable[lastIdx] = lastRun;
                // 下面的操作是处理 lastRun 之前的节点，
                //    这些节点可能分配在另一个链表中，也可能分配到上面的那个链表中
                for (HashEntry p = e; p != lastRun; p = p.next) {
                    V v = p.value;
                    int h = p.hash;
                    int k = h & sizeMask;
                    HashEntry n = newTable[k];
                    newTable[k] = new HashEntry(h, p.key, v, n);
                }
            }
        }
    }
    // 将新来的 node 放到新数组中刚刚的 两个链表之一 的 头部
    int nodeIndex = node.hash & sizeMask; // add the new node
    node.setNext(newTable[nodeIndex]);
    newTable[nodeIndex] = node;
    table = newTable;
}

扩容后键值对的新位置要么和原位置一样，要么就是原位+旧数组的长度
上面的代码先找出扩容前后需要转移的点，先执行转移，再把该链条上剩下的节点转移，这么写是起到了复用的效果，官方注释中也说了 at the default threshold, only about one-sixth of them need cloning when a table doubles 在默认阈值下只有大约1/6的节点需要被克隆。

get比较简单不需要加锁步骤是

计算hash值找到segment数组的位置
segment中根据hash找到hashEntry数组中具体的位置对应的链表
遍历链表找到具体的KV

源码如下

public V get(Object key) {
    Segment s; // manually integrate access methods to reduce overhead
    HashEntry[] tab;
    // 1. hash 值
    int h = hash(key);
    long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
    // 2. 根据 hash 找到对应的 segment
    if ((s = (Segment)UNSAFE.getObjectVolatile(segments, u)) != null &&
        (tab = s.table) != null) {
        // 3. 找到segment 内部数组相应位置的链表，遍历
        for (HashEntry e = (HashEntry) UNSAFE.getObjectVolatile
                 (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
             e != null; e = e.next) {
            K k;
            if ((k = e.key) == key || (e.hash == h && key.equals(k)))
                return e.value;
        }
    }
    return null;
}

Q:如果在执行get的时候在同一个segment中发生了put或remove操作呢？
1.put操作的线程安全性的体现

使用CAS在初始化segment数组
链表中添加节点使用头插法，如果此时get遍历链表已经到了中间就不会对get操作有影响，如果put happens-before get 需要get到刚刚插入到的头节点，就依赖setEntryAt方法中使用的 UNSAFE.putOrderedObject.
扩容，创建了新的数组如果这个时候有get,get先行就在旧table上查询，put先行，put的操作可见性保证就是table使用了volatile关键字transient volatile HashEntry[] table;

remove操作的线程安全性
remove操作会更改链表的结构，如果remove的是get已经遍历过的节点，则不会发生问题。
如果 remove 先破坏了一个节点，分两种情况考虑。 1、如果此节点是头结点，那么需要将头结点的 next 设置为数组该位置的元素，table 虽然使用了 volatile 修饰，但是 volatile 并不能提供数组内部操作的可见性保证，所以源码中使用了 UNSAFE 来操作数组，请看方法 setEntryAt。2、如果要删除的节点不是头结点，它会将要删除节点的后继节点接到前驱节点中，这里的并发保证就是 next 属性是 volatile 的。

ConcurrentHashmap JDK8

类似于8中的HashMap的改进,ConcurrentHashMap也引入了红黑树。