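Java Concurrency: JUC (Part 2)

Concurrent Containers

Most of the concurrent containers provided by the JDK live in the java.util.concurrent package. The most commonly used ones are:

ConcurrentHashMap: a thread-safe version of HashMap.
ConcurrentLinkedQueue: a thread-safe, non-blocking queue backed by a linked list (roughly a thread-safe LinkedList).
ConcurrentSkipListMap: a thread-safe, sorted Map backed by a skip list.
CopyOnWriteArrayList: a thread-safe List whose reads take no lock; every write copies the underlying array. It performs very well when reads far outnumber writes.
LinkedBlockingQueue: a thread-safe blocking queue.
PriorityBlockingQueue: an unbounded blocking queue that supports priorities.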

ConcurrentHashMap

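ConcurrentHashMap organizes its data much like HashMap. The hash buckets are implemented as an array; when there is no hash collision, each bucket holds a single key-value entry (a Node object). Writes to the bucket slots are performed with CAS operations. Because the elements of an array cannot be declared volatile, reads from the table go through UNSAFE's getObjectVolatile function.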

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2, and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The array of bins. Lazily initialized upon first insertion.
 * Size is always a power of two. Accessed directly by iterators.
 */
transient volatile Node[] table;

@SuppressWarnings("unchecked")
static final Node tabAt(Node[] tab, int i) {
    return (Node)U.getObjectVolatile(tab, ((long)i << ASHIFT) + ABASE);
}

static final boolean casTabAt(Node[] tab, int i,
                              Node c, Node v) {
    return U.compareAndSwapObject(tab, ((long)i << ASHIFT) + ABASE, c, v);
}

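ConcurrentHashMap uses only the non-negative part of an object's hash code, because it reserves some negative hash values to describe node state: -1 means the node is a forwarding node (the bin is being migrated), -2 means the node is the root of a red-black tree, and -3 means the node is a reserved placeholder.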

/*
 * Encodings for Node hash fields. See above for explanation.
 */
static final int MOVED     = -1; // hash for forwarding nodes
static final int TREEBIN   = -2; // hash for roots of trees
static final int RESERVED  = -3; // hash for transient reservations
static final int HASH_BITS = 0x7fffffff; // usable bits of normal node hash

/**
 * Spreads (XORs) higher bits of hash to lower and also forces top
 * bit to 0. Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int spread(int h) {
    return (h ^ (h >>> 16)) & HASH_BITS;
}
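To make the effect of spread concrete, here is a small standalone sketch. The class name SpreadDemo and the helper indexFor are made up for this illustration; only the body of spread and the (n - 1) & hash index computation are taken from the code quoted in this article.

```java
public class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff;

    // Same bit-mixing as the spread function quoted above.
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    // Bucket index for a table whose length n is a power of two
    // (hypothetical helper, mirrors the (n - 1) & hash expression in putVal).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x00010000, b = 0x00020000; // differ only in the high 16 bits
        // Without spreading, both land in bucket 0 of a 16-slot table.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));
        // With spreading, the high bits influence the index: buckets 1 and 2.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16));
        // The top bit is always cleared, so negative values stay free for MOVED etc.
        System.out.println(spread(-1) >= 0);
    }
}
```

Masking with HASH_BITS is what keeps the hash of an ordinary node non-negative, which in turn is what allows negative values to be used as state markers.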

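When a hash collision occurs, the key-value entries (Node objects) that fall into the same bucket are first stored in a linked list. The Node implementation below shows that it is in fact a linked-list node: it carries a next pointer.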

static class Node implements Map.Entry {
    final int hash;
    final K key;
    volatile V val;
    volatile Node next;
    //...

    /**
     * Virtualized support for map.get(); overridden in subclasses.
     */
    Node find(int h, Object k) {
        Node e = this;
        if (k != null) {
            do {
                K ek;
                if (e.hash == h &&
                    ((ek = e.key) == k || (ek != null && k.equals(ek))))
                    return e;
            } while ((e = e.next) != null);
        }
        return null;
    }
}

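When the length of a bucket's list reaches TREEIFY_THRESHOLD (8), the list is replaced by red-black tree nodes that hold the same key-value entries. A red-black tree is a self-balancing binary search tree that supports lookup and modification in O(log N) time.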

/**
 * Nodes for use in TreeBins
 */
static final class TreeNode extends Node {
    TreeNode parent;  // red-black tree links
    TreeNode left;
    TreeNode right;
    TreeNode prev;    // needed to unlink next upon deletion
    boolean red;

    Node find(int h, Object k) {
        return findTreeNode(h, k, null);
    }

    /**
     * Returns the TreeNode (or null if not found) for the given key
     * starting at given root.
     */
    final TreeNode findTreeNode(int h, Object k, Class kc) {
        if (k != null) {
            TreeNode p = this;
            do  {
                int ph, dir; K pk; TreeNode q;
                TreeNode pl = p.left, pr = p.right;
                if ((ph = p.hash) > h)
                    p = pl;
                else if (ph < h)
                    p = pr;
                else if ((pk = p.key) == k || (pk != null && k.equals(pk)))
                    return p;
                else if (pl == null)
                    p = pr;
                else if (pr == null)
                    p = pl;
                else if ((kc != null ||
                          (kc = comparableClassFor(k)) != null) &&
                         (dir = compareComparables(kc, k, pk)) != 0)
                    p = (dir < 0) ? pl : pr;
                else if ((q = pr.findTreeNode(h, k, kc)) != null)
                    return q;
                else
                    p = pl;
            } while (p != null);
        }
        return null;
    }
}

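With the main internal data structures covered, let's look at how the hash table is initialized. Initialization relies on a field named sizeCtl. Before the table exists, sizeCtl holds the initial table size. During initialization it doubles as a lock: a thread must CAS it to -1 before creating the table, so only one thread performs the initialization while the others yield and wait. After initialization, sizeCtl holds the element count at which the next resize is triggered, computed as 0.75 * the current capacity. During a resize, sizeCtl additionally records the number of threads resizing concurrently.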

/**
 * Table initialization and resizing control.  When negative, the
 * table is being initialized or resized: -1 for initialization,
 * else -(1 + the number of active resizing threads).  Otherwise,
 * when table is null, holds the initial table size to use upon
 * creation, or 0 for default. After initialization, holds the
 * next element count value upon which to resize the table.
 */
private transient volatile int sizeCtl;

/**
 * Initializes table, using the size recorded in sizeCtl.
 */
private final Node[] initTable() {
    Node[] tab; int sc;
    while ((tab = table) == null || tab.length == 0) {
        if ((sc = sizeCtl) < 0) // another thread is initializing: just yield
            Thread.yield(); // lost initialization race; just spin
        else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) { // take the "lock"
            try {
                if ((tab = table) == null || tab.length == 0) {
                    int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                    // create the table
                    @SuppressWarnings("unchecked")
                    Node[] nt = (Node[])new Node[n];
                    table = tab = nt;
                    // after initialization, sizeCtl holds the element count that
                    // triggers the next resize: n - n/4 = 0.75 * capacity
                    sc = n - (n >>> 2);
                }
            } finally {
                sizeCtl = sc;
            }
            break;
        }
    }
    return tab;
}
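The "one thread initializes, the others yield" pattern can be tried outside the JDK. The sketch below is an illustration only: LazyTable, its Object[] table and the threshold accessor are invented for this example, and an AtomicInteger stands in for the Unsafe-based CAS on sizeCtl. Unlike the real class, it does not round the capacity to a power of two.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyTable {
    private static final int DEFAULT_CAPACITY = 16;

    // >= 0 before init: requested size (0 = default); -1: init in progress;
    // after init: resize threshold (0.75 * capacity). Assumed simplification of sizeCtl.
    private final AtomicInteger sizeCtl;
    private volatile Object[] table;

    public LazyTable(int initialCapacity) {
        sizeCtl = new AtomicInteger(initialCapacity);
    }

    Object[] initTable() {
        Object[] tab;
        while ((tab = table) == null) {
            int sc = sizeCtl.get();
            if (sc < 0) {
                Thread.yield();                      // someone else is initializing
            } else if (sizeCtl.compareAndSet(sc, -1)) {
                try {
                    if ((tab = table) == null) {     // recheck after winning the CAS
                        int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                        table = tab = new Object[n];
                        sc = n - (n >>> 2);          // 0.75 * n
                    }
                } finally {
                    sizeCtl.set(sc);                 // release: publish the threshold
                }
                break;
            }
        }
        return tab;
    }

    int threshold() {
        return sizeCtl.get();
    }

    public static void main(String[] args) {
        LazyTable t = new LazyTable(0);
        System.out.println(t.initTable().length); // 16
        System.out.println(t.threshold());        // 12
    }
}
```

The recheck of table inside the try block matters: a thread may win the CAS after another thread has already finished initializing, in which case it must reuse the existing table and simply restore the threshold.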

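Next, let's walk through what happens when an entry is added.

Initialize the table first if necessary.
If the slot for the key is empty, create the first Node (which holds the key and value directly) with a CAS; if the CAS succeeds, the insertion is done.
Otherwise, check whether a resize is in progress; if so, help with it, since a resize is carried out by multiple threads together.
Reaching this point means the slot is already occupied, so lock the Node stored in it. That Node may be the head of a linked list or the "root" of a red-black tree.
After acquiring the lock, verify that the locked object is still the one in the slot, because another thread may have replaced it in the meantime, for example by converting the list into a red-black tree.

Then handle the bin according to the hash of its first Node. A non-negative hash means the bin is a linked list:

1.  If the key is already in the list, update its value in place.
2.  If the tail is reached without finding the key, append a new node at the tail.

Otherwise, if the Node is of the tree-root type, insert or update the entry in the red-black tree. This involves rebalancing the tree, which is not covered here; the article on algorithms introduces red-black trees.
After the insertion, if the bin is a linked list, check its length: once it reaches 8, the list is converted into a red-black tree when appropriate.
If the key was put into the map for the first time, update the element count and resize when appropriate.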

/** Implementation for put and putIfAbsent */
final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node[] tab = table;;) {
        Node f; int n, i, fh;
        // initialize the table if it has not been created yet
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        // the slot for this key is empty: CAS in a new Node holding the key and value
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null,
                         new Node(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }
        // a resize is in progress: help move bins (multi-threaded resize)
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        else {
            V oldVal = null;
            // lock the first Node of the bin
            synchronized (f) {
                // make sure the locked Node is still the first Node of the bin; another
                // thread may have replaced it, e.g. by turning the list into a tree
                if (tabAt(tab, i) == f) {
                    if (fh >= 0) { // a non-negative hash means the bin is a linked list
                        binCount = 1;
                        for (Node e = f;; ++binCount) {
                            K ek;
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                // the key is already in the list
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node pred = e;
                            if ((e = e.next) == null) {
                                // reached the tail without finding the key: append a node
                                pred.next = new Node(hash, key,
                                                     value, null);
                                break;
                            }
                        }
                    }
                    else if (f instanceof TreeBin) {
                        // the bin is a red-black tree: insert or update the entry in the tree
                        Node p;
                        binCount = 2;
                        if ((p = ((TreeBin)f).putTreeVal(hash, key,
                                                       value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            if (binCount != 0) {
                // once a list bin reaches 8 nodes, convert it to a red-black tree when appropriate
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                // an existing value was found: return it and skip the count update and resize check
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    addCount(1L, binCount);
    return null;
}
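Seen from the caller's side, the branches of putVal show up as the return values of put and putIfAbsent. The following is a small usage sketch against the public ConcurrentHashMap API; the class name PutValDemo and the helper rejectsNullKey are made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;

public class PutValDemo {
    // Hypothetical helper: true if the map refuses a null key with an NPE.
    static boolean rejectsNullKey(ConcurrentHashMap<String, Integer> map) {
        try {
            map.put(null, 1);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        System.out.println(map.put("a", 1));         // null: first insertion of the key
        System.out.println(map.put("a", 2));         // 1: old value returned, value replaced
        System.out.println(map.putIfAbsent("a", 3)); // 2: onlyIfAbsent keeps the old value
        System.out.println(map.get("a"));            // 2
        System.out.println(rejectsNullKey(map));     // true: the null check at the top of putVal
    }
}
```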

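Now for the resize logic, taking the addCount function as the entry point. putVal calls it each time a new element has been added. x is the number of elements added, and check controls whether a resize check is performed: if check < 0 no check is made (commonly used when removing elements), and if check <= 1 the check is made only when there is no contention:

When putVal adds an element to a slot that was null, check = 0.
When putVal appends to a bin that is a linked list, check is the number of nodes traversed: 1 if the list held a single node, greater than 1 if it held more. (If the key already exists, putVal returns the old value without calling addCount.)
When putVal inserts into a bin that holds a red-black tree, check = 2.
When remove deletes an element, check = -1.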

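To squeeze out as much performance as possible, ConcurrentHashMap also optimizes how it maintains its size, in a way that closely resembles the multi-CPU counters used in Linux. There is a base counter, baseCount, and every thread that changes the size first tries to update baseCount with a CAS. If the CAS fails, the thread gets a counter cell of its own, stored in a hash-table-like array. These cells can also be contended; contention is generally handled by growing the array or rehashing the thread to another cell, which keeps the time spent in mutual exclusion short. This is rather abstract in words, so let's look at the code.

If the per-thread counter table counterCells is non-null, or the CAS on baseCount fails, baseCount is contended and the size must be tracked through the per-thread counters.
Next, if counterCells is null or has length 0, or the current thread has no cell assigned yet, or the CAS on the thread's own cell also fails, fullAddCount is called to create the counterCells table, assign a cell to the thread, grow the table, or rehash, whichever is needed to get away from the contention.
If baseCount is contended and check <= 1, the resize check is skipped.
The total size is the sum of baseCount and the values of all counterCells.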

private final void addCount(long x, int check) {
    CounterCell[] as; long b, s;
    // update the size counter by x
    if ((as = counterCells) != null ||
        !U.compareAndSwapLong(this, BASECOUNT, b = baseCount, s = b + x)) {
        // counterCells is non-null or the CAS on baseCount failed, so baseCount
        // is contended and the size is tracked through the per-thread cells
        CounterCell a; long v; int m;
        boolean uncontended = true;
        if (as == null || (m = as.length - 1) < 0 ||
            (a = as[ThreadLocalRandom.getProbe() & m]) == null ||
            !(uncontended =
              U.compareAndSwapLong(a, CELLVALUE, v = a.value, v + x))) {
            // counterCells is null or empty, this thread has no cell yet, or the
            // CAS on its cell failed: fullAddCount creates the table, assigns a
            // cell, grows the table or rehashes to get away from the contention
            fullAddCount(x, uncontended);
            return;
        }
        // baseCount is contended and check <= 1: skip the resize check
        if (check <= 1)
            return;
        // total size = baseCount + the values of all counterCells
        s = sumCount();
    }
    // ...
}

sumCount is simple: it adds up baseCount and the values of all counterCells.

final long sumCount() {
    CounterCell[] as = counterCells; CounterCell a;
    long sum = baseCount;
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            if ((a = as[i]) != null)
                sum += a.value;
        }
    }
    return sum;
}
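The JDK source itself points to LongAdder for an explanation of this counting scheme, and java.util.concurrent.atomic.LongAdder exposes the same base-plus-cells idea through a public API. The sketch below only demonstrates that API under contention; the class name StripedCountDemo and the method countWith are made up for the example, and it is not a reimplementation of addCount.

```java
import java.util.concurrent.atomic.LongAdder;

public class StripedCountDemo {
    // Start `threads` workers that each increment a shared LongAdder `perThread` times.
    static long countWith(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    adder.increment(); // lands on the base or on one of the cells
                }
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return adder.sum(); // base + all cells, like sumCount
    }

    public static void main(String[] args) {
        System.out.println(countWith(8, 100_000)); // 800000
    }
}
```

As with sumCount, sum() is only exact once the writers have finished; while updates are in flight it returns a moving estimate.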

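The implementation of fullAddCount is complicated, so only a brief overview is given here.

Every thread is assigned a probe value, initialized by localInit. It can simply be thought of as a random number kept per thread; it is stored in the java.lang.Thread#threadLocalRandomProbe field, which is annotated with Contended to avoid false sharing.
If the counterCells table is null (this code is in the second half of fullAddCount), the table is created with an initial size of 2 and the slot for the current thread is filled in. All modifications of counterCells are protected by the CELLSBUSY spin lock.
If creating the counterCells table is itself contended, the size update is retried on baseCount; this is the last few lines of fullAddCount.
If the counterCells table is non-null, the thread's probe value is ANDed with the table length minus 1 to find the thread's slot.
If that slot is empty, take the CELLSBUSY spin lock, create a CounterCell and store it in the slot; if this succeeds, return.
If that did not succeed, contention is heavy: try to update the CounterCell in the thread's slot with a CAS, and return if that succeeds.
If all of the above fails, grow the counterCells table to twice its size and retry the steps above.
If everything fails and growing the table is contended as well, reset the thread's probe value, which amounts to picking a new random number.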

// See LongAdder version for explanation
private final void fullAddCount(long x, boolean wasUncontended) {
    int h;
    if ((h = ThreadLocalRandom.getProbe()) == 0) { // 0 means no probe value assigned yet
        ThreadLocalRandom.localInit();      // force initialization
        // read the freshly initialized probe
        h = ThreadLocalRandom.getProbe();
        wasUncontended = true;
    }
    boolean collide = false;                // True if last slot nonempty
    for (;;) {
        CounterCell[] as; CounterCell a; int n; long v;
        if ((as = counterCells) != null && (n = as.length) > 0) {
            // the table exists: probe & (length - 1) selects this thread's slot
            if ((a = as[(n - 1) & h]) == null) {
                if (cellsBusy == 0) {            // Try to attach new Cell
                    CounterCell r = new CounterCell(x); // Optimistic create
                    if (cellsBusy == 0 &&
                        U.compareAndSwapInt(this, CELLSBUSY, 0, 1)) {
                        boolean created = false;
                        try {               // Recheck under lock
                            CounterCell[] rs; int m, j;
                            if ((rs = counterCells) != null &&
                                (m = rs.length) > 0 &&
                                rs[j = (m - 1) & h] == null) {
                                rs[j] = r;
                                created = true;
                            }
                        } finally {
                            cellsBusy = 0;
                        }
                        if (created)
                            break;
                        continue;           // Slot is now non-empty
                    }
                }
                collide = false;
            }
            // the steps above did not succeed: contention is heavy
            else if (!wasUncontended)       // CAS already known to fail
                wasUncontended = true;      // Continue after rehash
            else if (U.compareAndSwapLong(a, CELLVALUE, v = a.value, v + x))
                break;
            else if (counterCells != as || n >= NCPU)
                collide = false;            // At max size or stale
            else if (!collide)
                collide = true;
            else if (cellsBusy == 0 &&
                     U.compareAndSwapInt(this, CELLSBUSY, 0, 1)) {
                // everything above failed: double the counterCells table and retry
                try {
                    if (counterCells == as) {// Expand table unless stale
                        CounterCell[] rs = new CounterCell[n << 1];
                        for (int i = 0; i < n; ++i)
                            rs[i] = as[i];
                        counterCells = rs;
                    }
                } finally {
                    cellsBusy = 0;
                }
                collide = false;
                continue;                   // Retry with expanded table
            }
            // still contended: reset the thread's probe, i.e. pick a new random number
            h = ThreadLocalRandom.advanceProbe(h);
        }
        else if (cellsBusy == 0 && counterCells == as &&
                 U.compareAndSwapInt(this, CELLSBUSY, 0, 1)) {
            // counterCells is null: create it; every modification of
            // counterCells is protected by the CELLSBUSY spin lock
            boolean init = false;
            try {                           // Initialize table
                if (counterCells == as) {
                    // the initial table size is 2; fill the slot for the current thread
                    CounterCell[] rs = new CounterCell[2];
                    rs[h & 1] = new CounterCell(x);
                    counterCells = rs;
                    init = true;
                }
            } finally {
                cellsBusy = 0;
            }
            if (init)
                break;
        }
        // creating counterCells was contended too: fall back to updating baseCount
        else if (U.compareAndSwapLong(this, BASECOUNT, v = baseCount, v + x))
            break;                          // Fall back on using base
    }
}

ConcurrentLinkedQueue

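The thread-safe Queue implementations provided by Java fall into blocking and non-blocking queues. Implementations of BlockingQueue are the typical blocking queues, and ConcurrentLinkedQueue is the typical non-blocking queue; which kind to use depends on the needs of the application. Blocking queues can be implemented with locks, non-blocking queues with CAS operations.

ConcurrentLinkedQueue uses a linked list as its data structure and maintains it with CAS. It suits scenarios with relatively high performance requirements where several threads read and write the queue at the same time; in other words, when locking the queue would be expensive, the lock-free ConcurrentLinkedQueue is a good replacement.

Next, a brief look at the implementation. ConcurrentLinkedQueue stores all its data in a singly linked list and also keeps a head pointer and a tail pointer into that list.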

// a node of the linked list
private static class Node {
    volatile E item;
    volatile Node next;
    //...
}

/**
 * A node from which the first live (non-deleted) node (if any)
 * can be reached in O(1) time.
 * Invariants:
 * - all live nodes are reachable from head via succ()
 * - head != null
 * - (tmp = head).next != tmp || tmp != head
 * Non-invariants:
 * - head.item may or may not be null.
 * - it is permitted for tail to lag behind head, that is, for tail
 *   to not be reachable from head!
 */
private transient volatile Node head;

/**
 * A node from which the last node on list (that is, the unique
 * node with node.next == null) can be reached in O(1) time.
 * Invariants:
 * - the last node is always reachable from tail via succ()
 * - tail != null
 * Non-invariants:
 * - tail.item may or may not be null.
 * - it is permitted for tail to lag behind head, that is, for tail
 *   to not be reachable from head!
 * - tail.next may or may not be self-pointing to tail.
 */
private transient volatile Node tail;

When the queue is instantiated, a dummy node is created. As will become clear below, maintaining a linked list with CAS generally calls for a dummy node.

public ConcurrentLinkedQueue() {
    head = tail = new Node(null);
}

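With the internal data structures covered, let's look at how nodes are added and removed, starting with adding data:

Enqueueing attempts a CAS inside a loop. It first checks whether p.next of the presumed last node p is null; if so, it CASes p.next from null to newNode. Once that CAS succeeds, the node counts as part of the queue.

The tail pointer is not necessarily updated at this point, because in ConcurrentLinkedQueue tail does not have to be the actual last node. Under heavy concurrency, performance would suffer if every thread competed to update tail, so tail is only updated when the node that was actually last (the variable p in the code) differs from tail.
ConcurrentLinkedQueue can therefore be in a state such as Node1(head)->Node2(tail)->null, Node1(head)->Node2(tail)->Node3->null, or even Node1(head)->Node2(tail)->Node3->Node4. Since tail may not point at the last node, adding a node can require walking forward from tail to find the real last node, but that costs less than contending on tail, so the overall efficiency is higher.

If p turns out not to be the actual last node, the code first checks whether p's next pointer points to p itself. The dequeue function poll points the next pointer of a removed node at itself, so this step really checks whether p has already been dequeued.

If it has, tail is read again. If tail changed in the meantime, several concurrent insertions have completed, and the procedure simply restarts from the latest tail, so p is set to the node tail now points to.
If tail did not change, the situation was something like Node1(head, tail)->Node2->null, but a dequeue happened before the p == q check and the state became Node1(tail, already dequeued) Node2(head)->null. In that case p is set to head and the traversal continues from head.

The last case is simply that the last node has not been reached yet: Node1(head)->Node2(tail, the current p)->Node3(the current q)->null.

If the traversal has already moved forward at least once (p != t) and tail has changed, use tail directly instead of walking on: p = t (the latest tail).
Otherwise, for example when no step has been taken yet or tail has not changed, keep walking: p = q (p.next).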

public boolean offer(E e) {
    checkNotNull(e);
    final Node newNode = new Node(e);

    for (Node t = tail, p = t;;) {
        Node q = p.next;
        if (q == null) {
            // p is last node
            // found the last node: CAS its next pointer to the new node
            if (p.casNext(null, newNode)) {
                // Successful CAS is the linearization point
                // for e to become an element of this queue,
                // and for newNode to become "live".
                // tail is left alone if tail.next was null and only updated otherwise.
                // Several threads can see tail.next != null at the same time, so tail
                // is not necessarily exactly one node behind the actual last node.
                if (p != t) // hop two nodes at a time
                    casTail(t, newNode);  // Failure is OK: tail may lag behind the last node
                return true;
            }
            // Lost CAS race to another thread; re-read next
        }
        else if (p == q)
            // p.next points to p itself, which poll does to a dequeued node,
            // so p has already been removed from the queue.
            // If tail has changed, continue from the latest tail; otherwise continue
            // from head, because tail may point to a dead (removed) node.
            // We have fallen off list.  If tail is unchanged, it
            // will also be off-list, in which case we need to
            // jump to head, from which all live nodes are always
            // reachable.  Else the new tail is a better bet.
            p = (t != (t = tail)) ? t : head;
        else
            // The last node has not been reached yet.
            // After at least one hop, if tail has changed, jump to the new tail;
            // otherwise keep walking forward.
            // Check for tail updates after two hops.
            p = (p != t && t != (t = tail)) ? t : q;
    }
}

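Finally, the dequeue operation, which also runs inside a CAS loop:

First check whether p.item, starting from p = head, is null. Only a non-null item marks a valid node, because the dummy node created at construction has item == null. Since a null item is what identifies the dummy node, ConcurrentLinkedQueue cannot hold null elements.

Once a valid node is found, its item is set to null with a CAS. What follows mirrors adding an element: head is a contention point too, so it is not updated on every poll but only when the traversal has moved at least one node past head, just as in offer.
If the current thread is responsible for updating head, it checks whether the next pointer of the node p it just removed is null. If it is, p becomes head (p then serves as the new dummy node); otherwise p.next becomes head (head then is the actual first node).

If p.item == null or the CAS failed (another thread already set p.item to null), check whether p.next is null. If it is, p is the last node and null must be returned, but before returning, p is made the new head to save other threads some traversal.
Otherwise check whether p's next pointer points to p itself. If so, another thread has already removed this node from the queue, and poll starts over.
Otherwise let p = q (p.next); this is the forward traversal from head.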

public E poll() {
    restartFromHead:
    for (;;) {
        for (Node h = head, p = h, q;;) {
            E item = p.item;
            // item != null marks a valid node; CAS its item to null
            if (item != null && p.casItem(item, null)) {
                // The CAS succeeded, so the element has been removed. As in offer,
                // head is a contention point: it is only updated when the
                // traversal has moved at least one node past head.
                // Successful CAS is the linearization point
                // for item to be removed from this queue.
                if (p != h) // hop two nodes at a time
                    // if p.next is null, p becomes head (as the new dummy node);
                    // otherwise p.next becomes head (the actual first node)
                    updateHead(h, ((q = p.next) != null) ? q : p);
                return item;
            }
            else if ((q = p.next) == null) {
                // p is the last node, so null is returned; first make p
                // the new head to save other threads some traversal
                updateHead(h, p);
                return null;
            }
            else if (p == q)
                // another thread has already removed this node: restart poll
                continue restartFromHead;
            else
                // p = q (p.next): keep walking forward from head
                p = q;
        }
    }
}
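To see the bare mechanics of a CAS-maintained list with a dummy node, here is a deliberately simplified sketch in the style of the Michael-Scott non-blocking queue. It is not the JDK implementation: the class SimpleLockFreeQueue is invented for this example, it uses AtomicReference instead of Unsafe, and it moves head and tail on every operation, so it has neither the "hop two nodes" optimization nor the self-linking of removed nodes described above.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SimpleLockFreeQueue<E> {
    private static final class Node<E> {
        final E item;
        final AtomicReference<Node<E>> next = new AtomicReference<>();
        Node(E item) { this.item = item; }
    }

    private final AtomicReference<Node<E>> head;
    private final AtomicReference<Node<E>> tail;

    public SimpleLockFreeQueue() {
        Node<E> dummy = new Node<>(null); // head always points at a dummy node
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    public void offer(E e) {
        Node<E> newNode = new Node<>(e);
        for (;;) {
            Node<E> t = tail.get();
            Node<E> next = t.next.get();
            if (next != null) {
                tail.compareAndSet(t, next);        // tail lags: help move it forward
            } else if (t.next.compareAndSet(null, newNode)) {
                tail.compareAndSet(t, newNode);     // failure is fine, others will help
                return;
            }
        }
    }

    public E poll() {
        for (;;) {
            Node<E> h = head.get();
            Node<E> first = h.next.get();
            if (first == null) {
                return null;                        // only the dummy node is left
            }
            if (head.compareAndSet(h, first)) {
                return first.item;                  // first becomes the new dummy node
            }
        }
    }

    public static void main(String[] args) {
        SimpleLockFreeQueue<Integer> q = new SimpleLockFreeQueue<>();
        q.offer(1);
        q.offer(2);
        System.out.println(q.poll()); // 1
        System.out.println(q.poll()); // 2
        System.out.println(q.poll()); // null
    }
}
```

Every successful offer and poll here performs a CAS on tail or head. ConcurrentLinkedQueue accepts a lagging head and tail precisely to avoid that extra contention, at the cost of the more intricate traversal logic shown in offer and poll.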

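updateHead first checks whether head really needs to be reset; if so, it updates head with a CAS. A failed CAS does no harm, since head is not required to point at the actual first node and the traversal in poll covers that case. If the CAS succeeds, the next pointer of the old head is pointed at the old head itself.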

/**
 * Tries to CAS head to p. If successful, repoint old head to itself
 * as sentinel for succ(), below.
 */
final void updateHead(Node h, Node p) {
    if (h != p && casHead(h, p))
        h.lazySetNext(h);
}

void lazySetNext(Node val) {
    UNSAFE.putOrderedObject(this, nextOffset, val);
}

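A natural question is why the next pointer is set lazily. To answer it, we first need to understand the difference between putOrderedObject and putObjectVolatile. The next field of Node is declared volatile. What does volatile provide? It prevents instruction reordering, and it invalidates the corresponding data in the caches of other CPUs, forcing them to fetch the latest value. This is implemented with fences (memory barriers). On Linux x86 the barrier is typically lock; addl $0,0(%%esp). Here lock is an instruction prefix that carries the semantics of a StoreLoad barrier, and addl $0,0(%%esp) is effectively a no-op: the lock prefix is not a complete instruction and cannot stand alone, so it is usually followed by an instruction that does nothing.

putObjectVolatile is equivalent to declaring a volatile variable and assigning to it directly. In other words, both putObjectVolatile and a direct write to a volatile variable rely on StoreLoad barriers. A StoreLoad barrier means that for the sequence Store1; StoreLoad; Load2, the data written by Store1 must be visible to all threads before Load2 reads its data. For an explanation of memory barriers, see the referenced handbook, which describes what each barrier requires and how it is implemented on different architectures.

putOrderedObject, on the other hand, only has to keep the instructions ordered on the current CPU and must not cause illegal memory accesses. It makes no guarantee about when the write becomes visible to other processors, so it avoids the extra cost. In ConcurrentLinkedQueue, pointing next at the node itself at the very end does not need such strong visibility, but next is declared volatile, so putOrderedObject has to be called explicitly to "drop the volatile semantics" for this write and improve efficiency.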
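Outside the JDK internals, the same ordered store is available as lazySet on the atomic classes. The sketch below is only an illustration of the self-linking step using AtomicReference; the names LazySetDemo, Node, selfLink and isOffList are invented for the example and do not appear in ConcurrentLinkedQueue.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LazySetDemo {
    static final class Node {
        final String item;
        final AtomicReference<Node> next = new AtomicReference<>();
        Node(String item) { this.item = item; }
    }

    // Self-link a removed node, the way updateHead does for the old head.
    static void selfLink(Node removed) {
        removed.next.lazySet(removed); // ordered store, cheaper than a full volatile write
    }

    // A node whose next pointer refers to itself has been taken off the list.
    static boolean isOffList(Node n) {
        return n.next.get() == n;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next.set(b);                    // ordinary volatile store
        selfLink(a);                      // a is "removed"
        System.out.println(isOffList(a)); // true
        System.out.println(isOffList(b)); // false
    }
}
```

A reader that briefly misses the self-link is harmless in the queue: it still sees the old next pointer and reaches a live node, which is why the cheaper store is sufficient there.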

For their implementation, see the code below. inline_unsafe_ordered_store ends by inserting an Op_MemBarCPUOrder barrier, whereas putObjectVolatile corresponds to the is_volatile == true && is_store == true path in inline_unsafe_access, which inserts an Op_MemBarVolatile barrier.

bool LibraryCallKit::inline_unsafe_ordered_store(BasicType type) {
  // This is another variant of inline_unsafe_access, differing in
  // that it always issues store-store ("release") barrier and ensures
  // store-atomicity (which only matters for "long").

  // ...
  if (type == T_OBJECT) // reference stores need a store barrier.
    store = store_oop_to_unknown(control(), base, adr, adr_type, val, type);
  else {
    store = store_to_memory(control(), adr, val, type, adr_type, require_atomic_access);
  }
  insert_mem_bar(Op_MemBarCPUOrder);
  return true;
}

bool LibraryCallKit::inline_unsafe_access(bool is_native_ptr, bool is_store, BasicType type, bool is_volatile) {
  // .... 
  if (is_volatile) {
    if (!is_store)
      insert_mem_bar(Op_MemBarAcquire);
    else
      insert_mem_bar(Op_MemBarVolatile);
  }

  if (need_mem_bar) insert_mem_bar(Op_MemBarCPUOrder);

  return true;
}

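Now look at how memnode.hpp describes the two barriers. MemBarVolatileNode has to guarantee visibility across CPUs, while MemBarCPUOrderNode only has to guarantee ordering within a single CPU; since the CPU already does all of that ordering, nothing more needs to be done.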

// Ordering between a volatile store and a following volatile load.
// Requires multi-CPU visibility
class MemBarVolatileNode: public MemBarNode {
public:
  MemBarVolatileNode(Compile* C, int alias_idx, Node* precedent)
    : MemBarNode(C, alias_idx, precedent) {}
  virtual int Opcode() const;
};

// Ordering within the same CPU.  Used to order unsafe memory references
// inside the compiler when we lack alias info.  Not needed "outside" the
// compiler because the CPU does all the ordering for us.
class MemBarCPUOrderNode: public MemBarNode {
public:
  MemBarCPUOrderNode(Compile* C, int alias_idx, Node* precedent)
    : MemBarNode(C, alias_idx, precedent) {}
  virtual int Opcode() const;
  virtual uint ideal_reg() const { return 0; } // not matched in the AD file
};

ConcurrentSkipListMap

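In a singly linked list, even a sorted one, finding an element means walking the list from head to tail, which is inefficient. A skip list builds on such a list by maintaining several linked lists at once, arranged in levels.

The lowest-level list contains every element of the skip list, and each higher level is a subset of the level below it.

All lists in a skip list are sorted. A search starts from the top-level list; as soon as the next value in the current list is greater than the element being searched for, the search drops down to the next lower list and continues there. The search therefore proceeds in jumps. The figure above shows a lookup of element 18 in a skip list.

Finding 18 originally took 12 steps and now takes only 7. For long lists, the speed-up from building such an index is substantial.

Another difference between a Map built on a skip list and one built on hashing is that hashing does not preserve the order of the elements, whereas all elements in a skip list are sorted. Iterating over a skip list therefore yields an ordered result.

The JDK's ConcurrentSkipListMap uses no locks and modifies its data with CAS instead. An insertion first updates the bottom-level list with a CAS and then maintains the index levels above it one by one (each level is also modified with CAS). The two phases are independent: if an upper level is missing some entries, only lookup speed suffers, not correctness, because insertions and lookups both treat the bottom-level list as the source of truth.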
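The ordering guarantee is easy to observe through the public API. The following usage sketch inserts keys out of order and reads them back; the class name SkipListMapDemo and the helper sortedKeys are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListMapDemo {
    // Insert the keys in the given order and return them in iteration order.
    static List<Integer> sortedKeys(int... keys) {
        ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();
        for (int k : keys) {
            map.put(k, "v" + k);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys(18, 3, 25, 7)); // [3, 7, 18, 25]

        ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();
        map.put(18, "x");
        map.put(3, "y");
        map.put(25, "z");
        System.out.println(map.firstKey());      // 3
        System.out.println(map.ceilingKey(10));  // 18: smallest key >= 10
        System.out.println(map.headMap(18));     // {3=y}: keys strictly below 18
    }
}
```

Navigation methods such as ceilingKey and headMap are what a hash-based map cannot offer, and they are the usual reason to choose ConcurrentSkipListMap over ConcurrentHashMap.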

CopyOnWriteArrayList

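CopyOnWriteArrayList is meant for cases where reads far outnumber writes. Reads do not block each other, reads and writes do not block each other, and only writes exclude other writes. The idea is simple: the data is kept in an internal array, and a normal read fetches the element from that array directly by index.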

/** The array, accessed only via getArray/setArray. */
private transient volatile Object[] array;

@SuppressWarnings("unchecked")
private E get(Object[] a, int index) {
    return (E) a[index];
}

/**
 * {@inheritDoc}
 *
 * @throws IndexOutOfBoundsException {@inheritDoc}
 */
public E get(int index) {
    return get(getArray(), index);
}

/**
 * Gets the array.  Non-private so as to also be accessible
 * from CopyOnWriteArraySet class.
 */
final Object[] getArray() {
    return array;
}

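A write, by contrast, copies the whole array, appends the new element at the end of the copy, and finally replaces the original array with the copy, after which the old array can be garbage collected. Clearly, this approach reduces contention at the price of temporarily needing up to twice the space for the data.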

/** The lock protecting all mutators */
final transient ReentrantLock lock = new ReentrantLock();

/**
 * Appends the specified element to the end of this list.
 *
 * @param e element to be appended to this list
 * @return {@code true} (as specified by {@link Collection#add})
 */
public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}

/**
 * Sets the array.
 */
final void setArray(Object[] a) {
    array = a;
}
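One visible consequence of replacing the array on every write is that an iterator keeps working on the array it started with. The sketch below demonstrates this with the public API; the class name SnapshotIterationDemo and the helper iterateWhileAdding are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotIterationDemo {
    // Iterate over the list while appending to it; returns what the iterator saw.
    static List<Integer> iterateWhileAdding(CopyOnWriteArrayList<Integer> list) {
        List<Integer> seen = new ArrayList<>();
        Iterator<Integer> it = list.iterator(); // bound to the current array
        while (it.hasNext()) {
            int v = it.next();
            seen.add(v);
            list.add(v + 100); // installs a new array; the iterator keeps the old one
        }
        return seen;
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<Integer> list = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(iterateWhileAdding(list)); // [1, 2, 3]
        System.out.println(list);                     // [1, 2, 3, 101, 102, 103]
    }
}
```

The same loop over an ArrayList would throw ConcurrentModificationException; here the iterator never notices the writes because they land in a different array.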

LinkedBlockingQueue

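LinkedBlockingQueue is a blocking queue built on a singly linked list. It can be used as an unbounded or a bounded queue and is FIFO. To keep a LinkedBlockingQueue from growing rapidly and consuming large amounts of memory, a capacity is usually specified when it is created; if none is given, the capacity is Integer.MAX_VALUE. So what is a blocking queue? A queue has two operations, enqueue and dequeue; a blocking queue blocks the enqueue operation when the queue is full and blocks the dequeue operation when the queue is empty.

To provide blocking and thread safety, it uses two locks and two Conditions internally.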

/** Lock held by take, poll, etc */
private final ReentrantLock takeLock = new ReentrantLock();

/** Wait queue for waiting takes */
private final Condition notEmpty = takeLock.newCondition();

/** Lock held by put, offer, etc */
private final ReentrantLock putLock = new ReentrantLock();

/** Wait queue for waiting puts */
private final Condition notFull = putLock.newCondition();

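To dequeue, a thread first acquires takeLock and then checks whether the queue holds 0 elements. If it does, the thread waits on notEmpty; otherwise it dequeues right away. Finally, it checks whether the queue was at full capacity before the dequeue; if so, it wakes up a waiting enqueue operation.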

public E take() throws InterruptedException {
    E x;
    int c = -1;
    final AtomicInteger count = this.count;
    // acquire the lock
    final ReentrantLock takeLock = this.takeLock;
    takeLock.lockInterruptibly();
    try {
        // wait while the queue is empty
        while (count.get() == 0) {
            notEmpty.await();
        }
        // otherwise dequeue and update the size
        x = dequeue();
        c = count.getAndDecrement();
        if (c > 1)
            // the queue is still non-empty: wake up the next waiting take
            notEmpty.signal();
    } finally {
        takeLock.unlock();
    }
    // the queue was full before this take: wake up a waiting put
    if (c == capacity)
        signalNotFull();
    return x;
}

/**
 * Signals a waiting put. Called only from take/poll.
 */
private void signalNotFull() {
    final ReentrantLock putLock = this.putLock;
    putLock.lock();
    try {
        notFull.signal();
    } finally {
        putLock.unlock();
    }
}

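Enqueueing mirrors dequeueing. It also acquires a lock first, this time putLock, and checks whether the queue is full. If so, it waits on notFull; otherwise it enqueues the element and updates the size. If the queue length was 0 before, some dequeue operations may be blocked, so a thread waiting on notEmpty is woken up.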

/**
 * Inserts the specified element at the tail of this queue, waiting if
 * necessary for space to become available.
 *
 * @throws InterruptedException {@inheritDoc}
 * @throws NullPointerException {@inheritDoc}
 */
public void put(E e) throws InterruptedException {
    if (e == null) throw new NullPointerException();
    // Note: convention in all put/take/etc is to preset local var
    // holding count negative to indicate failure unless set.
    int c = -1;
    Node node = new Node(e);
    final ReentrantLock putLock = this.putLock;
    final AtomicInteger count = this.count;
    // acquire the lock first
    putLock.lockInterruptibly();
    try {
        /*
         * Note that count is used in wait guard even though it is
         * not protected by lock. This works because count can
         * only decrease at this point (all other puts are shut
         * out by lock), and we (or some other waiting put) are
         * signalled if it ever changes from capacity. Similarly
         * for all other uses of count in other wait guards.
         */
        while (count.get() == capacity) {
            notFull.await();
        }
        enqueue(node);
        c = count.getAndIncrement();
        if (c + 1 < capacity)
            notFull.signal();
    } finally {
        putLock.unlock();
    }
    if (c == 0)
        signalNotEmpty();
}

/**
 * Signals a waiting take. Called only from put/offer (which do not
 * otherwise ordinarily lock takeLock.)
 */
private void signalNotEmpty() {
    final ReentrantLock takeLock = this.takeLock;
    takeLock.lock();
    try {
        notEmpty.signal();
    } finally {
        takeLock.unlock();
    }
}
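
To see the blocking put and take paths work together, here is a minimal producer/consumer sketch. The class name PutTakeDemo and the method transfer are hypothetical names chosen for this illustration, and the capacity of 2 is an arbitrary choice that forces the producer to wait on a full queue from time to time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class, not part of the JDK.
public class PutTakeDemo {

    // Moves `count` integers from a producer thread to the calling thread
    // through a bounded queue and returns them in the order received.
    static List<Integer> transfer(int count) {
        LinkedBlockingQueue<Integer> queue = new LinkedBlockingQueue<>(2);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<Integer> received = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                received.add(queue.take()); // blocks while the queue is empty
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(transfer(5)); // prints [0, 1, 2, 3, 4]
    }
}
```

Because there is a single producer and the queue is FIFO, the consumer receives the values in insertion order regardless of how the two threads interleave.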

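Besides the blocking enqueue and dequeue operations, LinkedBlockingQueue also offers non-blocking ones. They are fairly simple: essentially the code above with the await-and-retry logic removed, so they are not covered in detail here.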
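
As a quick illustration of the non-blocking interface, the sketch below uses offer(e) and poll() on a queue with capacity 2. The class name NonBlockingQueueDemo and the method run are hypothetical names for this example; only the LinkedBlockingQueue calls come from the JDK.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class, not part of the JDK.
public class NonBlockingQueueDemo {

    // Runs a few non-blocking calls and records each result in a string.
    static String run() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(2);
        StringBuilder log = new StringBuilder();
        log.append(queue.offer("a")).append(','); // true: there is room
        log.append(queue.offer("b")).append(','); // true: queue is now full
        log.append(queue.offer("c")).append(','); // false: full, returns at once
        log.append(queue.poll()).append(',');     // a
        log.append(queue.poll()).append(',');     // b
        log.append(queue.poll());                 // null: empty, returns at once
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints true,true,false,a,b,null
    }
}
```

Where put would wait on notFull and take would wait on notEmpty, offer and poll report the failure through their return value instead.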

PriorityBlockingQueue

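PriorityBlockingQueue is a blocking queue that orders its elements. Because it is a queue, it does not need to keep its whole internal storage sorted; it only has to make sure elements leave in sorted order. It is therefore implemented as a binary heap. PriorityBlockingQueue is also thread-safe: a single lock guards all maintenance of the heap data.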

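The heap data of PriorityBlockingQueue is stored in the queue array shown below, with the root of the heap at queue[0]. As the comment says, the indices of a node's two children can be computed directly from the node's index n: they are queue[2*n+1] and queue[2*(n+1)]. This is the usual way to represent a binary heap (a tree) with an array.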

/**
 * Priority queue represented as a balanced binary heap: the two
 * children of queue[n] are queue[2*n+1] and queue[2*(n+1)].  The
 * priority queue is ordered by comparator, or by the elements'
 * natural ordering, if comparator is null: For each node n in the
 * heap and each descendant d of n, n <= d.  The element with the
 * lowest value is in queue[0], assuming the queue is nonempty.
 */
private transient Object[] queue;
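
The index arithmetic can be tried out on a plain int array. This is a small sketch rather than JDK code: the class HeapIndexDemo and its helper methods are hypothetical, and the parent formula (k - 1) >>> 1 is the one used by siftUpComparable.

```java
// Hypothetical demo class, not part of the JDK.
public class HeapIndexDemo {

    static int leftChild(int n)  { return 2 * n + 1; }
    static int rightChild(int n) { return 2 * (n + 1); }
    static int parent(int k)     { return (k - 1) >>> 1; }

    // Checks the min-heap invariant: no node is smaller than its parent.
    static boolean isMinHeap(int[] heap) {
        for (int k = 1; k < heap.length; k++) {
            if (heap[parent(k)] > heap[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] heap = {1, 3, 2, 7, 4, 5};           // a valid heap, yet not sorted
        System.out.println(leftChild(1));          // prints 3
        System.out.println(rightChild(1));         // prints 4
        System.out.println(parent(5));             // prints 2
        System.out.println(isMinHeap(heap));       // prints true
        System.out.println(isMinHeap(new int[]{2, 1})); // prints false
    }
}
```

The array {1, 3, 2, 7, 4, 5} satisfies the heap invariant even though it is not sorted, which is exactly the weaker guarantee the queue relies on.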

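Maintaining a binary heap does not require every element in the heap to be in order, only each parent and its children. To insert an element, we therefore place it at the end of the heap and then, based on how it compares with its parent, swap parent and child as needed until the heap property holds again.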

/**
 * Inserts the specified element into this priority queue.
 * As the queue is unbounded, this method will never return {@code false}.
 *
 * @param e the element to add
 * @return {@code true} (as specified by {@link Queue#offer})
 * @throws ClassCastException if the specified element cannot be compared
 *         with elements currently in the priority queue according to the
 *         priority queue's ordering
 * @throws NullPointerException if the specified element is null
 */
public boolean offer(E e) {
    if (e == null)
        throw new NullPointerException();
    final ReentrantLock lock = this.lock;
    lock.lock();
    int n, cap;
    Object[] array;
    // The new node's index (n) equals size; grow the array until it has room
    while ((n = size) >= (cap = (array = queue).length))
        tryGrow(array, cap);
    try {
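        Comparator<? super E> cmp = comparator;
        if (cmp == null)
            // Sift the new node up using the elements' natural ordering
            siftUpComparable(n, e, array);
        else
            // Sift the new node up using the supplied comparator
            siftUpUsingComparator(n, e, array, cmp);
        // Update size and wake a waiting dequeue operation
        size = n + 1;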
        notEmpty.signal();
    } finally {
        lock.unlock();
    }
    return true;
}

/**
 * Inserts item x at position k, maintaining heap invariant by
 * promoting x up the tree until it is greater than or equal to
 * its parent, or is the root.
 *
 * To simplify and speed up coercions and comparisons. the
 * Comparable and Comparator versions are separated into different
 * methods that are otherwise identical. (Similarly for siftDown.)
 * These methods are static, with heap state as arguments, to
 * simplify use in light of possible comparator exceptions.
 *
 * @param k the position to fill
 * @param x the item to insert
 * @param array the heap array
 */
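private static <T> void siftUpComparable(int k, T x, Object[] array) {
    Comparable<? super T> key = (Comparable<? super T>) x;
    while (k > 0) {
        // Compute the parent's index
        int parent = (k - 1) >>> 1;
        Object e = array[parent];
        // Compare with the parent. The default order is ascending (a min-heap),
        // so stop once the new node is not smaller than its parent
        if (key.compareTo((T) e) >= 0)
            break;
        // The new node is smaller than its parent: move the parent down and repeat
        array[k] = e;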
        k = parent;
    }
    array[k] = key;
}

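A dequeue from PriorityBlockingQueue takes the element at index 0 directly, then moves the last element of queue into position 0 and fixes up the heap from the top down until the heap property holds again.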

public E take() throws InterruptedException {
    final ReentrantLock lock = this.lock;
    lock.lockInterruptibly();
    E result;
    // With the lock held, keep trying to dequeue; wait on the notEmpty condition while there is no data
    try {
        while ( (result = dequeue()) == null)
            notEmpty.await();
    } finally {
        lock.unlock();
    }
    return result;
}

/**
 * Mechanics for poll().  Call only while holding lock. */
private E dequeue() {
    int n = size - 1;
    if (n < 0)
        return null;
    else {
        Object[] array = queue;
        // Remove the element at index 0
        E result = (E) array[0];
        // Its replacement is the last element of queue
        E x = (E) array[n];
        array[n] = null;
        Comparator<? super E> cmp = comparator;
        if (cmp == null)
            // Sift the replacement down
            siftDownComparable(0, x, array, n);
        else
            // Sift the replacement down
            siftDownUsingComparator(0, x, array, n, cmp);
        size = n;
        return result;
    }
}

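The sift-down implementation first compares the two children and picks the smaller one as the target. It then compares the current node x with that target to check whether the heap property holds. If it does not, the two nodes swap positions and the process repeats.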

/**
 * Inserts item x at position k, maintaining heap invariant by
 * demoting x down the tree repeatedly until it is less than or
 * equal to its children or is a leaf.
 *
 * @param k the position to fill
 * @param x the item to insert
 * @param array the heap array
 * @param n heap size
 */
private static <T> void siftDownComparable(int k, T x, Object[] array,
                                           int n) {
    if (n > 0) {
        Comparable<? super T> key = (Comparable<? super T>)x;
        int half = n >>> 1;           // loop while a non-leaf
        while (k < half) {
            int child = (k << 1) + 1; // assume left child is least
            Object c = array[child];
            int right = child + 1;
            if (right < n &&
                ((Comparable<? super T>) c).compareTo((T) array[right]) > 0)
                // Compare the two children and pick the smaller one as the sift-down target
                c = array[child = right];
            if (key.compareTo((T) c) <= 0)
                // The parent is already the smallest: stop
                break;
            // The child is smaller: move it up and repeat from the child's position
            array[k] = c;
            k = child;
        }
        array[k] = key;
    }
}
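
The combined effect of sift-up on insert and sift-down on removal can be observed from the outside with a short sketch. The class PriorityOrderDemo and the method drain are hypothetical names for this example. It uses the non-blocking poll() so the helper does not have to handle InterruptedException; take() returns elements in the same order but waits while the queue is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical demo class, not part of the JDK.
public class PriorityOrderDemo {

    // Offers the values in the given order, then removes them one by one.
    static List<Integer> drain(int... values) {
        PriorityBlockingQueue<Integer> queue = new PriorityBlockingQueue<>();
        for (int v : values) {
            queue.offer(v); // sifts the new element up the heap
        }
        List<Integer> out = new ArrayList<>();
        Integer head;
        while ((head = queue.poll()) != null) { // removes queue[0], sifts the replacement down
            out.add(head);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(5, 1, 4, 2, 3)); // prints [1, 2, 3, 4, 5]
    }
}
```

With no comparator supplied, the queue uses the natural ordering of Integer, so the smallest value always sits at the root and leaves first.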