Notes on the ConcurrentHashMap Implementation

1. Translation of the implementation notes

The primary design goal of this hash table is to maintain
concurrent readability (typically method get(), but also
iterators and related methods) while minimizing update
contention. Secondary goals are to keep space consumption about
the same or better than java.util.HashMap, and to support high
initial insertion rates on an empty table by many threads.

Primary goal: maintain concurrent readability (the get() method, iterators, and related methods) while minimizing update contention.

Secondary goals: keep space consumption about the same as or better than HashMap, and support high initial insertion rates by many threads on an empty table.

This map usually acts as a binned (bucketed) hash table.  Each
key-value mapping is held in a Node.  Most nodes are instances
of the basic Node class with hash, key, value, and next
fields. However, various subclasses exist: TreeNodes are
arranged in balanced trees, not lists.  TreeBins hold the roots
of sets of TreeNodes. ForwardingNodes are placed at the heads
of bins during resizing. ReservationNodes are used as
placeholders while establishing values in computeIfAbsent and
related methods.  The types TreeBin, ForwardingNode, and
ReservationNode do not hold normal user keys, values, or
hashes, and are readily distinguishable during search etc
because they have negative hash fields and null key and value
fields. (These special nodes are either uncommon or transient,
so the impact of carrying around some unused fields is
insignificant.)

This map is a binned (bucketed) hash table. Each key-value mapping is held in a Node; most nodes are instances of the basic Node class (with hash, key, value, and next fields). Several subclasses exist, however: TreeNodes are arranged in balanced trees rather than lists; a TreeBin holds the root of a set of TreeNodes; a ForwardingNode is placed at the head of a bin during resizing; a ReservationNode is used as a placeholder while establishing a value in computeIfAbsent and related methods.

TreeBin, ForwardingNode, and ReservationNode have negative hash fields and null key and value fields, so they are easy to distinguish during search and similar operations. These special nodes are either uncommon or transient, so the cost of carrying the unused fields is negligible.

The table is lazily initialized to a power-of-two size upon the
first insertion.  Each bin in the table normally contains a
list of Nodes (most often, the list has only zero or one Node).
Table accesses require volatile/atomic reads, writes, and
CASes.  Because there is no other way to arrange this without
adding further indirections, we use intrinsics
(sun.misc.Unsafe) operations.

The table is lazily initialized to a power-of-two size on the first insertion. Each bin in the table normally contains a list of Nodes (most often the list has zero or one node). Table accesses require volatile/atomic reads, writes, and CAS operations; since there is no way to arrange this without adding further indirection, intrinsics (sun.misc.Unsafe) are used.

Note: JEP 193 (Variable Handles)
The classes in java.util.concurrent (and other areas identified in the JDK) will be migrated from sun.misc.Unsafe to VarHandle.
Starting with JDK 9, the concurrency classes were migrated from sun.misc.Unsafe to VarHandle.
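As an illustration of what such per-slot atomic access looks like with VarHandle (a minimal sketch, not the JDK source; the class and method names here are made up for the example):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Simplified node, mirroring the hash/key/value/next layout described above.
class Node {
    final int hash;
    final Object key;
    volatile Object val;
    volatile Node next;
    Node(int hash, Object key, Object val, Node next) {
        this.hash = hash; this.key = key; this.val = val; this.next = next;
    }
}

// Per-slot volatile reads and CAS writes on the table array.
final class TableAccess {
    private static final VarHandle AA =
        MethodHandles.arrayElementVarHandle(Node[].class);

    static Node tabAt(Node[] tab, int i) {
        return (Node) AA.getVolatile(tab, i);               // volatile read of slot i
    }
    static boolean casTabAt(Node[] tab, int i, Node expect, Node update) {
        return AA.compareAndSet(tab, i, expect, update);    // atomic CAS of slot i
    }
    static void setTabAt(Node[] tab, int i, Node v) {
        AA.setVolatile(tab, i, v);                          // volatile write of slot i
    }
}
```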

We use the top (sign) bit of Node hash fields for control
purposes -- it is available anyway because of addressing
constraints.  Nodes with negative hash fields are specially
handled or ignored in map methods.

The top (sign) bit of a Node's hash field is used for control purposes. Nodes with negative hash fields are specially handled or ignored in map methods.
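The control values and the hash spreading step can be sketched as follows (constant names follow the JDK source as I recall them; treat the exact values as illustrative): all user hashes are masked to be non-negative, so they never collide with the negative control hashes.

```java
final class Hashing {
    static final int MOVED     = -1;          // hash of a forwarding node
    static final int TREEBIN   = -2;          // hash of the root holder of a tree bin
    static final int RESERVED  = -3;          // hash of a reservation placeholder
    static final int HASH_BITS = 0x7fffffff;  // usable (non-negative) bits of a user hash

    // Mix higher bits downward and clear the sign bit, keeping user hashes non-negative.
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }
}
```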

Insertion (via put or its variants) of the first node in an
empty bin is performed by just CASing it to the bin.  This is
by far the most common case for put operations under most
key/hash distributions.  Other update operations (insert,
delete, and replace) require locks.  We do not want to waste
the space required to associate a distinct lock object with
each bin, so instead use the first node of a bin list itself as
a lock. Locking support for these locks relies on builtin
"synchronized" monitors.

Insertion of the first node into an empty bin is done with a single CAS; this is by far the most common put case under most key/hash distributions. Other update operations (insert, delete, replace) require locks. Associating a distinct lock object with every bin would waste space, so the first node of a bin's list is itself used as the lock, relying on the built-in synchronized monitors.

Using the first node of a list as a lock does not by itself
suffice though: When a node is locked, any update must first
validate that it is still the first node after locking it, and
retry if not. Because new nodes are always appended to lists,
once a node is first in a bin, it remains first until deleted
or the bin becomes invalidated (upon resizing).

Using the first node of a list as a lock is not sufficient on its own: after locking a node, an update must first validate that it is still the first node, and retry if it is not.

Because new nodes are always appended to the tail of a list, once a node is first in a bin it remains first until it is deleted or the bin is invalidated (during resizing).
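A simplified sketch of this update protocol, reusing the Node and TableAccess helpers sketched above (illustrative only, not the JDK's putVal): CAS the first node into an empty bin; otherwise synchronize on the current head and re-check that it is still the head before mutating.

```java
final class PutSketch {
    // Returns the previous value, or null if the key was absent.
    static Object put(Node[] tab, int hash, Object key, Object value) {
        for (;;) {
            int i = (tab.length - 1) & hash;
            Node f = TableAccess.tabAt(tab, i);
            if (f == null) {
                // Empty bin: a single CAS suffices; retry the loop if we lose the race.
                if (TableAccess.casTabAt(tab, i, null, new Node(hash, key, value, null)))
                    return null;
            } else {
                synchronized (f) {                        // lock the first node of the bin
                    if (TableAccess.tabAt(tab, i) == f) { // still the head? if not, retry
                        for (Node e = f; ; e = e.next) {
                            if (e.hash == hash && key.equals(e.key)) {
                                Object old = e.val;
                                e.val = value;            // replace the existing mapping
                                return old;
                            }
                            if (e.next == null) {
                                e.next = new Node(hash, key, value, null); // append at tail
                                return null;
                            }
                        }
                    }
                }
                // Head changed under us (removed, treeified, or moved): retry from the top.
            }
        }
    }
}
```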

The main disadvantage of per-bin locks is that other update
operations on other nodes in a bin list protected by the same
lock can stall, for example when user equals() or mapping
functions take a long time.  However, statistically, under
random hash codes, this is not a common problem.  Ideally, the
frequency of nodes in bins follows a Poisson distribution
(http://en.wikipedia.org/wiki/Poisson_distribution) with a
parameter of about 0.5 on average, given the resizing threshold
of 0.75, although with a large variance because of resizing
granularity. Ignoring variance, the expected occurrences of
list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The
first values are:

0:    0.60653066
1:    0.30326533
2:    0.07581633
3:    0.01263606
4:    0.00157952
5:    0.00015795
6:    0.00001316
7:    0.00000094
8:    0.00000006
more: less than 1 in ten million

Lock contention probability for two threads accessing distinct
elements is roughly 1 / (8 * #elements) under random hashes.

While a bin is locked, other update operations on other nodes in that bin can stall, for example when a user equals() or mapping function takes a long time. Statistically, under random hash codes, this is not a common problem. Ideally, given the resize threshold of 0.75, the number of nodes in a bin follows a Poisson distribution with a mean of about 0.5, although with a large variance because of resizing granularity. Ignoring variance, the expected frequency of bins of size k is exp(-0.5) * pow(0.5, k) / factorial(k).



Under random hashes, the probability that two threads accessing distinct elements contend for the same lock is roughly 1 / (8 * #elements).

A rough way to arrive at that estimate:

P(X=2) * (2-1)/#elements + P(X=3) * (3-1)/#elements + ...
≈ (0.07581633 * 1 + 0.01263606 * 2 + 0.00157952 * 3 + 0.00015795 * 4 + 0.00001316 * 5 + 0.00000094 * 6 + 0.00000006 * 7) / #elements
= 0.10653067 / #elements

Reasoning: for the threads to contend, the first thread's element must sit in a bin of size 2 and the second thread's element must be the other node of that bin; and similarly for each larger bin size.
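The figures above are easy to reproduce; a short Java check of exp(-0.5) * 0.5^k / k! and of the 0.10653067 sum used in the estimate:

```java
public class PoissonCheck {
    public static void main(String[] args) {
        double lambda = 0.5;          // mean nodes per bin at the 0.75 resize threshold
        double factorial = 1.0;
        double contention = 0.0;
        for (int k = 0; k <= 8; k++) {
            if (k > 0) factorial *= k;
            double p = Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
            if (k >= 2) contention += (k - 1) * p;   // contribution to the contention estimate
            System.out.printf("%d: %.8f%n", k, p);
        }
        // Prints roughly 0.10653067, matching the figure in the note above.
        System.out.printf("sum of (k-1)*P(k), k>=2: %.8f%n", contention);
    }
}
```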

Actual hash code distributions encountered in practice
sometimes deviate significantly from uniform randomness.  This
includes the case when N > (1<<30), so some keys MUST collide.
Similarly for dumb or hostile usages in which multiple keys are
designed to have identical hash codes or ones that differs only
in masked-out high bits. So we use a secondary strategy that
applies when the number of nodes in a bin exceeds a
threshold. These TreeBins use a balanced tree to hold nodes (a
specialized form of red-black trees), bounding search time to
O(log N).  Each search step in a TreeBin is at least twice as
slow as in a regular list, but given that N cannot exceed
(1<<64) (before running out of addresses) this bounds search
steps, lock hold times, etc, to reasonable constants (roughly
100 nodes inspected per operation worst case) so long as keys
are Comparable (which is very common -- String, Long, etc).
TreeBin nodes (TreeNodes) also maintain the same "next"
traversal pointers as regular nodes, so can be traversed in
iterators in the same way.

In practice, hash code distributions sometimes deviate significantly from uniform randomness. This includes the case N > (1 << 30), where some keys must collide, and dumb or hostile usages in which multiple keys are designed to have identical hash codes or hash codes that differ only in the masked-out high bits.

A secondary strategy therefore applies when the number of nodes in a bin exceeds a threshold: the bin is converted to a specialized form of red-black tree (a TreeBin), bounding search time to O(log N). Each search step in a TreeBin is at least twice as slow as in a regular list, but this still keeps search steps and lock hold times to reasonable constants as long as keys are Comparable.

The table is resized when occupancy exceeds a percentage
threshold (nominally, 0.75, but see below).  Any thread
noticing an overfull bin may assist in resizing after the
initiating thread allocates and sets up the replacement array.
However, rather than stalling, these other threads may proceed
with insertions etc.  The use of TreeBins shields us from the
worst case effects of overfilling while resizes are in
progress.  Resizing proceeds by transferring bins, one by one,
from the table to the next table. However, threads claim small
blocks of indices to transfer (via field transferIndex) before
doing so, reducing contention.  A generation stamp in field
sizeCtl ensures that resizings do not overlap. Because we are
using power-of-two expansion, the elements from each bin must
either stay at same index, or move with a power of two
offset. We eliminate unnecessary node creation by catching
cases where old nodes can be reused because their next fields
won't change.  On average, only about one-sixth of them need
cloning when a table doubles. The nodes they replace will be
garbage collectable as soon as they are no longer referenced by
any reader thread that may be in the midst of concurrently
traversing table.  Upon transfer, the old table bin contains
only a special forwarding node (with hash field "MOVED") that
contains the next table as its key. On encountering a
forwarding node, access and update operations restart, using
the new table.

The table is resized when occupancy exceeds the threshold (nominally 0.75). Once the initiating thread has allocated and set up the replacement array, any thread that notices the resize in progress may assist rather than stalling.

Before transferring, each thread claims a small block of indices (via the transferIndex field) to reduce contention; a generation stamp in sizeCtl ensures that resizings do not overlap.

Old nodes are reused whenever possible (when their next fields will not change), avoiding unnecessary node creation.

Once a bin has been transferred, the old table's bin contains only a special forwarding node (with hash field MOVED) that holds the next table as its key. On encountering a forwarding node, access and update operations restart using the new table.
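A sketch of what a lookup does when it meets a forwarding node, reusing the Node/TableAccess/Hashing sketches above (illustrative; in the real class the forwarding node carries the next table and overrides its own find logic):

```java
// A forwarding node carries the negative MOVED hash and a pointer to the next table.
class ForwardingNodeSketch extends Node {
    final Node[] nextTable;
    ForwardingNodeSketch(Node[] nextTable) {
        super(Hashing.MOVED, null, null, null);   // no user key/value, hash = MOVED
        this.nextTable = nextTable;
    }
}

final class GetSketch {
    static Object get(Node[] tab, int hash, Object key) {
        Node e = TableAccess.tabAt(tab, (tab.length - 1) & hash);
        while (e != null) {
            if (e.hash == Hashing.MOVED) {
                // Bin already transferred: restart the lookup in the new table.
                tab = ((ForwardingNodeSketch) e).nextTable;
                e = TableAccess.tabAt(tab, (tab.length - 1) & hash);
            } else if (e.hash == hash && key.equals(e.key)) {
                return e.val;
            } else {
                e = e.next;
            }
        }
        return null;
    }
}
```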

Each bin transfer requires its bin lock, which can stall
waiting for locks while resizing. However, because other
threads can join in and help resize rather than contend for
locks, average aggregate waits become shorter as resizing
progresses.  The transfer operation must also ensure that all
accessible bins in both the old and new table are usable by any
traversal.  This is arranged in part by proceeding from the
last bin (table.length - 1) up towards the first.  Upon seeing
a forwarding node, traversals (see class Traverser) arrange to
move to the new table without revisiting nodes.  To ensure that
no intervening nodes are skipped even when moved out of order,
a stack (see class TableStack) is created on first encounter of
a forwarding node during a traversal, to maintain its place if
later processing the current table. The need for these
save/restore mechanics is relatively rare, but when one
forwarding node is encountered, typically many more will be.
So Traversers use a simple caching scheme to avoid creating so
many new TableStack nodes. (Thanks to Peter Levart for
suggesting use of a stack here.)

Because other threads can join in and help resize instead of contending for locks, average aggregate waits become shorter as the resize progresses. The transfer must also ensure that all accessible bins in both the old and new table remain usable by any traversal; this is arranged in part by transferring bins from the last one (table.length - 1) up towards the first.

When a traversal sees a forwarding node, it arranges to move to the new table without revisiting nodes. To ensure that no intervening nodes are skipped even when bins are moved out of order, a TableStack is created on the first encounter of a forwarding node during a traversal, to keep the traversal's place if it later has to process the current table.

The need for this save/restore mechanism is relatively rare, but when one forwarding node is encountered there will typically be many more, so Traversers use a simple caching scheme to avoid creating too many new TableStack nodes.

The traversal scheme also applies to partial traversals of
ranges of bins (via an alternate Traverser constructor)
to support partitioned aggregate operations.  Also, read-only
operations give up if ever forwarded to a null table, which
provides support for shutdown-style clearing, which is also not
currently implemented.

The traversal scheme also applies to partial traversals of ranges of bins (via an alternate Traverser constructor), supporting partitioned aggregate operations.

Lazy table initialization minimizes footprint until first use,
and also avoids resizings when the first operation is from a
putAll, constructor with map argument, or deserialization.
These cases attempt to override the initial capacity settings,
but harmlessly fail to take effect in cases of races.

Lazy initialization keeps the footprint minimal until first use and also avoids an extra resize when the first operation comes from putAll, the constructor taking a map argument, or deserialization.
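Lazy initialization is coordinated through a single control word; a toy sketch loosely modeled on the sizeCtl idea (field and method names are illustrative, and the real field also encodes resize state):

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LazyInitSketch {
    private volatile Node[] table;
    // Negative means "another thread is initializing"; otherwise it holds the
    // target capacity before init, and the resize threshold afterwards.
    private final AtomicInteger sizeCtl = new AtomicInteger(16);

    Node[] initTable() {
        Node[] tab;
        while ((tab = table) == null) {
            int sc = sizeCtl.get();
            if (sc < 0) {
                Thread.yield();                      // lost the init race: just back off
            } else if (sizeCtl.compareAndSet(sc, -1)) {
                try {
                    if ((tab = table) == null) {     // re-check after winning the CAS
                        tab = new Node[sc];
                        table = tab;
                    }
                } finally {
                    sizeCtl.set(sc - (sc >>> 2));    // publish threshold = 0.75 * capacity
                }
            }
        }
        return tab;
    }
}
```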

The element count is maintained using a specialization of
LongAdder. We need to incorporate a specialization rather than
just use a LongAdder in order to access implicit
contention-sensing that leads to creation of multiple
CounterCells.  The counter mechanics avoid contention on
updates but can encounter cache thrashing if read too
frequently during concurrent access. To avoid reading so often,
resizing under contention is attempted only upon adding to a
bin already holding two or more nodes. Under uniform hash
distributions, the probability of this occurring at threshold
is around 13%, meaning that only about 1 in 8 puts check
threshold (and after resizing, many fewer do so).

The element count is maintained with a specialization of LongAdder; under contention, multiple CounterCells are created. This counting mechanism avoids contention on updates, but reading it too frequently during concurrent access can cause cache thrashing. To avoid reading so often, a resize under contention is attempted only when an element is added to a bin that already holds two or more nodes. Under uniform hash distributions, the probability of this happening at the threshold is around 13%, meaning only about 1 in 8 puts checks the threshold (and after a resize, far fewer do).
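A stripped-down sketch of this LongAdder-style counting (illustrative, not the JDK's CounterCell machinery, which grows its cell array and uses a per-thread probe): an update first tries a CAS on a base count and, if that fails under contention, falls back to one of several cells; the map size is the sum of the base and all cells.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

final class CounterSketch {
    private final AtomicLong baseCount = new AtomicLong();
    private final AtomicLongArray cells = new AtomicLongArray(16); // fixed size for the sketch

    void add(long x) {
        // Fast path: uncontended CAS on the base counter.
        long b = baseCount.get();
        if (!baseCount.compareAndSet(b, b + x)) {
            // Contended: spread the update over a randomly chosen cell instead.
            int idx = ThreadLocalRandom.current().nextInt(cells.length());
            cells.addAndGet(idx, x);
        }
    }

    long sum() {
        long s = baseCount.get();
        for (int i = 0; i < cells.length(); i++)
            s += cells.get(i);
        return s;        // may be momentarily stale under concurrent updates
    }
}
```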

TreeBins use a special form of comparison for search and
related operations (which is the main reason we cannot use
existing collections such as TreeMaps). TreeBins contain
Comparable elements, but may contain others, as well as
elements that are Comparable but not necessarily Comparable for
the same T, so we cannot invoke compareTo among them. To handle
this, the tree is ordered primarily by hash value, then by
Comparable.compareTo order if applicable.  On lookup at a node,
if elements are not comparable or compare as 0 then both left
and right children may need to be searched in the case of tied
hash values. (This corresponds to the full list search that
would be necessary if all elements were non-Comparable and had
tied hashes.) On insertion, to keep a total ordering (or as
close as is required here) across rebalancings, we compare
classes and identityHashCodes as tie-breakers. The red-black
balancing code is updated from pre-jdk-collections
(http://gee.cs.oswego.edu/dl/classes/collections/RBCell.java)
based in turn on Cormen, Leiserson, and Rivest "Introduction to
Algorithms" (CLR).

TreeBins use a special form of comparison for search and related operations (which is the main reason existing collections such as TreeMap cannot be used). TreeBins contain Comparable elements, but may also contain others, as well as elements that are Comparable but not necessarily mutually Comparable, so compareTo cannot always be invoked between them. To handle this, the tree is ordered primarily by hash value, and then by Comparable.compareTo order where applicable.

On lookup at a node, if elements are not comparable or compare as 0, then both the left and right children may need to be searched when hash values are tied. (This corresponds to the full list search that would be necessary if all elements were non-Comparable and had tied hashes.)

On insertion, to keep a total ordering (or as close as required here) across rebalancings, classes and identityHashCodes are compared as tie-breakers.
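A sketch of that ordering (method names are illustrative): hashes are compared first, compareTo is used only when both keys are mutually comparable, and remaining ties are broken by class name and identityHashCode.

```java
@SuppressWarnings({"rawtypes", "unchecked"})
final class TreeOrderSketch {
    /** Negative/zero/positive like a comparator; the hash dominates everything else. */
    static int order(int h1, Object k1, int h2, Object k2) {
        if (h1 != h2)
            return Integer.compare(h1, h2);
        if (k1 instanceof Comparable && k1.getClass() == k2.getClass()) {
            int c = ((Comparable) k1).compareTo(k2);
            if (c != 0)
                return c;                   // comparable and distinguishable: use it
        }
        return tieBreakOrder(k1, k2);       // otherwise fall back to an arbitrary total order
    }

    /** Arbitrary but consistent tie-break: class name, then identity hash code. */
    static int tieBreakOrder(Object a, Object b) {
        int d = a.getClass().getName().compareTo(b.getClass().getName());
        if (d != 0)
            return d;
        return Integer.compare(System.identityHashCode(a), System.identityHashCode(b));
    }
}
```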

TreeBins also require an additional locking mechanism.  While
list traversal is always possible by readers even during
updates, tree traversal is not, mainly because of tree-rotations
that may change the root node and/or its linkages.  TreeBins
include a simple read-write lock mechanism parasitic on the
main bin-synchronization strategy: Structural adjustments
associated with an insertion or removal are already bin-locked
(and so cannot conflict with other writers) but must wait for
ongoing readers to finish. Since there can be only one such
waiter, we use a simple scheme using a single "waiter" field to
block writers.  However, readers need never block.  If the root
lock is held, they proceed along the slow traversal path (via
next-pointers) until the lock becomes available or the list is
exhausted, whichever comes first. These cases are not fast, but
maximize aggregate expected throughput.

TreeBins require an additional locking mechanism. List traversal is always possible for readers even during updates, but tree traversal is not, mainly because tree rotations may change the root node and/or its linkages.

TreeBins include a simple read-write lock mechanism parasitic on the main bin-synchronization strategy: structural adjustments for an insertion or removal already hold the bin lock (so they cannot conflict with other writers), but they must wait for ongoing readers to finish. Since there can be only one such waiter, a single waiter field is used to block writers.

Readers never block: if the root lock is held, they proceed along the slow traversal path (via next pointers) until the lock becomes available or the list is exhausted, whichever comes first. These cases are not fast, but they maximize aggregate expected throughput.
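A sketch of such a parasitic read-write lock (illustrative and simplified; the real TreeBin packs this state into a single lockState field updated via Unsafe/VarHandle CAS): the writer already holds the bin lock, so at most one thread ever waits, and a reader that sees a writer simply reports failure so the caller can fall back to the next-pointer list.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

final class TreeBinLockSketch {
    static final int WRITER = 1;   // set while a structural writer holds the root lock
    static final int WAITER = 2;   // set while that writer is parked, waiting for readers
    static final int READER = 4;   // increment added for each active tree reader

    private final AtomicInteger lockState = new AtomicInteger();
    private volatile Thread waiter;          // at most one writer can wait at a time

    /** Writer side: called while already holding the bin lock, before rotating the tree. */
    void lockRoot() {
        if (lockState.compareAndSet(0, WRITER))
            return;                          // uncontended: no readers were active
        boolean waiting = false;
        for (;;) {
            int s = lockState.get();
            if ((s & ~WAITER) == 0) {        // no readers left (and no other writer)
                if (lockState.compareAndSet(s, WRITER)) {
                    if (waiting) waiter = null;
                    return;
                }
            } else if ((s & WAITER) == 0) {  // announce the waiting writer exactly once
                if (lockState.compareAndSet(s, s | WAITER)) {
                    waiting = true;
                    waiter = Thread.currentThread();
                }
            } else if (waiting) {
                LockSupport.park(this);      // the last reader to leave will unpark us
            }
        }
    }

    void unlockRoot() {
        lockState.set(0);
    }

    /** Reader side: never blocks; false means "walk the next-pointer list instead". */
    boolean tryReadLock() {
        for (;;) {
            int s = lockState.get();
            if ((s & (WRITER | WAITER)) != 0)
                return false;
            if (lockState.compareAndSet(s, s + READER))
                return true;
        }
    }

    void readUnlock() {
        int s = lockState.addAndGet(-READER);
        Thread w;
        if (s == WAITER && (w = waiter) != null)   // last reader out wakes the writer
            LockSupport.unpark(w);
    }
}
```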

Maintaining API and serialization compatibility with previous
versions of this class introduces several oddities. Mainly: We
leave untouched but unused constructor arguments refering to
concurrencyLevel. We accept a loadFactor constructor argument,
but apply it only to initial table capacity (which is the only
time that we can guarantee to honor it.) We also declare an
unused "Segment" class that is instantiated in minimal form
only when serializing.

Compatibility with previous versions: the constructor argument referring to concurrencyLevel is accepted but unused; the loadFactor argument is applied only to the initial table capacity; and an unused Segment class is declared, instantiated in minimal form only when serializing.

Also, solely for compatibility with previous versions of this
class, it extends AbstractMap, even though all of its methods
are overridden, so it is just useless baggage.

Also solely for compatibility with previous versions, the class extends AbstractMap even though all of its methods are overridden, so it is just useless baggage.

This file is organized to make things a little easier to follow
while reading than they might otherwise: First the main static
declarations and utilities, then fields, then main public
methods (with a few factorings of multiple public methods into
internal ones), then sizing methods, trees, traversers, and
bulk operations.

How the source file is organized:

main static declarations and utilities
fields
main public methods (with a few factorings of multiple public methods into internal ones)
sizing methods
trees
traversers
bulk operations

2. Why is the list-to-tree threshold set to 8?

1) Space: a TreeNode is considerably larger than a plain Node.
2) Time: a single search step in a TreeNode is about twice as slow as in a plain Node, so when the list is short the tree's search advantage is not significant while it wastes more space. In addition, concurrent operations on a list are cheaper than on a tree (whose root node can change under rotation).
3) To keep a good time/space trade-off, the conversion to a red-black tree happens only in rare cases: by the Poisson distribution above, P(X = 8) is extremely small (about 0.00000006).
