聊聊java中的哪些Map:(四)LinkedHashMap源码分析

[toc]
在前面对LinkedList进行分析的时候说到,LinkedList实际上性能比ArrayList不会高多少,只有在前向插入的时候才能比ArrayList性能高。因为LinkedList虽然在remove和insert的操作不需要数据拷贝,但是寻址需要时间,也就是说此从链表中找到需要操作的节点需要时间,只能根据链表挨个遍历。那么当时就在想,查询链表中的某一个元素能不能将O(n)的时间复杂度变为O(1)呢,那样就能充分利用链表的特点。实际上我们本章讨论的LinkedHashMap就是这样一个数据结构。其综合了HashMap和链表的优点,虽然数据结构比LinkedList更加复杂,每一个节点Entry都增加了很多指针,但是在某些场景下,是可以同时发挥Hashmap和链表的优点的数据结构。

1.类结构及其成员变量

1.1 类的基本结构

LinkedHashMap本质上只是HashMap的一个子类,之后在HashMap的基础之上,扩充了链表功能。类结构如下:


聊聊java中的哪些Map:(四)LinkedHashMap源码分析_第1张图片
image.png
/**
 * 

Hash table and linked list implementation of the Map interface, * with predictable iteration order. This implementation differs from * HashMap in that it maintains a doubly-linked list running through * all of its entries. This linked list defines the iteration ordering, * which is normally the order in which keys were inserted into the map * (insertion-order). Note that insertion order is not affected * if a key is re-inserted into the map. (A key k is * reinserted into a map m if m.put(k, v) is invoked when * m.containsKey(k) would return true immediately prior to * the invocation.) * *

This implementation spares its clients from the unspecified, generally * chaotic ordering provided by {@link HashMap} (and {@link Hashtable}), * without incurring the increased cost associated with {@link TreeMap}. It * can be used to produce a copy of a map that has the same order as the * original, regardless of the original map's implementation: *

 *     void foo(Map m) {
 *         Map copy = new LinkedHashMap(m);
 *         ...
 *     }
 * 
* This technique is particularly useful if a module takes a map on input, * copies it, and later returns results whose order is determined by that of * the copy. (Clients generally appreciate having things returned in the same * order they were presented.) * *

A special {@link #LinkedHashMap(int,float,boolean) constructor} is * provided to create a linked hash map whose order of iteration is the order * in which its entries were last accessed, from least-recently accessed to * most-recently (access-order). This kind of map is well-suited to * building LRU caches. Invoking the {@code put}, {@code putIfAbsent}, * {@code get}, {@code getOrDefault}, {@code compute}, {@code computeIfAbsent}, * {@code computeIfPresent}, or {@code merge} methods results * in an access to the corresponding entry (assuming it exists after the * invocation completes). The {@code replace} methods only result in an access * of the entry if the value is replaced. The {@code putAll} method generates one * entry access for each mapping in the specified map, in the order that * key-value mappings are provided by the specified map's entry set iterator. * No other methods generate entry accesses. In particular, operations * on collection-views do not affect the order of iteration of the * backing map. * *

The {@link #removeEldestEntry(Map.Entry)} method may be overridden to * impose a policy for removing stale mappings automatically when new mappings * are added to the map. * *

This class provides all of the optional Map operations, and * permits null elements. Like HashMap, it provides constant-time * performance for the basic operations (add, contains and * remove), assuming the hash function disperses elements * properly among the buckets. Performance is likely to be just slightly * below that of HashMap, due to the added expense of maintaining the * linked list, with one exception: Iteration over the collection-views * of a LinkedHashMap requires time proportional to the size * of the map, regardless of its capacity. Iteration over a HashMap * is likely to be more expensive, requiring time proportional to its * capacity. * *

A linked hash map has two parameters that affect its performance: * initial capacity and load factor. They are defined precisely * as for HashMap. Note, however, that the penalty for choosing an * excessively high value for initial capacity is less severe for this class * than for HashMap, as iteration times for this class are unaffected * by capacity. * *

Note that this implementation is not synchronized. * If multiple threads access a linked hash map concurrently, and at least * one of the threads modifies the map structurally, it must be * synchronized externally. This is typically accomplished by * synchronizing on some object that naturally encapsulates the map. * * If no such object exists, the map should be "wrapped" using the * {@link Collections#synchronizedMap Collections.synchronizedMap} * method. This is best done at creation time, to prevent accidental * unsynchronized access to the map:

 *   Map m = Collections.synchronizedMap(new LinkedHashMap(...));
* * A structural modification is any operation that adds or deletes one or more * mappings or, in the case of access-ordered linked hash maps, affects * iteration order. In insertion-ordered linked hash maps, merely changing * the value associated with a key that is already contained in the map is not * a structural modification. In access-ordered linked hash maps, * merely querying the map with get is a structural modification. * ) * *

The iterators returned by the iterator method of the collections * returned by all of this class's collection view methods are * fail-fast: if the map is structurally modified at any time after * the iterator is created, in any way except through the iterator's own * remove method, the iterator will throw a {@link * ConcurrentModificationException}. Thus, in the face of concurrent * modification, the iterator fails quickly and cleanly, rather than risking * arbitrary, non-deterministic behavior at an undetermined time in the future. * *

Note that the fail-fast behavior of an iterator cannot be guaranteed * as it is, generally speaking, impossible to make any hard guarantees in the * presence of unsynchronized concurrent modification. Fail-fast iterators * throw ConcurrentModificationException on a best-effort basis. * Therefore, it would be wrong to write a program that depended on this * exception for its correctness: the fail-fast behavior of iterators * should be used only to detect bugs. * *

The spliterators returned by the spliterator method of the collections * returned by all of this class's collection view methods are * late-binding, * fail-fast, and additionally report {@link Spliterator#ORDERED}. * *

This class is a member of the * * Java Collections Framework. * * @implNote * The spliterators returned by the spliterator method of the collections * returned by all of this class's collection view methods are created from * the iterators of the corresponding collections. * * @param the type of keys maintained by this map * @param the type of mapped values * * @author Josh Bloch * @see Object#hashCode() * @see Collection * @see Map * @see HashMap * @see TreeMap * @see Hashtable * @since 1.4 */ public class LinkedHashMap extends HashMap implements Map { }

以上是LinkedHashmap源码中的相关描述。可以看到LinkedHashMap继承了Hashmap,同时实现了Map的接口。其注释大意为:
LinkedHashMap同时实现了Hash表和链表这两种数据结构,是一种有序的数据结构,与Hashmap相比,这个实现维护了一个双向链表的数据结构,这个数据结构定义了一个以插入顺序为序的迭代顺序结构。需要注意的是,如果一个Key重复插入,其迭代顺序不会改变,其会按之前第一次的插入顺序返回。
这个数据结构的实现其目的是为了避免HashMap和hashTable的无序结构,同时又不会想TreeMap那样为了达到有序而带来一些额外的开销。它可以生成一些具有原始顺序副本,在实现过程中可以完全不用考虑原始的实现。

     void foo(Map m) {
         Map copy = new LinkedHashMap(m);
        ...
     }

如果一个模块在输入的时候获取一个map,之后复制,然后返回结果,其顺序由复制的时候的顺序而决定,那么这种技术特别有用,(客户通常喜欢按提交的顺序归还物品。)
特殊的构造函数LinkedHashMap(int,float,boolean)用于创建迭代顺序为上次访问顺序的链表hash结构。从最近最少访问到最近访问,这种map很时候构建LRU缓存。调用put、 get、getOrDefault、compute、computeIfPresent、merge等方法将导致对应条目的访问(假设它在调用完成后存在)。replace仅仅在值被替换的时候才导致对条目的访问,putAll方法为指定映射中的每个映射生成一个条目访问,顺序是指映射的条目集迭代器提供的key-value映射。
没有其他的访问生成入口访问。尤其是需要说明的是,对集合视图的操作不会影响备份映射的迭代顺序。
removeEldestEntry(Map.Entry)方法可能会被重写,以便向Map中添加新的映射时强制执行一个策略。以自动删除过时的映射。
HashMap提供了所有的Map的操作,允许空值,与HashMap一样,其提供了恒定时间性能的add、contains、remove操作。假定hash足够离散,元素都分散在各个bucket中。其性能只比HashMap略低,比链表的时间略高。有一个例外的情况,对LinkedHashMap的集合视图进行迭代所需的时间与映射的大小成正比,而不管其容量如何, 在HashMap上迭代的性能可能比较低,需要的时间与其bucket的容量有关。而LinkedHashMap的迭代性能只与元素的数量有关。
LinkedHashMap有两个影响其性能的参数,初始化容量和负载因子。他们在Hashmap中有精确的定义。但是需要注意的是,对于LinkedHashMap来说,一旦初始容量选择过高带来的弊端比HashMap要轻,因为LinkedHashMap的迭代次数不受容量的影响。
需要注意的是,LinkedHashMap是不同步的,如果多个线程需要同时访问,并且至少有一个线程在结构上对其进行了修改,则它必须在外部同步。这通常是在一些自然封装的映射的对象上来同步实现的。
如果没有这样的对象,那么应该用Collections.synchronizedMap将这个对象包裹。最好在对象创建的时候就完成,以免造成非同步的访问。

 Map m = Collections.synchronizedMap(new LinkedHashMap(...));

结构修改时添加或者删除一个或者多个操作的任何映射操作。或者在访问顺序LinkedHashMap的情况下,影响其迭代顺序的任何操作。在插入顺序的LinkedHashMap中,仅仅是更改map中已包含的key相关的值不是结构修改。在有访问顺序的linkedHashmap中,仅仅使用get进行查询也是一种结构修改。
类的所有集合视图方法返回的collections的iterator方法都是fail-fast的。如果迭代器在创建之后任何时候被修改,以任何方式(除了迭代器自己的remove)对map进行修改。迭代器将抛出ConcurrentModificationException。因此在并发修改的情况下,迭代器会快速的失败,而不是将来在某个不确定的时间,冒着任意的的不确定的风险。
需要注意的是迭代器的fail-fast行为不能得到保证,因为一般来说,存在不同步的并发修改时不可能做出任何保证。fail-fast机制尽力的抛出了ConcurrentModificationException,因此在编写一个依赖于这个异常来保证其正确性的程序是错误的。迭代器的fail-fast机制只用于检测bug。
集合的spliterators方法返回的所有Spliterator集合的late-binding和fail-fast参见Spliterator#ORDERED。
这个类是java集合框架的成员之一。

由此可以看出,LinkedHashMap实际上兼顾了HashMap和链表的优点,默认情况下可以按照插入序构建链表,另外,还可以做为LRU的缓存使用。

1.2 核心内部类Entry

我们在前面说过,HashMap树化之后,TreeNode节点是LinkedHashMap.Entry的子类。而LinkedHashMap.Entry又是HashMap.Node的子类。HashMap.Node又继承了Map.Entry。这个继承关系如下图:


聊聊java中的哪些Map:(四)LinkedHashMap源码分析_第2张图片
image.png
/*
 * Implementation note.  A previous version of this class was
 * internally structured a little differently. Because superclass
 * HashMap now uses trees for some of its nodes, class
 * LinkedHashMap.Entry is now treated as intermediary node class
 * that can also be converted to tree form. The name of this
 * class, LinkedHashMap.Entry, is confusing in several ways in its
 * current context, but cannot be changed.  Otherwise, even though
 * it is not exported outside this package, some existing source
 * code is known to have relied on a symbol resolution corner case
 * rule in calls to removeEldestEntry that suppressed compilation
 * errors due to ambiguous usages. So, we keep the name to
 * preserve unmodified compilability.
 *
 * The changes in node classes also require using two fields
 * (head, tail) rather than a pointer to a header node to maintain
 * the doubly-linked before/after list. This class also
 * previously used a different style of callback methods upon
 * access, insertion, and removal.
 */

/**
 * HashMap.Node subclass for normal LinkedHashMap entries.
 */
static class Entry extends HashMap.Node {
    Entry before, after;
    Entry(int hash, K key, V value, Node next) {
        super(hash, key, value, next);
    }
}

Entry主要的作用是实现对Node节点的链表化。构建一个新的双向链表来做为HashMap的扩展。之后LinkedHashMap可以做为链表使用。
Entry主要增加了before和after两个指针,以将HashMap的全部Entry变成双向链表。
我们可以看到的是,在jdk1.4的时候就完成了LinkedHashMap。在1.7之前没有引入红黑树的时候,Hashmap中的Entry继承关系就存在。而在引入红黑树之后,直接又定义了一个新的内部类TreeNode来继承LinkedHashMap.Entry节点。这也为我们后续代码重构提供了一个新的思路。

1.3 重要的成员变量

我们可以看到,在LinkedHashMap中扩张的成员变量位head、tail。分别指向链表的首和尾。

/**
 * The head (eldest) of the doubly linked list.
 */
transient LinkedHashMap.Entry head;

/**
 * The tail (youngest) of the doubly linked list.
 */
transient LinkedHashMap.Entry tail;

/**
 * The iteration ordering method for this linked hash map: true
 * for access-order, false for insertion-order.
 *
 * @serial
 */
final boolean accessOrder;

这三个变量均用transient修饰,也就是说,序列化的时候这三个属性不会被序列化,那么LinkedHashMap如何确保对象能从序列化中还原呢?那么与HashMap相同,自定义了序列化的方法,这在后面将单独介绍。
head表示双向链表的头部,也就是最早插入的元素。而tail将表示链表的尾部,为最近插入的元素。accessOrder表示链表的迭代顺序,当为true的时候,按访问顺序也就是最近访问的元素将被移动到head,如果为false,则按照插入顺序。

2.LinkedHashMap的基本原理

看过第一部分的内容之后,我们对LinkedHashMap的基本原理有了一个最基本的了解。其本身就是一个在Hashmap的基础上的扩展,让HashMap的每一个元素都添加了链表的指针结构。
那么我们假定存在如下LinkedHashMap,在accessOrder为false的情况下。我们插入了5个Entry。其结构如下图所示:


聊聊java中的哪些Map:(四)LinkedHashMap源码分析_第3张图片
image.png

上图绿色箭头表示after指针。红色箭头表示before指针。上图仅仅表示举例对LinkedHashMap的结构进行说明,在真实的情况下可能很难出现上述这种HashMap。
LinkedHashMap中存在两个指征,head和tail。分别指向链表的第一个插入的元素和最后插入的元素。这样就能分别从这两个指征获取Entry并对LinkedHashMap的各个元素遍历。

3.构造函数

LinkedhashMap主要提供了如下4个构造函数,由于LinkedhashMap能够通过accessorder来控制链表的迭代的顺序,实际上就是控制按插入序还是按访问序。

3.1 LinkedHashMap()

空的构造函数。

public LinkedHashMap() {
    super();
    accessOrder = false;
}

调用了super,即HashMap中的构造函数。初始化长度为16,默认负载因子为0.75。accessorder为false,表示按插入序。

3.2 LinkedHashMap(int initialCapacity)

指定初始化的容量。

public LinkedHashMap(int initialCapacity) {
    super(initialCapacity);
    accessOrder = false;
}

同样使用了HashMap的构造函数。默认的accessorder为false按插入序。负载因子为0.75。

3.3 LinkedHashMap(int initialCapacity, float loadFactor)

通过这个构造函数可以同时指定初始容量和负载因子。

public LinkedHashMap(int initialCapacity, float loadFactor) {
    super(initialCapacity, loadFactor);
    accessOrder = false;
}

3.4 LinkedHashMap(Map m)

将一个map通过构造函数的方式按插入序变成LinkedHashMap。但是需要注意的是,HashMap本身无序,因此此处的插入序是不确定的。

public LinkedHashMap(Map m) {
    super();
    accessOrder = false;
    putMapEntries(m, false);
}

但是在插入之后,会实现按putMapEntrues中的方法进行插入。
putMapEntries也是HashMap中的方法,按table进行遍历。

3.5 LinkedHashMap(int initialCapacity,float loadFactor,boolean accessOrder)

这个构造函数是LinkedHashMap最重要的一个构造函数,因为如果我们要通过LinkedHashMap来实现一个LRU的缓存,那么必须采用这个构造函数,将accessOrder指定为true。

public LinkedHashMap(int initialCapacity,
                     float loadFactor,
                     boolean accessOrder) {
    super(initialCapacity, loadFactor);
    this.accessOrder = accessOrder;
}

初始容量和负载因子都能自行控制。accessOrder只是控制了一个变量,但是后续在各get和remove的方法中将会根据这个变量变成不同的行为。

4.重要方法

4.1 put

我们先来看看LinkedHashMap中如何进行Put的。
实际上LinkedHashMap没有重载put方法,而是利用了HashMap中的put。

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

实际上还是调用的putVal

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node[] tab; Node p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

以上是HashMap中的put方法,我们发现,putVal是final修饰的,也就是说LinkedhashMap无法重写这个方法。那么HashMap中是否会考虑到put之后链表的关系如何维护呢?因为putVal中全部都是对Hashmap是否需要树化的操作。实际上,在Hashmap中的putVal中,已经预留了后处理的方法。

// Callbacks to allow LinkedHashMap post-actions
void afterNodeAccess(Node p) { }
void afterNodeInsertion(boolean evict) { }
void afterNodeRemoval(Node p) { }

上述三个方法在HashMap中的时候都是空的。就是为了预留给LinkedHashMap进行使用的。
链表的维护操作都在这三个方法中。分别对应访问和插入、删除操作。我们后续来看看这三个方法。

4.2 afterNodeAccess

此方法是access之后的后处理操作,如果accessOrder为true,那么被access的节点就要移动到tail。tail就是最新的节点。head是最老的节点。

void afterNodeAccess(Node e) { // move node to last
    LinkedHashMap.Entry last;
    //判断accessOrder是否为true,当前节点不为tail
    if (accessOrder && (last = tail) != e) {
        LinkedHashMap.Entry p =
            (LinkedHashMap.Entry)e, b = p.before, a = p.after;
        //定义指针
        p.after = null;
        //将e的after变为null
        if (b == null)
            head = a;
        else
            b.after = a;
        //
        if (a != null)
            a.before = b;
        else
            last = b;
        if (last == null)
            head = p;
        else {
            p.before = last;
            last.after = p;
        }
        //tail指向p
        tail = p;
        ++modCount;
    }
}

4.3 afterNodeInsertion

我们可以看到,在Insert之后调用afterNodeInsertion,在LinkedHashMap中的真正意图就是可能会将最老的元素删除。但是实际上,由于removeEldestEntry始终返回false,那么也就意味着这个方法中的逻辑永远不会被执行。

void afterNodeInsertion(boolean evict) { // possibly remove eldest
    LinkedHashMap.Entry first;
    if (evict && (first = head) != null && removeEldestEntry(first)) {
        K key = first.key;
        removeNode(hash(key), key, null, false, true);
    }
}
protected boolean removeEldestEntry(Map.Entry eldest) {
    return false;
}

实际上afterNodeInsertion不会做任何操作。也就是说,LinkedHashMap本身并不提供删除最老元素的实现,如果要实现LRU的缓存,对老元素进行删除的话,可能这个方法需要我们自己重写来实现。

4.4 afterNodeRemoval

这个方法的意思是在删除元素之后,将这个元素的前后的元素重写组成链表。

void afterNodeRemoval(Node e) { // unlink
    LinkedHashMap.Entry p =
        (LinkedHashMap.Entry)e, b = p.before, a = p.after;
    p.before = p.after = null;
    if (b == null)
        head = a;
    else
        b.after = a;
    if (a == null)
        tail = b;
    else
        a.before = b;
}

4.5 get

get方法则进行重写:

public V get(Object key) {
    Node e;
    if ((e = getNode(hash(key), key)) == null)
        return null;
    if (accessOrder)
        afterNodeAccess(e);
    return e.value;
}

实际上还是调用的HashMap方法中的getNode方法。之后再调用afterNodeAccess进行整理。实际上accessOrder方法最关键的就是通过这个afterNodeAccess方法来调整链表的顺序。

4.6 remove

remove方法实际上是HashMap中的remove方法:

public V remove(Object key) {
    Node e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

之后在removeNode方法中调用了afterNodeRemoval方法。对链表进行整理。

5.视图类

同样,LinkedHashMap也存在基于key、values和entrys的视图类。其原理与HashMap类似,只不过,在某些地方重写以实现了链表的遍历及特性。

public Set keySet() {
    Set ks = keySet;
    if (ks == null) {
        ks = new LinkedKeySet();
        keySet = ks;
    }
    return ks;
}

final class LinkedKeySet extends AbstractSet {
    public final int size()                 { return size; }
    public final void clear()               { LinkedHashMap.this.clear(); }
    public final Iterator iterator() {
        return new LinkedKeyIterator();
    }
    public final boolean contains(Object o) { return containsKey(o); }
    public final boolean remove(Object key) {
        return removeNode(hash(key), key, null, false, true) != null;
    }
    public final Spliterator spliterator()  {
        return Spliterators.spliterator(this, Spliterator.SIZED |
                                        Spliterator.ORDERED |
                                        Spliterator.DISTINCT);
    }
    public final void forEach(Consumer action) {
        if (action == null)
            throw new NullPointerException();
        int mc = modCount;
        for (LinkedHashMap.Entry e = head; e != null; e = e.after)
            action.accept(e.key);
        if (modCount != mc)
            throw new ConcurrentModificationException();
    }
}

如上即是keySet,其迭代器和Spliterator都存在。

5.1 Iterator

final class LinkedKeyIterator extends LinkedHashIterator
    implements Iterator {
    public final K next() { return nextNode().getKey(); }
}

我们可以看到基于Key的Iterator,实际上比HashMap中的实现方式简单,因为可以直接使用链表。直接从链表返回。同理LinkedValueIterator和LinkedEntryIterator也相同。

5.2 Spliterators

我们可以看到在LinkedKeySet中:

public final Spliterator spliterator()  {
        return Spliterators.spliterator(this, Spliterator.SIZED |
                                        Spliterator.ORDERED |
                                        Spliterator.DISTINCT);
    }

这个方法比HashMap的Spliterator多了一个属性即是Spliterator.ORDERED 。
同理LinkedEntrySet 及LinkedValues都有类似的结构,这就不一一进行分析了。

6.总结

以上本文是我对LinkedHashMap的理解,LinkedHashMap也是一个常用的集合类,兼具了Hashmap和链表的优点。在get元素的时候可以巧妙的利用HashMap的特性将查找的性能降低到O(1)。这也是我们在某些情况下可以利用的。
比如在某些情况下,实现一个基于LRU的缓存,虽然LinkedHashMap本身没有提供删除元素的方法,但是我们可以根据list的长度自行控制,利用accessorder的特性,将超过一定长度的head元素删除。

你可能感兴趣的:(聊聊java中的哪些Map:(四)LinkedHashMap源码分析)