HashMap, TreeMap, LinkedHashMap Source Code Analysis

A quick walk-through of the HashMap, TreeMap, and LinkedHashMap source code

Map is a separate interface hierarchy from Collection, and it is one of the most commonly used types.
The class diagram is as follows:

(class diagram image)

An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.

This interface takes the place of the Dictionary class, which was a totally abstract class rather than an interface.

The Map interface provides three collection views, which allow a map's contents to be viewed as a set of keys, collection of values, or set of key-value mappings. The order of a map is defined as the order in which the iterators on the map's collection views return their elements. Some map implementations, like the TreeMap class, make specific guarantees as to their order; others, like the HashMap class, do not.

Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map. A special case of this prohibition is that it is not permissible for a map to contain itself as a key. While it is permissible for a map to contain itself as a value, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a map.

All general-purpose map implementation classes should provide two "standard" constructors: a void (no arguments) constructor which creates an empty map, and a constructor with a single argument of type Map, which creates a new map with the same key-value mappings as its argument. In effect, the latter constructor allows the user to copy any map, producing an equivalent map of the desired class. There is no way to enforce this recommendation (as interfaces cannot contain constructors) but all of the general-purpose map implementations in the JDK comply.

The "destructive" methods contained in this interface, that is, the methods that modify the map on which they operate, are specified to throw UnsupportedOperationException if this map does not support the operation. If this is the case, these methods may, but are not required to, throw an UnsupportedOperationException if the invocation would have no effect on the map. For example, invoking the putAll(Map) method on an unmodifiable map may, but is not required to, throw the exception if the map whose mappings are to be "superimposed" is empty.

Some map implementations have restrictions on the keys and values they may contain. For example, some implementations prohibit null keys and values, and some have restrictions on the types of their keys. Attempting to insert an ineligible key or value throws an unchecked exception, typically NullPointerException or ClassCastException. Attempting to query the presence of an ineligible key or value may throw an exception, or it may simply return false; some implementations will exhibit the former behavior and some will exhibit the latter. More generally, attempting an operation on an ineligible key or value whose completion would not result in the insertion of an ineligible element into the map may throw an exception or it may succeed, at the option of the implementation. Such exceptions are marked as "optional" in the specification for this interface.

Many methods in Collections Framework interfaces are defined in terms of the equals method. For example, the specification for the containsKey(Object key) method says: "returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k))." This specification should not be construed to imply that invoking Map.containsKey with a non-null argument key will cause key.equals(k) to be invoked for any key k. Implementations are free to implement optimizations whereby the equals invocation is avoided, for example, by first comparing the hash codes of the two keys. (The Object.hashCode() specification guarantees that two objects with unequal hash codes cannot be equal.) More generally, implementations of the various Collections Framework interfaces are free to take advantage of the specified behavior of underlying Object methods wherever the implementor deems it appropriate.

Some map operations which perform recursive traversal of the map may fail with an exception for self-referential instances where the map directly or indirectly contains itself. This includes the clone(), equals(), hashCode() and toString() methods. Implementations may optionally handle the self-referential scenario, however most current implementations do not do so.

The Map interface represents a key-value data structure and was designed to replace the Dictionary class. java.util does still contain Dictionary, and Hashtable is built on it, but both are legacy classes and should not be used in new code.

A Map offers three collection views: a Set of entries (key-value pairs), a Set of keys, and a Collection of values.

Every general-purpose Map should provide at least two constructors: a no-argument constructor, and one that takes another Map as its argument, which is effectively a shallow copy constructor.
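A minimal sketch of both points (the class name MapViewsDemo and the sample data are illustrative, assuming JDK 8+):

import java.util.HashMap;
import java.util.Map;

public class MapViewsDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);

        // The three collection views, all backed by the same map
        // (HashMap gives no ordering guarantee for any of them).
        System.out.println(m.keySet());
        System.out.println(m.values());
        System.out.println(m.entrySet());

        // The "copy" constructor: a new map with the same mappings;
        // the keys and values themselves are not cloned.
        Map<String, Integer> copy = new HashMap<>(m);
        System.out.println(copy.equals(m)); // true
    }
}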

Some Map implementations do not support every operation and throw UnsupportedOperationException for the unsupported ones.

Some Map implementations restrict the keys and values they accept, and throw NullPointerException or ClassCastException for ineligible ones.

Some Map methods, such as containsKey, are specified in terms of equals. That does not force implementations to actually call equals; they may optimize (for example by comparing hash codes first) as long as the specified behavior is preserved.

Some Map methods perform recursive traversal, so a map that directly or indirectly contains itself may fail with an exception in methods such as clone, equals, hashCode, and toString.

AbstractMap defines two fields, the keySet and the values collection, and implements the basic operations on top of the entrySet iterator. It also provides a static nested class, an immutable Entry:

public static class SimpleImmutableEntry<K,V>
        implements Entry<K,V>, java.io.Serializable
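A small example of how that class behaves (the demo class is made up for illustration): a SimpleImmutableEntry can be built directly, and it rejects setValue:

import java.util.AbstractMap;
import java.util.Map;

public class ImmutableEntryDemo {
    public static void main(String[] args) {
        Map.Entry<String, Integer> e =
                new AbstractMap.SimpleImmutableEntry<>("k", 1);
        System.out.println(e.getKey() + "=" + e.getValue()); // k=1
        try {
            e.setValue(2); // an immutable entry does not allow this
        } catch (UnsupportedOperationException ex) {
            System.out.println("setValue is not supported");
        }
    }
}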

HashMap

HashMap is a hash-table-based implementation of Map:

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

   Map m = Collections.synchronizedMap(new HashMap(...));

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

All data is stored in a Node[] array. Node is the implementation of Map.Entry, and the table length is always a power of two, which gives the hash-to-bucket mapping a convenient address space.

transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

static class Node<K,V> implements Map.Entry<K,V> {

To see how hashing works, look at the put and remove methods. On insertion the map checks whether the bucket already holds entries; colliding entries are chained into a linked list, and when a bin's list reaches a length of 8 it is converted to a red-black tree. When the number of stored entries exceeds the capacity times the load factor (0.75 by default), the table is resized.

The reason is that an overly full table makes collisions far more likely.
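A simplified sketch of the idea (the helper below mirrors JDK 8's HashMap.hash(), but the class name, key, and capacity are illustrative):

public class HashSpreadDemo {
    // Same trick as HashMap.hash() in JDK 8: XOR the high 16 bits into
    // the low 16 bits so that small tables still feel the high bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int capacity = 16;          // table length, always a power of two
        String key = "hello";
        // With a power-of-two capacity, (capacity - 1) & hash selects
        // the same bucket as hash % capacity, just more cheaply.
        int bucket = (capacity - 1) & hash(key);
        System.out.println("bucket index = " + bucket);
    }
}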

For more details see: http://www.importnew.com/18633.html

LinkedHashMap

LinkedHashMap builds on HashMap by extending each entry so that all entries together form a doubly linked list. The data is still stored in the hash table array, but iteration follows the linked list, so iteration performance is essentially unaffected by the capacity and load factor.
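A quick comparison to make the ordering difference concrete (class name and data are illustrative):

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();
        Map<String, Integer> linked = new LinkedHashMap<>();
        for (String k : new String[]{"c", "a", "b"}) {
            hash.put(k, 1);
            linked.put(k, 1);
        }
        System.out.println(hash.keySet());   // order is unspecified
        System.out.println(linked.keySet()); // [c, a, b] -- insertion order
    }
}

The entry class that builds the linked list looks like this: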

    static class Entry<K,V> extends HashMap.Node<K,V> {
        Entry<K,V> before, after;
        Entry(int hash, K key, V value, Node<K,V> next) {
            super(hash, key, value, next);
        }
    }

    /**
     * The head (eldest) of the doubly linked list.
     */
    transient LinkedHashMap.Entry<K,V> head;

    /**
     * The tail (youngest) of the doubly linked list.
     */
    transient LinkedHashMap.Entry<K,V> tail;

LinkedHashMap maintains the linked-list relationships between entries through the template method pattern: HashMap calls hook methods that LinkedHashMap overrides. By default new nodes are appended to the tail, and removing a node simply unlinks it. The code:

//HashMap.java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    ....
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    ...
    afterNodeRemoval(node);
    ....
}
 
 
//LinkedHashMap.java
void afterNodeRemoval(Node<K,V> e) { // unlink
    LinkedHashMap.Entry<K,V> p =
        (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
    p.before = p.after = null;
    if (b == null)
        head = a;
    else
        b.after = a;
    if (a == null)
        tail = b;
    else
        a.before = b;
}

void afterNodeInsertion(boolean evict) { // possibly remove eldest
    LinkedHashMap.Entry<K,V> first;
    if (evict && (first = head) != null && removeEldestEntry(first)) {
        K key = first.key;
        removeNode(hash(key), key, null, false, true);
    }
}

void afterNodeAccess(Node<K,V> e) { // move node to last
    LinkedHashMap.Entry<K,V> last;
    if (accessOrder && (last = tail) != e) {
        LinkedHashMap.Entry<K,V> p =
            (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
        p.after = null;
        if (b == null)
            head = a;
        else
            b.after = a;
        if (a != null)
            a.before = b;
        else
            last = b;
        if (last == null)
            head = p;
        else {
            p.before = last;
            last.after = p;
        }
        tail = p;
        ++modCount;
    }
}

The hook methods afterNodeRemoval and afterNodeInsertion (together with afterNodeAccess) are what maintain the links.

Besides insertion order, LinkedHashMap supports a second ordering mode: access order. An entry that is accessed is moved to the end of the linked list, so the most recently accessed entry sits at the tail and the eldest at the head.

The following operations count as accesses; everything else does not:

  • put, putIfAbsent, get, getOrDefault, compute, computeIfAbsent, computeIfPresent, and merge
  • replace only counts as an access if the value is actually replaced
  • putAll counts as an access for each mapping whose key is already present

A flag selects the mode:

final boolean accessOrder
true: access order
false: insertion order

So the mode is chosen in the constructor, and each access operation then calls afterNodeAccess to keep the list in access order.
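The classic use of access order is an LRU cache. A minimal sketch (the LruCache class and its capacity are illustrative; removeEldestEntry and the three-argument constructor are real LinkedHashMap APIs):

import java.util.LinkedHashMap;
import java.util.Map;

// accessOrder=true moves accessed entries to the tail, and
// removeEldestEntry evicts the head when the cache is over capacity.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // "a" becomes the most recently used
        cache.put("c", 3);   // evicts "b", the eldest entry
        System.out.println(cache.keySet()); // [a, c]
    }
}

Every get or put moves the touched entry to the tail, so the head is always the least recently used entry and is the one evicted.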

TreeMap

TreeMap's defining feature is that it iterates over all entries in the order defined by a comparator. If no Comparator is supplied in the constructor, the keys are assumed to implement Comparable; otherwise a ClassCastException is thrown:

@SuppressWarnings("unchecked")
final int compare(Object k1, Object k2) {
    return comparator==null ? ((Comparable<? super K>)k1).compareTo((K)k2)
        : comparator.compare((K)k1, (K)k2);
}
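A short sketch of both cases (class name and data are illustrative):

import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapOrderDemo {
    public static void main(String[] args) {
        // With no Comparator, keys must be Comparable (natural order)...
        TreeMap<String, Integer> natural = new TreeMap<>();
        natural.put("b", 2);
        natural.put("a", 1);
        System.out.println(natural); // {a=1, b=2}

        // ...otherwise the supplied Comparator decides the order.
        TreeMap<String, Integer> reversed =
                new TreeMap<>(Comparator.reverseOrder());
        reversed.putAll(natural);
        System.out.println(reversed); // {b=2, a=1}

        // With no Comparator and a key that is not Comparable, put()
        // throws ClassCastException.
    }
}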

Notice how many inner classes TreeMap has. HashMap has somewhat fewer: its inner classes are the Entry implementations (Node, TreeNode), the collection classes for the three views (key, value, entry), and their iterators and spliterators. TreeMap has the same kinds of inner classes plus some extras:

//collections for the three views
Values
EntrySet
KeySet
//iterators for the three views
PrivateEntryIterator // abstract base class for the iterators
EntryIterator
ValueIterator
KeyIterator

DescendingKeyIterator
NavigableSubMap
AscendingSubMap
DescendingSubMap
SubMap
Entry
TreeMapSpliterator

KeySpliterator
DescendingKeySpliterator
ValueSpliterator
EntrySpliterator

What is extra here is DescendingKeyIterator, a descending-order iterator, plus a whole family of subMap classes.

So let's look at the interfaces TreeMap implements, SortedMap and NavigableMap, and what they are designed to provide.

SortedMap

Note that the ordering maintained by a sorted map (whether or not an explicit comparator is provided) must be consistent with equals if the sorted map is to correctly implement the Map interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Map interface is defined in terms of the equals operation, but a sorted map performs all key comparisons using its compareTo (or compare) method, so two keys that are deemed equal by this method are, from the standpoint of the sorted map, equal.

The behavior of a tree map is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Map interface.

In short, a sorted map orders its keys by comparison. Ideally that ordering is also consistent with equals; if it is not, only the general Map contract is violated, not anything in the SortedMap contract itself. See https://stackoverflow.com/questions/46912097/case-insensitive-comparator-breaks-my-treemap for a concrete case.
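A sketch of the case from that Stack Overflow question, using String.CASE_INSENSITIVE_ORDER (the demo class and values are illustrative):

import java.util.TreeMap;

public class InconsistentWithEqualsDemo {
    public static void main(String[] args) {
        TreeMap<String, Integer> map =
                new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        map.put("abc", 1);
        map.put("ABC", 2);              // compares equal to "abc", overwrites it
        System.out.println(map.size()); // 1, not 2
        System.out.println(map.containsKey("AbC")); // true, even though
        // "AbC".equals("abc") is false -- the ordering is not consistent
        // with equals, so the map deviates from the general Map contract.
    }
}

The SortedMap interface itself is small: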


java.util.SortedMap#comparator
//views of part of the map
java.util.SortedMap#subMap   // half-open: [fromKey, toKey)
java.util.SortedMap#headMap  // [start, toKey)
java.util.SortedMap#tailMap  // [fromKey, end]
//first and last key
java.util.SortedMap#firstKey
java.util.SortedMap#lastKey
//the three views
java.util.SortedMap#keySet
java.util.SortedMap#values
java.util.SortedMap#entrySet

subMap returns a view of the selected key range, backed by the original map.
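A minimal usage sketch of these range views (class name and data are illustrative):

import java.util.SortedMap;
import java.util.TreeMap;

public class SortedMapViewDemo {
    public static void main(String[] args) {
        SortedMap<Integer, String> m = new TreeMap<>();
        for (int i = 1; i <= 5; i++) m.put(i, "v" + i);

        System.out.println(m.subMap(2, 4));  // {2=v2, 3=v3}        [2, 4)
        System.out.println(m.headMap(3));    // {1=v1, 2=v2}        [.., 3)
        System.out.println(m.tailMap(3));    // {3=v3, 4=v4, 5=v5}  [3, ..]
        System.out.println(m.firstKey() + " " + m.lastKey()); // 1 5

        // The views are backed by the original map: changes flow both ways.
        m.subMap(2, 4).remove(2);
        System.out.println(m); // {1=v1, 3=v3, 4=v4, 5=v5}
    }
}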

Now look at NavigableMap. The goal of this interface is to provide more navigation APIs that return the result closest to the search target (a usage sketch follows the method list below).

// the greatest key/entry strictly less than the target
// max( k < target )
java.util.NavigableMap#lowerEntry
java.util.NavigableMap#lowerKey
// max( k <= target )
java.util.NavigableMap#floorEntry
java.util.NavigableMap#floorKey
// min( k >= target )
java.util.NavigableMap#ceilingEntry
java.util.NavigableMap#ceilingKey
// min( k > target )
java.util.NavigableMap#higherEntry
java.util.NavigableMap#higherKey

// first and last entry
java.util.NavigableMap#firstEntry
java.util.NavigableMap#lastEntry
// return and remove the first/last entry, a bit like queue operations
java.util.NavigableMap#pollFirstEntry
java.util.NavigableMap#pollLastEntry
// a descending view of the map, i.e. the reverse of the default order
java.util.NavigableMap#descendingMap
// key set that implements NavigableSet, i.e. sorted and navigable
java.util.NavigableMap#navigableKeySet
// descending key set
java.util.NavigableMap#descendingKeySet
// subMap where each endpoint can be chosen inclusive or exclusive
java.util.NavigableMap#subMap(K, boolean, K, boolean)

// head/tail views bounded by a key, with inclusive/exclusive endpoints; declared to return NavigableMap
java.util.NavigableMap#headMap(K, boolean)
java.util.NavigableMap#tailMap(K, boolean)
// inherited from SortedMap
java.util.NavigableMap#subMap(K, K)
java.util.NavigableMap#headMap(K)
java.util.NavigableMap#tailMap(K)
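The usage sketch mentioned above (class name and data are illustrative):

import java.util.NavigableMap;
import java.util.TreeMap;

public class NavigableMapDemo {
    public static void main(String[] args) {
        NavigableMap<Integer, String> m = new TreeMap<>();
        m.put(10, "a");
        m.put(20, "b");
        m.put(30, "c");

        System.out.println(m.lowerKey(20));   // 10   (greatest key <  20)
        System.out.println(m.floorKey(20));   // 20   (greatest key <= 20)
        System.out.println(m.ceilingKey(25)); // 30   (least key    >= 25)
        System.out.println(m.higherKey(30));  // null (least key    >  30)

        System.out.println(m.descendingMap()); // {30=c, 20=b, 10=a}
        System.out.println(m.pollFirstEntry());// 10=a, and removes it
        System.out.println(m);                 // {20=b, 30=c}
    }
}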

Coming back to the TreeMap implementation, the subMap-related inner classes exist precisely to back these interface methods.

Now for the data structure, starting with the fields:

private transient Entry<K,V> root;

// Red-black mechanics
private static final boolean RED   = false;
private static final boolean BLACK = true;

TreeMap is implemented with a red-black tree, a self-balancing binary search tree, so iteration is simply a traversal of the tree, and ordering decisions are made with the comparator. You might ask whether the red-black TreeNode already implemented inside HashMap could be reused here. It cannot: TreeMap only needs the tree and has no hash buckets to maintain, so the tree logic is implemented again.

Insertion, deletion and lookup are all standard red-black tree operations, so let's instead look at how the NavigableMap operations are implemented:

    /**
     * @since 1.6
     */
    public Map.Entry<K,V> firstEntry() {
        return exportEntry(getFirstEntry());
    }

    /**
     * @since 1.6
     */
    public Map.Entry<K,V> lastEntry() {
        return exportEntry(getLastEntry());
    }

    static <K,V> Map.Entry<K,V> exportEntry(TreeMap.Entry<K,V> e) {
        return (e == null) ? null :
            new AbstractMap.SimpleImmutableEntry<>(e);
    }

The matching entry is found and returned wrapped in an immutable Entry, a snapshot of that moment that cannot be modified. To change a mapping you have to go through the map itself (get/put) or through an entry obtained from the entrySet iterator. None of the navigation methods return modifiable entries: they are a read-only, navigation-only view.
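A small sketch of that behavior (the demo class is illustrative):

import java.util.Map;
import java.util.TreeMap;

public class SnapshotEntryDemo {
    public static void main(String[] args) {
        TreeMap<String, Integer> m = new TreeMap<>();
        m.put("a", 1);

        Map.Entry<String, Integer> first = m.firstEntry(); // immutable snapshot
        try {
            first.setValue(2); // not allowed on the exported entry
        } catch (UnsupportedOperationException e) {
            System.out.println("navigation entries are read-only");
        }

        // To modify, go through the map itself or the entrySet iterator.
        m.entrySet().iterator().next().setValue(2);
        System.out.println(m); // {a=2}
    }
}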

subMap, tailMap and headMap are implemented with the help of the inner class AscendingSubMap, which delegates to the backing map and enforces the lower and upper bounds; AscendingSubMap extends NavigableSubMap:

abstract static class NavigableSubMap<K,V> extends AbstractMap<K,V>
        implements NavigableMap<K,V>, java.io.Serializable {
        private static final long serialVersionUID = -2102997345730753016L;
        /**
         * The backing map.
         */
        final TreeMap<K,V> m;

        /**
         * Endpoints are represented as triples (fromStart, lo,
         * loInclusive) and (toEnd, hi, hiInclusive). If fromStart is
         * true, then the low (absolute) bound is the start of the
         * backing map, and the other values are ignored. Otherwise,
         * if loInclusive is true, lo is the inclusive bound, else lo
         * is the exclusive bound. Similarly for the upper bound.
         */
        final K lo, hi;
        final boolean fromStart, toEnd;
        final boolean loInclusive, hiInclusive;

That is the overall picture.

Summary:

  1. HashMap stores entries in an array whose length is always a power of two. The bucket index is computed from hashCode ^ (hashCode >>> 16), masked by the table size (equivalent to a modulo). Collisions are chained into a linked list, and a bin is converted to a red-black tree once its list reaches 8 nodes.
  2. LinkedHashMap extends HashMap and threads a doubly linked list through the entries; iteration time depends on the number of entries rather than the capacity or load factor, and it supports both insertion-order and access-order modes.
  3. TreeMap is implemented as a red-black tree; a custom Comparator can be supplied, entries are ordered by comparison, and a large set of navigation APIs is provided.

So in general HashMap is enough. If you need insertion order or most-recently-accessed ordering, use LinkedHashMap; if you need a custom ordering plus all kinds of navigation access, choose TreeMap.

END
