Java Collections · 07 · HashMap in Detail

I. Overview

HashMap is a hash table that stores key-value mappings.

It extends AbstractMap and implements the Map, Cloneable, and Serializable interfaces.

Map

An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.

The basic interface for key-value mappings. It takes the place of the abstract class Dictionary.

Its methods fall into four groups: Query Operations, Modification Operations, Views, and Comparison and hashing:

Query Operations
- size()
- isEmpty()
- containsKey(Object)
- containsValue(Object)
- get(Object)
Modification Operations
- put(K, V)
- remove(Object)
- putAll(Map)
- clear()
Views
- keySet()
- values()
- entrySet()
Comparison and hashing
- equals(Object)
- hashCode()

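Comparison with Dictionary

Similarities:

Both define the basic contract for key-value mappings.

Differences:

Type: Dictionary is an abstract class; Map is an interface.
Traversal: Dictionary is traversed through Enumerations (keys() and elements()). Map defines no Enumeration-based access; it is traversed through its collection views.
Views: Dictionary offers no collection views, only the keys() and elements() enumerations. Map offers three views: keySet(), values(), and entrySet().
Null handling: Dictionary does not allow null keys or values (put() throws NullPointerException). Map leaves this to each implementation; HashMap, for example, allows both.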
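The differences in traversal and null handling can be checked against the JDK directly. The sketch below uses Hashtable as the concrete Dictionary and HashMap as the Map implementation; the class and helper names (DictionaryVsMap, dictionaryRejects, and so on) are hypothetical and exist only for illustration.

```java
import java.util.Dictionary;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Map;

class DictionaryVsMap {
    // Hashtable (a Dictionary) throws NullPointerException for a null key or null value.
    static boolean dictionaryRejects(String key, Integer value) {
        Dictionary<String, Integer> dict = new Hashtable<>();
        try {
            dict.put(key, value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // HashMap (a Map implementation) accepts a null key and a null value.
    static boolean mapAcceptsNulls() {
        Map<String, Integer> map = new HashMap<>();
        map.put(null, null);
        return map.size() == 1 && map.containsKey(null);
    }

    // Dictionary traversal: an Enumeration returned by keys().
    static int countWithEnumeration() {
        Dictionary<String, Integer> dict = new Hashtable<>();
        dict.put("a", 1);
        dict.put("b", 2);
        int n = 0;
        Enumeration<String> keys = dict.keys();
        while (keys.hasMoreElements()) {
            keys.nextElement();
            n++;
        }
        return n;
    }

    // Map traversal: an Iterator obtained from a collection view.
    static int countWithView() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        int n = 0;
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(dictionaryRejects(null, 1));  // true
        System.out.println(dictionaryRejects("k", null)); // true
        System.out.println(mapAcceptsNulls());            // true
    }
}
```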

Map also defines the nested Entry interface:

```
interface Entry<K, V> {
    K getKey();
    V getValue();
    V setValue(V value);
    boolean equals(Object o);
    int hashCode();
}
```

AbstractMap

Provides a skeletal implementation of Map, and defines the nested classes SimpleEntry and SimpleImmutableEntry.

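Notes:

equals(): after checking that the other object is a Map of the same size, it walks every entry and verifies that the other map maps the same key to an equal value. Null values need care: a null value only matches if the other map also contains the key, because get() returns null for a missing key too.
hashCode(): walks every entry and sums their hash codes; the sum is the map's hash code.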
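Both rules can be observed on real JDK maps: a HashMap and a TreeMap holding the same mappings (including a null value) compare equal, and a map's hashCode() matches a hand-computed sum of entry hash codes, where each entry hash is the key hash XOR the value hash as specified by Map.Entry. The class and helper names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

class MapEqualityDemo {
    // Recomputes Map.hashCode() by hand: the sum of all entry hash codes.
    static int sumOfEntryHashes(Map<?, ?> map) {
        int sum = 0;
        for (Map.Entry<?, ?> e : map.entrySet()) {
            sum += Objects.hashCode(e.getKey()) ^ Objects.hashCode(e.getValue());
        }
        return sum;
    }

    // Fills any map with the same three mappings, one of them with a null value.
    static Map<String, Integer> sample(Map<String, Integer> target) {
        target.put("one", 1);
        target.put("two", 2);
        target.put("nil", null);
        return target;
    }

    public static void main(String[] args) {
        Map<String, Integer> hash = sample(new HashMap<>());
        Map<String, Integer> tree = sample(new TreeMap<>());
        System.out.println(hash.equals(tree));                         // true
        System.out.println(hash.hashCode() == sumOfEntryHashes(hash)); // true
    }
}
```

Equality ignores the concrete class and the iteration order; only the set of mappings matters.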

SimpleEntry stores a key and a value and supports every Entry method, including setValue().

SimpleImmutableEntry differs from SimpleEntry only in that setValue(V value) throws UnsupportedOperationException, so the value cannot be changed.
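A short check of the two entry classes using the real java.util.AbstractMap types: the two compare equal when key and value match, SimpleEntry.setValue() returns the previous value, and SimpleImmutableEntry.setValue() throws. EntryDemo and its helpers are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.Map;

class EntryDemo {
    // SimpleEntry allows setValue and returns the previous value.
    static String replaceValue(Map.Entry<String, String> entry, String v) {
        return entry.setValue(v);
    }

    // SimpleImmutableEntry rejects setValue with UnsupportedOperationException.
    static boolean isImmutable(Map.Entry<String, String> entry) {
        try {
            entry.setValue("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map.Entry<String, String> mutable = new AbstractMap.SimpleEntry<>("k", "v");
        Map.Entry<String, String> frozen = new AbstractMap.SimpleImmutableEntry<>("k", "v");
        System.out.println(mutable.equals(frozen));     // true: same key and value
        System.out.println(isImmutable(frozen));        // true
        System.out.println(replaceValue(mutable, "w")); // v
        System.out.println(mutable.getValue());         // w
    }
}
```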

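II. Data Structure

Separate chaining

An array stores the buckets; each slot holds the head of a singly linked list of HashMapEntry nodes.

```
transient HashMapEntry[] table = (HashMapEntry[]) EMPTY_TABLE;
```

size records the current number of entries.

threshold records the number of entries at which the next resize (and rehash) is triggered. It is recomputed on every resize.

loadFactor records the configured load factor, with threshold = min(capacity * loadFactor, MAXIMUM_CAPACITY + 1).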

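Custom HashMapEntry

Implements Map.Entry and, besides key and value, adds an int field hash (the key's hash) and a reference next to the following HashMapEntry in the same bucket.

Notes:

hashCode() XORs the hash codes of the key and the value. Objects.hashCode(getKey()) returns 0 for null, which avoids an explicit null check.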
```
 return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
```

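equals() combines == and equals(): == covers identical references, including both being null, and equals() is only called on a non-null object.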

```
public final boolean equals(Object o) {
    if (!(o instanceof Map.Entry))
        return false;
    Map.Entry e = (Map.Entry)o;
    Object k1 = getKey();
    Object k2 = e.getKey();
    if (k1 == k2 || (k1 != null && k1.equals(k2))) {
        Object v1 = getValue();
        Object v2 = e.getValue();
        if (v1 == v2 || (v1 != null && v1.equals(v2)))
            return true;
    }
    return false;
}
```
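Since Java 7, `java.util.Objects.equals(a, b)` performs exactly the `a == b || (a != null && a.equals(b))` test used above, and `Objects.hashCode(null)` returns 0. The sketch below is a hypothetical entry class (EntryNode) written with those helpers; it implements Map.Entry so that equality stays symmetric when compared with other Map.Entry implementations such as AbstractMap.SimpleEntry.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical entry type mirroring the equals/hashCode contract shown above.
final class EntryNode<K, V> implements Map.Entry<K, V> {
    private final K key;
    private V value;

    EntryNode(K key, V value) {
        this.key = key;
        this.value = value;
    }

    @Override public K getKey() { return key; }
    @Override public V getValue() { return value; }

    @Override
    public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Map.Entry)) {
            return false;
        }
        Map.Entry<?, ?> e = (Map.Entry<?, ?>) o;
        // Objects.equals(a, b) is (a == b) || (a != null && a.equals(b))
        return Objects.equals(key, e.getKey()) && Objects.equals(value, e.getValue());
    }

    @Override
    public int hashCode() {
        // Objects.hashCode(null) == 0, so null keys and values need no special case
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public static void main(String[] args) {
        EntryNode<String, Integer> a = new EntryNode<>(null, 1);
        Map.Entry<String, Integer> b = new AbstractMap.SimpleEntry<>(null, 1);
        System.out.println(a.equals(b) && b.equals(a));   // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```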

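III. Characteristics

Null keys and null values are both allowed.
Not thread-safe. Wrap it with Collections.synchronizedMap() when synchronized access is needed.
No ordering guarantee: iteration order generally differs from insertion order, and it is not guaranteed to stay constant over time (it can change after a resize, for example).
The iterators of all views are fail-fast.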
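Fail-fast means that a structural change made through the map itself (not through the iterator) during iteration makes the iterator throw ConcurrentModificationException; Iterator.remove() is the safe way to delete while iterating. The HashMap Javadoc describes this as best-effort, so the sketch removes on the first step of a four-entry iteration to make sure a further next() call happens. FailFastDemo and its helpers are hypothetical names.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<Integer, String> sample() {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 4; i++) {
            map.put(i, "v" + i);
        }
        return map;
    }

    // Removing via the map changes modCount behind the iterator's back,
    // so the iterator's next call to next() throws.
    static boolean removeViaMapThrows() {
        Map<Integer, String> map = sample();
        try {
            boolean first = true;
            for (Integer key : map.keySet()) {
                if (first) {
                    map.remove(key); // structural change not made through the iterator
                    first = false;
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterator.remove() keeps expectedModCount in sync, so this is safe.
    static Map<Integer, String> removeEvenKeysViaIterator() {
        Map<Integer, String> map = sample();
        Iterator<Integer> it = map.keySet().iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return map;
    }
}
```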

IV. Implementation

1. Basic operations

getEntry(Object key)

```
final Entry getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : sun.misc.Hashing.singleWordWangJenkinsHash(key);
    for (HashMapEntry e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}
```

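put(K key, V value)

Hash the key's hashCode(), then compute the bucket index.

If there is no collision, the entry goes straight into the bucket.

On a collision, the entry is chained into the bucket's linked list.

In Java 8 and later only: if a chain grows past TREEIFY_THRESHOLD (8) nodes and the table has at least MIN_TREEIFY_CAPACITY (64) buckets, the chain is converted into a red-black tree; with a smaller table the map is resized instead. The code below is the older chained version and has no such step.

If the key already exists, its old value is replaced (keys stay unique).

If the number of entries exceeds load factor × current capacity, the table is resized.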

```
public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = sun.misc.Hashing.singleWordWangJenkinsHash(key);
    int i = indexFor(hash, table.length);
    for (HashMapEntry e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? sun.misc.Hashing.singleWordWangJenkinsHash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    HashMapEntry e = table[bucketIndex];
    table[bucketIndex] = new HashMapEntry<>(hash, key, value, e);
    size++;
}
```

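addEntry() may trigger a resize; in this version that happens only when size >= threshold and the target bucket is already occupied. After a resize the index must be recomputed because the capacity changed. New entries are inserted at the head of the bucket's chain.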
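The classes used above (HashMapEntry, sun.misc.Hashing) are internal, so here is a self-contained toy version of the same chained scheme: bucket index by bit mask, head insertion, threshold = capacity × loadFactor, and doubling on resize with indexes recomputed from the stored hash. ToyHashMap and all its members are hypothetical. It simplifies on purpose: it uses key.hashCode() instead of the Wang/Jenkins hash, resizes whenever size reaches threshold without the extra "bucket is occupied" condition, assumes a power-of-two initial capacity, and omits the MAXIMUM_CAPACITY cap and remove().

```java
import java.util.Objects;

// Hypothetical toy map illustrating JDK 7-style separate chaining.
class ToyHashMap<K, V> {
    static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Entry<K, V>[] table;
    private int size;
    private int threshold;
    private final float loadFactor;

    @SuppressWarnings("unchecked")
    ToyHashMap(int capacity, float loadFactor) { // capacity assumed to be a power of two
        this.table = (Entry<K, V>[]) new Entry[capacity];
        this.loadFactor = loadFactor;
        this.threshold = (int) (capacity * loadFactor);
    }

    static int hash(Object key) {
        return key == null ? 0 : key.hashCode(); // null key -> hash 0 -> bucket 0
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    V get(Object key) {
        int h = hash(key);
        for (Entry<K, V> e = table[indexFor(h, table.length)]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    V put(K key, V value) {
        int h = hash(key);
        int i = indexFor(h, table.length);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                V old = e.value; // existing key: replace the value only
                e.value = value;
                return old;
            }
        }
        if (size >= threshold) {
            resize(2 * table.length);
            i = indexFor(h, table.length); // capacity changed, so recompute the index
        }
        table[i] = new Entry<>(h, key, value, table[i]); // insert at the chain head
        size++;
        return null;
    }

    @SuppressWarnings("unchecked")
    private void resize(int newCapacity) {
        Entry<K, V>[] newTable = (Entry<K, V>[]) new Entry[newCapacity];
        for (Entry<K, V> e : table) {
            while (e != null) {
                Entry<K, V> next = e.next;
                int i = indexFor(e.hash, newCapacity); // reuse the stored hash
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        table = newTable;
        threshold = (int) (newCapacity * loadFactor);
    }

    int size() { return size; }
    int capacity() { return table.length; }
}
```

With capacity 4 and load factor 0.75 the threshold is 3, so the fourth new key doubles the table to 8 buckets before it is inserted.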

remove(): first locate the bucket index, then walk the chain and unlink the matching node.

```
final Entry removeEntryForKey(Object key) {
    if (size == 0) {
        return null;
    }
    int hash = (key == null) ? 0 : sun.misc.Hashing.singleWordWangJenkinsHash(key);
    int i = indexFor(hash, table.length);
    HashMapEntry prev = table[i];
    HashMapEntry e = prev;

    while (e != null) {
        HashMapEntry next = e.next;
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k)))) {
            modCount++;
            size--;
            if (prev == e)
                table[i] = next;
            else
                prev.next = next;
            e.recordRemoval(this);
            return e;
        }
        prev = e;
        e = next;
    }

    return e;
}
```
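The unlinking logic of removeEntryForKey can be isolated on a plain singly linked chain. In this sketch, ChainRemoval and its key-only Node are hypothetical and the hash comparison is omitted; as in the original, `prev == e` means the match is the bucket head, so the bucket must start at `next`.

```java
import java.util.Objects;

class ChainRemoval {
    // Hypothetical bucket node holding only a key.
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Unlinks the first node whose key matches and returns the (possibly new) head.
    static Node remove(Node head, String key) {
        Node prev = head;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            if (Objects.equals(e.key, key)) {
                if (prev == e) {
                    return next;  // removing the head: the bucket now starts at next
                }
                prev.next = next; // bypass e
                return head;
            }
            prev = e;
            e = next;
        }
        return head;              // not found: chain unchanged
    }

    // Concatenates keys along the chain, for inspection.
    static String keys(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.key);
        }
        return sb.toString();
    }
}
```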

clear(): sets every slot of the table to null; the capacity is left unchanged.

```
public void clear() {
    modCount++;
    Arrays.fill(table, null);
    size = 0;
}
```

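2. Hashing

Hash function

This version hashes keys with sun.misc.Hashing.singleWordWangJenkinsHash(), which mixes the bits of key.hashCode(); a null key gets hash 0.

```
int hash = sun.misc.Hashing.singleWordWangJenkinsHash(key);
```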

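Rehashing: how and when

capacity: the number of buckets. The default is 4 in this version (OpenJDK's own default is 16); it must be a power of two, and the maximum is 2^30.

load factor: default 0.75, a reasonable trade-off between time and space. A larger value saves memory but makes lookups slower.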

The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.

The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.

Exceeding the threshold triggers a resize that rebuilds the whole structure and doubles the number of buckets:

When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

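It happens in two steps, resize and rehash.

resize allocates a new array whose length is a power of two. When triggered by a single put(), the capacity doubles; when triggered by putAll(), it grows to the smallest power of two large enough for the required capacity.

rehash (transfer() below) moves every entry into the new array, recomputing its bucket index from the stored hash.

Finally the new array replaces the old one.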

```
void resize(int newCapacity) {
    HashMapEntry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    HashMapEntry[] newTable = new HashMapEntry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

void transfer(HashMapEntry[] newTable) {
    int newCapacity = newTable.length;
    for (HashMapEntry e : table) {
        while(null != e) {
            HashMapEntry next = e.next;
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
```

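Capacity constraint

Why a power of two? Because the bucket index can then be computed with a bit mask instead of a division, which speeds up both indexing and resizing.

With a capacity of 7: index = hashcode % 7 (negative hash codes need extra handling, since % keeps the sign of the dividend).

With a capacity of 8: index = hashcode & 0x07

A bitwise AND is much cheaper than a remainder operation.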
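For a power-of-two length n, `h & (n - 1)` keeps the low log2(n) bits of h, which equals `Math.floorMod(h, n)` for every int h, including negative ones, whereas `h % n` can be negative. For a length that is not a power of two the mask gives wrong indexes. A quick check (IndexForDemo and maskEqualsFloorMod are hypothetical names; indexFor copies the one-liner above):

```java
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1); // only valid when length is a power of two
    }

    // Compares the mask with floorMod over a range of hashes plus the int extremes.
    static boolean maskEqualsFloorMod(int length) {
        for (int h = -1000; h <= 1000; h++) {
            if (indexFor(h, length) != Math.floorMod(h, length)) return false;
        }
        return indexFor(Integer.MIN_VALUE, length) == Math.floorMod(Integer.MIN_VALUE, length)
            && indexFor(Integer.MAX_VALUE, length) == Math.floorMod(Integer.MAX_VALUE, length);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(-7, 8));        // 1
        System.out.println(-7 % 8);                 // -7
        System.out.println(maskEqualsFloorMod(8));  // true
        System.out.println(maskEqualsFloorMod(7));  // false: 7 is not a power of two
    }
}
```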

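Effect of the load factor

The load factor affects the performance of put, get, and remove.

Each of these operations must first locate the entry and only then act on it.

If the load factor is large, buckets hold more entries and collisions are more likely, so finding the target means walking longer chains, which hurts performance.

If the load factor is small, a resize is triggered after only a few entries; every resize recomputes and reinserts all entries, which also costs time (and leaves more buckets empty).

So the load factor should be chosen to balance the two.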

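Improvements in newer versions (Java 8)

Long chains are converted to red-black trees, so searching a heavily collided bucket drops from O(N) to O(log N).
The hash function mixes the high bits of hashCode() into the low bits, so more of the hash takes part in choosing the bucket and entries spread more evenly.
Resize is cheaper: after doubling, an entry either stays at its old index or moves to old index + old capacity, so the hash does not need to be recomputed; a single bit of the existing hash (hash & oldCap) decides which.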
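Both hash-related points can be checked in a few lines. Java 8's HashMap computes `(h = key.hashCode()) ^ (h >>> 16)`, and its resize splits each bucket into a "stays" list and a "moves by oldCap" list using `(e.hash & oldCap) == 0`. The sketch reproduces only those two formulas on raw int hashes; Java8HashDemo and its helper names are hypothetical.

```java
class Java8HashDemo {
    // Same mixing step as Java 8's HashMap.hash(Object), applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int index(int hash, int length) {
        return hash & (length - 1); // length is a power of two
    }

    // New index after doubling from oldCap, decided by the single bit (hash & oldCap).
    static int splitIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    // Compares splitIndex with a full recomputation over a range of hashes.
    static boolean splitRuleHolds(int oldCap) {
        for (int hash = -5000; hash <= 5000; hash++) {
            if (splitIndex(hash, oldCap) != index(hash, oldCap << 1)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int h1 = 0x10000, h2 = 0x20000; // differ only above bit 15
        System.out.println(index(h1, 16) + " " + index(h2, 16));                 // 0 0
        System.out.println(index(spread(h1), 16) + " " + index(spread(h2), 16)); // 1 2
        System.out.println(splitIndex(5, 16) + " " + splitIndex(21, 16));        // 5 21
        System.out.println(splitRuleHolds(16));                                  // true
    }
}
```

Without spreading, the two hash codes collide in a 16-bucket table because the mask discards their differing high bits; with spreading they land in different buckets.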

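3. Handling of null

HashMap supports a null key:

On insertion its hash is 0, so it always goes into bucket 0 (index 0).
If an entry with a null key already exists, its value is overwritten.

HashMap supports null values:

The value is simply stored as null.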
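With java.util.HashMap, a second put() with the null key replaces the value without adding an entry. Because a null value is stored as-is, get() returning null does not say whether the key is present; containsKey() does. NullHandlingDemo and build() are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class NullHandlingDemo {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // same (null) key: value replaced, size unchanged
        map.put("k", null);      // null value stored directly
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.size());                 // 2
        System.out.println(map.get(null));              // second
        System.out.println(map.get("k"));               // null (key present)
        System.out.println(map.get("missing"));         // null (key absent)
        System.out.println(map.containsKey("k"));       // true
        System.out.println(map.containsKey("missing")); // false
    }
}
```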

4. Iterator

HashIterator is an abstract class implementing Iterator; it leaves next() to its subclasses. Its fields:

```
HashMapEntry next;      // next entry to return
int expectedModCount;   // for fail-fast
int index;              // current slot
HashMapEntry current;   // current entry
```

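Notes:

nextEntry() first checks whether the current HashMapEntry has a next node; if not, it advances to the next non-empty index (slot/bucket).
remove() delegates to the map's basic removal method (removeEntryForKey()) and then resynchronizes expectedModCount.

ValueIterator / KeyIterator / EntryIterator extend HashIterator and implement next(),

returning the value / key / entry respectively.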
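The traversal rule above (follow next within a chain, then scan forward to the next non-empty slot) can be sketched over a bare bucket array. BucketIterator and its Node type are hypothetical and simplified: there is no modCount check and remove() keeps Iterator's default, which throws UnsupportedOperationException.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class BucketIterator implements Iterator<String> {
    // Hypothetical chain node holding only a key.
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    private final Node[] table;
    private int index; // next slot to scan
    private Node next; // next node to return

    BucketIterator(Node[] table) {
        this.table = table;
        advanceToNonEmptySlot();
    }

    // Scans forward until a non-empty slot is found or the table ends.
    private void advanceToNonEmptySlot() {
        while (next == null && index < table.length) {
            next = table[index++];
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        Node e = next;
        if (e == null) {
            throw new NoSuchElementException();
        }
        next = e.next;           // stay in the current chain if possible
        advanceToNonEmptySlot(); // otherwise jump to the next non-empty bucket
        return e.key;
    }
}
```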

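V. Views

KeySet

Extends AbstractSet and delegates to HashMap's basic methods; iterator() returns HashMap's KeyIterator.

Values

Extends AbstractCollection and delegates to HashMap's basic methods; iterator() returns HashMap's ValueIterator.

EntrySet

Extends AbstractSet and delegates to HashMap's basic methods; iterator() returns HashMap's EntryIterator.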
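Because the three views only delegate to the map, they are live: removing through keySet() or values(), or calling setValue() on an entry from entrySet(), changes the map itself. A small check with java.util.HashMap (ViewsDemo and demo() are hypothetical names):

```java
import java.util.HashMap;
import java.util.Map;

class ViewsDemo {
    static Map<String, Integer> demo() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        map.keySet().remove("a");   // removes the mapping a=1 from the map
        map.values().remove(2);     // removes one mapping whose value is 2 (b=2)
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            e.setValue(e.getValue() * 10); // writes through to the map
        }
        return map;                 // only c=30 remains
    }
}
```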