JAVA常用数据结构

背景

所有JAVA开发工程师在日常开发工作中,离不开JAVA常用数据结构,比如List、Map、Set等。对其内部原理的了解能够更好地理解这些数据结构的适用场合,避免使用不当引发的诡异问题。本文将对这些常用的JAVA中的数据结构进行一个简单的介绍。

JAVA中的数据结构简述

JAVA中常用的数据结构主要有这样几种分类:

1. List:可存储相同的值(确切讲是a.equals(b)时,二者都可存储)。我们会挑选适宜连续存储的ArrayList和链式存储的LinkedList进行介绍。

2. Set:不可存储相同值。这里挑选最常用的HashSet进行介绍(需要说明的是,JDK中并没有名为ConcurrentHashSet的类,线程安全的Set一般通过ConcurrentHashMap.newKeySet()或Collections.newSetFromMap(new ConcurrentHashMap<>())来获得)。

3. Map:存储key-value形式的数据。挑选线程不安全的HashMap和线程安全的ConcurrentHashMap进行介绍。
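为了直观体现这三类数据结构在行为上的差异,下面给出一个简单示例(示意代码,类名与输出注释仅为演示假设):

    import java.util.*;

    public class CollectionDemo {
        public static void main(String[] args) {
            // List:允许存储相同的值,按插入顺序保存
            List<String> list = new ArrayList<>();
            list.add("a");
            list.add("a");
            System.out.println(list);        // [a, a]

            // Set:equals相等的值只会保留一个
            Set<String> set = new HashSet<>();
            set.add("a");
            set.add("a");
            System.out.println(set);         // [a]

            // Map:以key-value形式存储,key重复时新value覆盖旧value
            Map<String, Integer> map = new HashMap<>();
            map.put("a", 1);
            map.put("a", 2);
            System.out.println(map);         // {a=2}
        }
    }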

List

名称:ArrayList
特点:内部采用数组顺序存储
适用场合:1. 数据连续写入,需要根据index进行查找;2. 按index插入和删除较少
不适用的场合:需要频繁按index插入或删除数据

名称:LinkedList
特点:采用双向链表进行存储
适用场合:1. 数据需要按index插入或删除;2. 按index查找较少
不适用的场合:需要频繁根据index进行查找

ArrayList

主要属性

 /**
     * The array buffer into which the elements of the ArrayList are stored.
     * The capacity of the ArrayList is the length of this array buffer. Any
     * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
     * will be expanded to DEFAULT_CAPACITY when the first element is added.
     */
    transient Object[] elementData; // non-private to simplify nested class access

    /**
     * The size of the ArrayList (the number of elements it contains).
     *
     * @serial
     */
    private int size;

elementData用于存储数据,size用于标识其存储的元素个数。

构造函数

共有三种构造函数。

无参数构造函数,将元素存储数组设置为空数组,size为0;首次添加元素时,数组容量才会扩展为默认的10。

    /**
     * Constructs an empty list with an initial capacity of ten.
     */
    public ArrayList() {
        this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
    }

给定元素初始个数的构造函数,以传入的initialCapacity为数组长度,创建数组。如果开发者在创建ArrayList的时候,能够预估元素个数,建议采用这个构造函数。原因是可在构造函数中,创建足够长度的数组,而不是在添加元素的时候,再去扩充数组长度。

  /**
     * Constructs an empty list with the specified initial capacity.
     *
     * @param  initialCapacity  the initial capacity of the list
     * @throws IllegalArgumentException if the specified initial capacity
     *         is negative
     */
    public ArrayList(int initialCapacity) {
        if (initialCapacity > 0) {
            this.elementData = new Object[initialCapacity];
        } else if (initialCapacity == 0) {
            this.elementData = EMPTY_ELEMENTDATA;
        } else {
            throw new IllegalArgumentException("Illegal Capacity: "+
                                               initialCapacity);
        }
    }
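结合上面的建议,下面给出一个使用该构造函数预估容量的简单示例(示意代码,元素个数1000仅为假设):

    import java.util.ArrayList;
    import java.util.List;

    public class PreSizedArrayListDemo {
        public static void main(String[] args) {
            int expectedSize = 1000; // 假设业务上能预估出大约1000个元素

            // 预先指定容量:内部数组一次性创建到位,添加过程中不会触发扩容
            List<Integer> preSized = new ArrayList<>(expectedSize);

            // 不指定容量:从默认容量10开始,随着元素增多会多次扩容并拷贝数组
            List<Integer> defaultSized = new ArrayList<>();

            for (int i = 0; i < expectedSize; i++) {
                preSized.add(i);
                defaultSized.add(i);
            }
        }
    }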

拷贝构造函数

  /**
     * Constructs a list containing the elements of the specified
     * collection, in the order they are returned by the collection's
     * iterator.
     *
     * @param c the collection whose elements are to be placed into this list
     * @throws NullPointerException if the specified collection is null
     */
    public ArrayList(Collection<? extends E> c) {
        elementData = c.toArray();
        if ((size = elementData.length) != 0) {
            // c.toArray might (incorrectly) not return Object[] (see 6260652)
            if (elementData.getClass() != Object[].class)
                elementData = Arrays.copyOf(elementData, size, Object[].class);
        } else {
            // replace with empty array.
            this.elementData = EMPTY_ELEMENTDATA;
        }
    }

根据index添加、删除、获取元素

根据index添加元素:

    /**
     * Inserts the specified element at the specified position in this
     * list. Shifts the element currently at that position (if any) and
     * any subsequent elements to the right (adds one to their indices).
     *
     * @param index index at which the specified element is to be inserted
     * @param element element to be inserted
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public void add(int index, E element) {
        rangeCheckForAdd(index);

        ensureCapacityInternal(size + 1);  // Increments modCount!!
        System.arraycopy(elementData, index, elementData, index + 1,
                         size - index);
        elementData[index] = element;
        size++;
    }

从代码中不难看出,首先检查index是否越界,接下来检查数组容量是否足够(容量不足时进行扩容,每次扩容为原容量的约1.5倍),然后将原有index到size-1位置的元素整体向后挪一个位置,再把给定元素写入index位置,最后将size加1。
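上面提到的"扩容约1.5倍"对应的是ArrayList内部的grow逻辑,下面给出一个简化示意(并非完整的JDK源码,省略了溢出与容量上限处理等细节,字段含义同上文):

    // 简化示意:新容量约为旧容量的1.5倍,不足时至少扩到minCapacity
    private void grow(int minCapacity) {
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1); // 旧容量 + 旧容量/2
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;
        elementData = Arrays.copyOf(elementData, newCapacity); // 数组拷贝,开销较大
    }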

根据index删除元素:

/**
     * Removes the element at the specified position in this list.
     * Shifts any subsequent elements to the left (subtracts one from their
     * indices).
     *
     * @param index the index of the element to be removed
     * @return the element that was removed from the list
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public E remove(int index) {
        rangeCheck(index);

        modCount++;
        E oldValue = elementData(index);

        int numMoved = size - index - 1;
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index,
                             numMoved);
        elementData[--size] = null; // clear to let GC do its work

        return oldValue;
    }

从代码中不难看出,首先检查index是否越界,接下来根据index取出要删除的元素,然后将index+1到size-1位置的元素整体向前挪动一个位置,将size减1并把末尾位置置为null以便GC回收,最后返回被删除的元素。

根据index获取元素:

    /**
     * Returns the element at the specified position in this list.
     *
     * @param  index index of the element to return
     * @return the element at the specified position in this list
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public E get(int index) {
        rangeCheck(index);

        return elementData(index);
    }

从代码中不难看出,首先检查index是否越界,然后直接按照index从数组中获取元素,时间复杂度为O(1)。

从以上三个方法的代码中不难看出,由于按index插入、删除数据涉及挪动index(或index+1)到size-1位置的全部数据,因此不适宜频繁按index插入、删除的场景。但由于其内部采用数组连续存储,按index获取数据可以直接寻址,因此比较适宜随机读取的场景。

LinkedList

LinkedList是双向链表,每个节点同时持有前驱节点和后继节点的引用,支持正向遍历和反向遍历。

 

主要属性

private static class Node<E> {
        E item;
        Node<E> next;
        Node<E> prev;

        Node(Node<E> prev, E element, Node<E> next) {
            this.item = element;
            this.next = next;
            this.prev = prev;
        }
    }

从中可以看出,item为节点中的元素值,prev和next分别是前向节点和后向节点的引用。

 

构造函数

默认构造函数和以集合为入参的构造函数。其中以集合为入参的构造函数如下:

    /**
     * Constructs a list containing the elements of the specified
     * collection, in the order they are returned by the collection's
     * iterator.
     *
     * @param  c the collection whose elements are to be placed into this list
     * @throws NullPointerException if the specified collection is null
     */
    public LinkedList(Collection<? extends E> c) {
        this();
        addAll(c);
    }
    /**
     * Appends all of the elements in the specified collection to the end of
     * this list, in the order that they are returned by the specified
     * collection's iterator.  The behavior of this operation is undefined if
     * the specified collection is modified while the operation is in
     * progress.  (Note that this will occur if the specified collection is
     * this list, and it's nonempty.)
     *
     * @param c collection containing elements to be added to this list
     * @return {@code true} if this list changed as a result of the call
     * @throws NullPointerException if the specified collection is null
     */
    public boolean addAll(Collection<? extends E> c) {
        return addAll(size, c);
    }
 /**
     * Inserts all of the elements in the specified collection into this
     * list, starting at the specified position.  Shifts the element
     * currently at that position (if any) and any subsequent elements to
     * the right (increases their indices).  The new elements will appear
     * in the list in the order that they are returned by the
     * specified collection's iterator.
     *
     * @param index index at which to insert the first element
     *              from the specified collection
     * @param c collection containing elements to be added to this list
     * @return {@code true} if this list changed as a result of the call
     * @throws IndexOutOfBoundsException {@inheritDoc}
     * @throws NullPointerException if the specified collection is null
     */
    public boolean addAll(int index, Collection<? extends E> c) {
        checkPositionIndex(index);

        Object[] a = c.toArray();
        int numNew = a.length;
        if (numNew == 0)
            return false;

        Node<E> pred, succ;
        if (index == size) {
            succ = null;
            pred = last;
        } else {
            succ = node(index);
            pred = succ.prev;
        }

        for (Object o : a) {
            @SuppressWarnings("unchecked") E e = (E) o;
            Node<E> newNode = new Node<>(pred, e, null);
            if (pred == null)
                first = newNode;
            else
                pred.next = newNode;
            pred = newNode;
        }

        if (succ == null) {
            last = pred;
        } else {
            pred.next = succ;
            succ.prev = pred;
        }

        size += numNew;
        modCount++;
        return true;
    }

可见,以给定集合构造LinkedList的步骤为:遍历给定集合,将第一个元素作为LinkedList的first节点,之后的每个元素依次链接到上一个节点之后(即把上一个节点的next指向自身、自身的prev指向上一个节点),最后一个元素成为LinkedList的last节点,并将size累加。

 

根据index添加、删除、修改、获取元素

先看根据index添加元素。

首先检查index是否越界,接下来判断index是否等于size(即是否插入到链表末尾)。

如果等于size,则根据传入的元素值创建新节点,将原last节点的next指向新节点,新节点的prev指向原last节点,并把新节点设置为LinkedList的last节点,size加1;

如果不等于size,则先根据index查找到原节点succ及其前置节点pred,再根据传入的元素值创建新节点(prev指向pred,next指向succ),然后把succ的prev指向新节点,把pred的next指向新节点(若pred为null,则新节点成为first节点),size加1,完成整个插入过程。

  /**
     * Inserts the specified element at the specified position in this list.
     * Shifts the element currently at that position (if any) and any
     * subsequent elements to the right (adds one to their indices).
     *
     * @param index index at which the specified element is to be inserted
     * @param element element to be inserted
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public void add(int index, E element) {
        checkPositionIndex(index);

        if (index == size)
            linkLast(element);
        else
            linkBefore(element, node(index));
    }

    /**
     * Inserts element e before non-null Node succ.
     */
    void linkBefore(E e, Node<E> succ) {
        // assert succ != null;
        final Node<E> pred = succ.prev;
        final Node<E> newNode = new Node<>(pred, e, succ);
        succ.prev = newNode;
        if (pred == null)
            first = newNode;
        else
            pred.next = newNode;
        size++;
        modCount++;
    }
    /**
     * Returns the (non-null) Node at the specified element index.
     */
    Node<E> node(int index) {
        // assert isElementIndex(index);

        if (index < (size >> 1)) {
            Node<E> x = first;
            for (int i = 0; i < index; i++)
                x = x.next;
            return x;
        } else {
            Node<E> x = last;
            for (int i = size - 1; i > index; i--)
                x = x.prev;
            return x;
        }
    }

再看根据index删除元素。从代码中不难看出,首先检查index是否越界,之后按index获取到节点,并进一步获取其前置节点(prev)和后继节点(next)。如果prev为null(即要删除的节点是头节点),则将next设为头节点;反之,则将prev的next指向next,并把要删除节点的prev置为null,断开前向链接。如果next为null(即要删除的节点是尾节点),则将prev设为尾节点;反之,则将next的prev指向prev,并把要删除节点的next置为null,断开后向链接。最后将节点的item置为null,size减1,完成全部删除工作。

    /**
     * Removes the element at the specified position in this list.  Shifts any
     * subsequent elements to the left (subtracts one from their indices).
     * Returns the element that was removed from the list.
     *
     * @param index the index of the element to be removed
     * @return the element previously at the specified position
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public E remove(int index) {
        checkElementIndex(index);
        return unlink(node(index));
    }
 /**
     * Unlinks non-null node x.
     */
    E unlink(Node<E> x) {
        // assert x != null;
        final E element = x.item;
        final Node<E> next = x.next;
        final Node<E> prev = x.prev;

        if (prev == null) {
            first = next;
        } else {
            prev.next = next;
            x.prev = null;
        }

        if (next == null) {
            last = prev;
        } else {
            next.prev = prev;
            x.next = null;
        }

        x.item = null;
        size--;
        modCount++;
        return element;
    }

再看根据index修改元素。操作过程为,首先检查index是否越界,接下来根据index定位节点,最后将节点的值变更。

    /**
     * Replaces the element at the specified position in this list with the
     * specified element.
     *
     * @param index index of the element to replace
     * @param element element to be stored at the specified position
     * @return the element previously at the specified position
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public E set(int index, E element) {
        checkElementIndex(index);
        Node<E> x = node(index);
        E oldVal = x.item;
        x.item = element;
        return oldVal;
    }

最后看根据index获取元素。其实上文中已经介绍过了:先将size右移1位(即除以2),判断index是否落在前半段。如果在前半段,就从first节点开始往后查找;反之就从last节点开始往前查找。这样可以保证最多遍历一半的元素就能找到目标节点。

    /**
     * Returns the (non-null) Node at the specified element index.
     */
    Node<E> node(int index) {
        // assert isElementIndex(index);

        if (index < (size >> 1)) {
            Node<E> x = first;
            for (int i = 0; i < index; i++)
                x = x.next;
            return x;
        } else {
            Node<E> x = last;
            for (int i = size - 1; i > index; i--)
                x = x.prev;
            return x;
        }
    }

从上述的按index新增、删除、获取、修改等方法不难看出,由于LinkedList本质上是双向链表的实现,因此在根据index获取元素的时候,需要从头节点或尾节点遍历链表中的元素,找到位置为index的节点(最多遍历size/2的节点)。而在按index新增、删除元素的时候,仅需找到对应index的节点,然后插入新的节点或将该节点断开与前置节点与后继节点的链接即可。

相比于以数组为存储形式的ArrayList,LinkedList不适用于频繁按index读取元素的场景(因为其不是连续存储,需要做链表遍历),但适用于按index插入、删除元素的场景(因为主要开销在链表遍历,一旦找到index位置的节点,就可以灵活地通过插入节点或将节点解除双向链接的方式完成;相比之下,ArrayList需要将index之后的元素进行挪动)。
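下面用一个小例子直观对比二者的典型用法(示意代码,元素个数等数值仅为演示假设):

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.ListIterator;

    public class ListChoiceDemo {
        public static void main(String[] args) {
            // 场景一:频繁按index随机读取 —— ArrayList直接按下标寻址,复杂度O(1)
            List<Integer> arrayList = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) arrayList.add(i);
            int v = arrayList.get(5_000); // 数组随机访问,无需遍历

            // 场景二:遍历过程中频繁插入/删除 —— LinkedList配合ListIterator,
            // 定位后插入/删除只需修改前后指针,不像ArrayList那样挪动后续元素
            List<Integer> linkedList = new LinkedList<>();
            for (int i = 0; i < 10_000; i++) linkedList.add(i);
            ListIterator<Integer> it = linkedList.listIterator();
            while (it.hasNext()) {
                if (it.next() % 2 == 0) {
                    it.remove(); // 删除当前节点:断开双向链接即可
                }
            }
            System.out.println(v + ", " + linkedList.size());
        }
    }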

Map

我们在此主要介绍Map接口的两个实现类,线程不安全的HashMap和线程安全的ConcurrentHashMap。

HashMap

首先介绍线程不安全的HashMap。这里推荐一篇图文并茂的文章:https://juejin.im/post/5aa5d8d26fb9a028d2079264。

重要属性

 /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;


    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

 
    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

table属性即为分桶数组(哈希桶)。

size属性表示HashMap中存储的Key-Value对的个数。

threshold为判断是否需要扩容的门限值,表分配后其值等于capacity * loadFactor。

构造函数

/**
     * Constructs an empty HashMap with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    /**
     * Constructs an empty HashMap with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Constructs an empty HashMap with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

    /**
     * Constructs a new HashMap with the same mappings as the
     * specified Map.  The HashMap is created with
     * default load factor (0.75) and an initial capacity sufficient to
     * hold the mappings in the specified Map.
     *
     * @param   m the map whose mappings are to be placed in this map
     * @throws  NullPointerException if the specified map is null
     */
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }

    /**
     * Implements Map.putAll and Map constructor
     *
     * @param m the map
     * @param evict false when initially constructing this map, else
     * true (relayed to method afterNodeInsertion).
     */
    final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
        int s = m.size();
        if (s > 0) {
            if (table == null) { // pre-size
                float ft = ((float)s / loadFactor) + 1.0F;
                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                         (int)ft : MAXIMUM_CAPACITY);
                if (t > threshold)
                    threshold = tableSizeFor(t);
            }
            else if (s > threshold)
                resize();
            for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
                K key = e.getKey();
                V value = e.getValue();
                putVal(hash(key), key, value, false, evict);
            }
        }
    }

其中最为重要的就是根据传入的initialCapacity计算threshold的方法,如下所示:

  /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

其功能在于计算得到大于等于initialCapacity的最小的2的幂。
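为便于理解,下面用一个小例子演示该方法的效果(示意代码,为了能独立运行,将JDK中的实现复制为一个静态方法):

    public class TableSizeForDemo {
        static final int MAXIMUM_CAPACITY = 1 << 30;

        // 与HashMap.tableSizeFor逻辑一致:返回大于等于cap的最小的2的幂
        static int tableSizeFor(int cap) {
            int n = cap - 1;
            n |= n >>> 1;
            n |= n >>> 2;
            n |= n >>> 4;
            n |= n >>> 8;
            n |= n >>> 16;
            return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
        }

        public static void main(String[] args) {
            System.out.println(tableSizeFor(13)); // 16
            System.out.println(tableSizeFor(16)); // 16
            System.out.println(tableSizeFor(17)); // 32
        }
    }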

添加KV

添加KV方法算是HashMap中最为重要的几个方法之一了。

先上源码:

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with key, or
     *         null if there was no mapping for key.
     *         (A null return can also indicate that the map
     *         previously associated null with key.)
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
 final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

个人感觉该方法中的信息量还是比较大的,该方法在1.8版本中,在处理落入同一个分桶的数据时,较历史版本有所更新。

之前版本中,对于该方法的处理是这样的流程:

 

  1. 根据给出的key,计算hash值(key的hashCode方法),从而定位到分桶;
  2. 检查定位到的分桶中是否已经存在了数据。如果没有,则直接将数据放入分桶中;反之,则将数据加入分桶的链表中。

以上操作存在一个问题,那就是当分桶中数据比较多的时候,get方法需要在根据key的hash值定位到具体分桶后,还需要进行链表遍历才能找到需要获取到的value。假设链表长度为n,则时间复杂度为O(n),效率不够理想。

有什么办法进行改进呢?对数据结构有所了解的同学们立刻就能反应到,相对于链表的遍历,二叉树的遍历往往能够减小时间复杂度。因此在JDK8中对此进行了功能增强,变为如下流程:

  1. 根据给出的key,计算hash值(基于key的hashCode方法),从而定位到分桶,此步骤与改版前一致;
  2. 检查分桶中是否有数据,如果没有数据则直接将数据放入分桶中;反之,做更精细的处理:当分桶中数据较少时(链表长度未达到TREEIFY_THRESHOLD,默认为8)仍采用链表方式,将新节点放到链表尾部;当链表长度达到TREEIFY_THRESHOLD且数组容量不小于MIN_TREEIFY_CAPACITY(默认为64,否则先扩容)时,将链表转为红黑树进行存储。这样一来,分桶中数据较少时保持链表这种简单结构,数据较多时则借助红黑树把查找的时间复杂度从O(n)降到O(log n)。个人觉得,这是JDK8中解决此问题的很精巧的一点(下面给出一个强制哈希冲突的示意例子)。
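下面是一个强制哈希冲突的示意例子(仅用于说明问题,实际业务中不应这样实现hashCode):所有key的hashCode相同,于是全部落入同一个分桶;在JDK8中,当链表足够长且数组容量足够大时会转为红黑树,查找的最坏复杂度由O(n)降为O(log n)。

    import java.util.HashMap;
    import java.util.Map;

    public class CollisionDemo {
        // 故意让所有key的hashCode相同,制造哈希冲突
        static class BadKey {
            final int id;
            BadKey(int id) { this.id = id; }
            @Override public int hashCode() { return 1; }
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).id == id;
            }
        }

        public static void main(String[] args) {
            Map<BadKey, Integer> map = new HashMap<>();
            for (int i = 0; i < 100; i++) {
                map.put(new BadKey(i), i); // 全部落入同一个分桶
            }
            // JDK7中这是一条长度为100的链表;JDK8中链表长到阈值后会树化
            System.out.println(map.get(new BadKey(66))); // 66
        }
    }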

下面我们仔细地看一下这个方法的代码。

首先,需要根据key找到对应的分桶。具体做法是:将key.hashCode()无符号右移16位,再与自身按位异或得到hash值,然后将hash值与"分桶总数-1"进行按位与运算。初看这段代码时,你可能会疑惑:取了hashCode之后,为什么还要右移16位再与自身异或?原因在于hashCode方法返回的值并不够随机,为了让不同的key更均匀地散落在不同的分桶中(换句话说,让不同key的hash值分布得更加均匀),在hashCode的基础上又做了一次扰动运算,把高16位的信息混入低16位。之后逻辑上需要将计算得到的hash值按照分桶总数取模,才能把数据尽可能均匀地分配到各个分桶;但取模运算效率不高,而分桶总数始终是2的幂,因此用与"分桶总数-1"按位与的方式即可达到同样的效果,且效率更高。
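下面用一个小例子演示这个扰动与定位过程(示意代码,字符串key和分桶数16仅为假设):

    public class HashLocateDemo {
        // 与HashMap.hash逻辑一致:高16位与低16位异或,做一次扰动
        static int hash(Object key) {
            int h;
            return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int capacity = 16;               // 分桶总数,始终是2的幂
            String key = "hello";
            int h = hash(key);
            int index = (capacity - 1) & h;  // 等价于 h % capacity,但位运算更快
            System.out.println("hash=" + h + ", index=" + index);
        }
    }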

接下来,根据分桶中已有数据的情况来判断如何将输入的Key-Value存储到分桶中。如果分桶中没有数据,就将输入的Key-Value封装成存储节点Node,放入分桶中;如果分桶中已有数据且已采用红黑树存储,则将新节点插入红黑树,以优化后续查找性能;如果分桶中的数据仍使用链表存储,则把新节点追加到链表尾部,之后判断是否达到树化条件(链表长度达到TREEIFY_THRESHOLD),达到的话就将链表转成红黑树存储。具体插入红黑树的代码就不在这里赘述了,感兴趣的同学可参照红黑树的介绍文章:https://baike.baidu.com/item/%E7%BA%A2%E9%BB%91%E6%A0%91。

最后,在插入节点之后,判断是否要进行扩容(resize)。扩容的条件是加入新节点后,存储的Key-Value个数大于threshold(构造函数中threshold暂存的是大于等于initialCapacity的最小的2的幂,表分配后threshold等于capacity * loadFactor)。扩容的具体操作步骤如下:

 /**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * @return the table
     */
    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

首先计算得到扩容后的capacity和threshold,然后创建新的分桶数组,并将旧分桶中的数据按照新的定位规则依次迁移到新分桶中。注意,HashMap的table属性在多线程操作时属于临界资源,而HashMap本身不是线程安全的,如需多线程访问,需要在代码中自行做线程安全保护。此外,从resize方法可以看出,扩容时需要重新计算capacity/threshold,并把旧分桶中的数据按新规则重新定位、迁移到新分桶中,开销比较大。因此建议在创建HashMap实例时,尽可能根据业务需求对容量做一个预估,避免程序运行过程中频繁扩容,从而提升性能。
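结合上面的建议,下面给出一个按预估元素个数构造HashMap的简单示例(示意代码,元素个数1000仅为假设):

    import java.util.HashMap;
    import java.util.Map;

    public class PreSizedHashMapDemo {
        public static void main(String[] args) {
            int expectedSize = 1000; // 假设业务上能预估出大约1000个KV

            // 直接用1000做initialCapacity并不够:1000个元素在默认负载因子0.75下
            // 需要的容量约为1000 / 0.75 ≈ 1334,仍会触发一次扩容。
            // 常见做法是按 expectedSize / loadFactor + 1 预估:
            int initialCapacity = (int) (expectedSize / 0.75f) + 1;
            Map<String, Integer> map = new HashMap<>(initialCapacity);

            for (int i = 0; i < expectedSize; i++) {
                map.put("key" + i, i); // 添加过程中不会再触发resize
            }
        }
    }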

 

根据Key获取Value

在了解了Key-Value的插入方法之后,再了解如何根据Key获取Value就会简单许多。get方法代码如下:

    /**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    /**
     * Implements Map.get and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

整体步骤分为以下几步:

  1. 按照put方法中描述的规则,根据key定位到分桶;
  2. 将key与分桶中第一个节点的key做equals比较,如果相等则返回;不相等则考虑与后续节点进行比对。分两种情况,如果分桶中存储的是红黑树,则做树的遍历查找;如果存储的是链表,则做链表的遍历查找。如果遍历结束依然没有找到,则返回null。

根据Key删除

remove方法的操作步骤如下:

 /**
     * Removes the mapping for the specified key from this map if present.
     *
     * @param  key key whose mapping is to be removed from the map
     * @return the previous value associated with key, or
     *         null if there was no mapping for key.
     *         (A null return can also indicate that the map
     *         previously associated null with key.)
     */
    public V remove(Object key) {
        Node<K,V> e;
        return (e = removeNode(hash(key), key, null, false, true)) == null ?
            null : e.value;
    }

    /**
     * Implements Map.remove and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to match if matchValue, else ignored
     * @param matchValue if true only remove if value is equal
     * @param movable if false do not move other nodes while removing
     * @return the node, or null if none
     */
    final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (p = tab[index = (n - 1) & hash]) != null) {
            Node<K,V> node = null, e; K k; V v;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                node = p;
            else if ((e = p.next) != null) {
                if (p instanceof TreeNode)
                    node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
                else {
                    do {
                        if (e.hash == hash &&
                            ((k = e.key) == key ||
                             (key != null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while ((e = e.next) != null);
                }
            }
            if (node != null && (!matchValue || (v = node.value) == value ||
                                 (value != null && value.equals(v)))) {
                if (node instanceof TreeNode)
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
                else if (node == p)
                    tab[index] = node.next;
                else
                    p.next = node.next;
                ++modCount;
                --size;
                afterNodeRemoval(node);
                return node;
            }
        }
        return null;
    }
  1. 根据key定位到分桶;
  2. 从分桶中找到要删除的节点;
  3. 删除节点,并将size-1。

ConcurrentHashMap

ConcurrentHashMap是线程安全的HashMap。在JDK8中,为进一步优化多线程下的并发性能,ConcurrentHashMap不再采用分段锁(Segment)对数据进行保护,而是采用CAS操作(Compare And Swap)配合对单个分桶头节点加synchronized锁的方式。这个改变在思想上很像是从悲观锁转向乐观锁。

在JDK8之前,ConcurrentHashMap采用分段锁的方式来保证线程安全性,相对于CAS操作,量级更重一些,因为需要做加锁、解锁操作。这很类似于悲观锁的思想,即假设多线程并发操作临界资源的几率比较大,因此采用加锁的方式来应对。

JDK8中,当分桶中没有元素时,采用轻量级的CAS操作读取及写入分桶的头节点;同时采用类似LongAdder的方式统计ConcurrentHashMap的size,使得多线程操作时以更轻量的方式应对竞争。CAS很类似于乐观锁的思想;而LongAdder则是把多线程对同一个计数器的+1操作分散到不同的Cell上,避免写入冲突,统计size时再把各Cell中的数值累加得到总值,从而很好地缓解了多线程+1写操作的冲突和加锁等待问题。
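下面用一个小例子对比AtomicLong与LongAdder在多线程计数下的用法(示意代码,线程数与循环次数仅为假设),可以体会"分散写、汇总读"的思路:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.atomic.LongAdder;

    public class CounterDemo {
        public static void main(String[] args) throws InterruptedException {
            AtomicLong atomic = new AtomicLong();   // 所有线程CAS竞争同一个变量
            LongAdder adder = new LongAdder();      // 写入分散到多个Cell,读取时累加

            ExecutorService pool = Executors.newFixedThreadPool(8);
            for (int t = 0; t < 8; t++) {
                pool.execute(() -> {
                    for (int i = 0; i < 100_000; i++) {
                        atomic.incrementAndGet();
                        adder.increment();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);

            System.out.println(atomic.get()); // 800000
            System.out.println(adder.sum());  // 800000(各Cell累加后的总和)
        }
    }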

 

构造函数

与HashMap相类似,ConcurrentHashMap的构造函数也分为以下几个:

  1. 默认构造函数:空实现;
  2. 给定initialCapacity的构造函数;
  3. 拷贝构造函数。

这里我们关注一下给定initialCapacity的构造函数。之所以关注它,是因为HashMap与ConcurrentHashMap的resize扩容过程都属于开销比较大的操作,所以期望使用者在使用时尽可能根据业务需求对元素个数做一个大致的预估,并使用该构造函数进行构造,以避免后续不断扩容给性能带来不利影响。

我们还记得上文中提到的HashMap的容量计算,其值等于大于等于initialCapacity的最小的2的幂。而ConcurrentHashMap的该构造函数源码如下:

  /**
     * Creates a new, empty map with an initial table size
     * accommodating the specified number of elements without the need
     * to dynamically resize.
     *
     * @param initialCapacity The implementation performs internal
     * sizing to accommodate this many elements.
     * @throws IllegalArgumentException if the initial capacity of
     * elements is negative
     */
    public ConcurrentHashMap(int initialCapacity) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException();
        int cap = ((initialCapacity >= (MAXIMUM_CAPACITY >>> 1)) ?
                   MAXIMUM_CAPACITY :
                   tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1));
        this.sizeCtl = cap;
    }

从中不难看出,ConcurrentHashMap的sizeCtl被设置为大于等于(1.5 * initialCapacity + 1)的最小的2的幂。

而拷贝构造函数的思路则是遍历入参中的Map,然后依次将其put到本次创建的ConcurrentHashMap中,put方法的操作过程我会在下边加以详细介绍,所以此处不再赘述。

添加KV

put方法与HashMap的put方法在基本思路上是一致的,主要分以下步骤:

 

  1. 对传入的key计算hashCode,然后计算获取到index,即指向哪个分桶;
  2. 访问定位到的分桶:如果分桶中还没有节点,则直接将KV插入;如果分桶中已有节点,则判断其是树节点还是链表节点,并按树或链表进行遍历。如果该Key已经存在于Map中,则写入新值并返回旧值;反之,则创建新节点,放到树的合适位置或链表的尾部(同时根据链表中节点数判断是否需要树化);
  3. 将size自增1,并判断是否需要进行reSize扩容操作,是则进行扩容,即生成新的分桶,将旧有分桶中的节点根据新的规则拷贝并创建到新的分桶中。

只不过ConcurrentHashMap为了做到线程安全,在每个步骤上都采用了一系列额外的手段。

    /**
     * Maps the specified key to the specified value in this table.
     * Neither the key nor the value can be null.
     *
     * <p>The value can be retrieved by calling the {@code get} method
     * with a key that is equal to the original key.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with {@code key}, or
     *         {@code null} if there was no mapping for {@code key}
     * @throws NullPointerException if the specified key or value is null
     */
    public V put(K key, V value) {
        return putVal(key, value, false);
    }

    /** Implementation for put and putIfAbsent */
    final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                           value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        addCount(1L, binCount);
        return null;
    }

关注其中获取分桶头节点的tabAt方法,和向分桶中写入节点的casTabAt方法,二者在底层都使用了sun.misc.Unsafe。关于Unsafe,可参考文章:https://blog.csdn.net/anla_/article/details/78631026。简单说,Unsafe提供了硬件级别的原子操作。借助Unsafe,tabAt方法能够以volatile语义读取分桶中的头节点;casTabAt方法能够以CAS的方式向分桶中写入新节点:如果分桶中当前的头节点与期望值(此处为null)相同,则将新节点写入;反之,说明在当前线程写入之前,其他线程已经改变了分桶中的节点,本次设置操作失败,外层循环会重试。
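casTabAt体现的就是CAS语义。下面用JDK自带的AtomicReferenceArray给出一个等价思路的示意(并非ConcurrentHashMap的真实实现,仅演示"期望值相同才写入,否则失败"的过程):

    import java.util.concurrent.atomic.AtomicReferenceArray;

    public class CasBinDemo {
        public static void main(String[] args) {
            // 用一个原子引用数组模拟"分桶数组"
            AtomicReferenceArray<String> bins = new AtomicReferenceArray<>(16);
            int index = 3;

            // 期望该分桶当前为null,是则写入新节点,CAS成功
            boolean first = bins.compareAndSet(index, null, "nodeA");
            // 再次以null为期望值写入会失败:说明其他线程已抢先写入
            boolean second = bins.compareAndSet(index, null, "nodeB");

            System.out.println(first);           // true
            System.out.println(second);          // false
            System.out.println(bins.get(index)); // nodeA
        }
    }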

如果分桶中已经有节点(或CAS写入失败后重试时发现已有节点),则进入下面的分支:首先检查该分桶是否正在迁移中(头节点的hash值为MOVED,即-1,说明正处于resize扩容过程中,此时当前线程会协助迁移);如果不在迁移过程中,则使用synchronized关键字对tabAt获取到的头节点加锁,然后像上文步骤2那样完成新KV的写入操作。

最后一步则是将ConcurrentHashMap的size加1,并根据已存储的节点个数判断是否要进行resize扩容。为了在满足线程安全的同时尽可能提升计数性能,ConcurrentHashMap采用的是类似LongAdder的方式完成size加1操作,可参考文章:https://www.cnblogs.com/ten951/p/6590596.html。LongAdder的思想本质上是把多线程对同一个变量的+1操作,转变为多线程随机向一组Cell写入,统计时将各Cell中的value累加即可得到总值(读取到的瞬时值可能不是精确值)。这样就由多线程集中写转变为多线程分散写,有效减轻了多线程操作时的竞争。

根据Key获取Value

在了解了put方法之后,再去看get方法就会明晰很多。

    /**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code key.equals(k)},
     * then this method returns {@code v}; otherwise it returns
     * {@code null}.  (There can be at most one such mapping.)
     *
     * @throws NullPointerException if the specified key is null
     */
    public V get(Object key) {
        Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
        int h = spread(key.hashCode());
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (e = tabAt(tab, (n - 1) & h)) != null) {
            if ((eh = e.hash) == h) {
                if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                    return e.val;
            }
            else if (eh < 0)
                return (p = e.find(h, key)) != null ? p.val : null;
            while ((e = e.next) != null) {
                if (e.hash == h &&
                    ((ek = e.key) == key || (ek != null && key.equals(ek))))
                    return e.val;
            }
        }
        return null;
    }

从代码中不难看出,ConcurrentHashMap的get方法与HashMap的get方法的大致思路是一致的。

按Key删除

remove方法的源码如下:

    /**
     * Removes the key (and its corresponding value) from this map.
     * This method does nothing if the key is not in the map.
     *
     * @param  key the key that needs to be removed
     * @return the previous value associated with {@code key}, or
     *         {@code null} if there was no mapping for {@code key}
     * @throws NullPointerException if the specified key is null
     */
    public V remove(Object key) {
        return replaceNode(key, null, null);
    }

    /**
     * Implementation for the four public remove/replace methods:
     * Replaces node value with v, conditional upon match of cv if
     * non-null.  If resulting value is null, delete.
     */
    final V replaceNode(Object key, V value, Object cv) {
        int hash = spread(key.hashCode());
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0 ||
                (f = tabAt(tab, i = (n - 1) & hash)) == null)
                break;
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                boolean validated = false;
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {
                            validated = true;
                            for (Node<K,V> e = f, pred = null;;) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    V ev = e.val;
                                    if (cv == null || cv == ev ||
                                        (ev != null && cv.equals(ev))) {
                                        oldVal = ev;
                                        if (value != null)
                                            e.val = value;
                                        else if (pred != null)
                                            pred.next = e.next;
                                        else
                                            setTabAt(tab, i, e.next);
                                    }
                                    break;
                                }
                                pred = e;
                                if ((e = e.next) == null)
                                    break;
                            }
                        }
                        else if (f instanceof TreeBin) {
                            validated = true;
                            TreeBin<K,V> t = (TreeBin<K,V>)f;
                            TreeNode<K,V> r, p;
                            if ((r = t.root) != null &&
                                (p = r.findTreeNode(hash, key, null)) != null) {
                                V pv = p.val;
                                if (cv == null || cv == pv ||
                                    (pv != null && cv.equals(pv))) {
                                    oldVal = pv;
                                    if (value != null)
                                        p.val = value;
                                    else if (t.removeTreeNode(p))
                                        setTabAt(tab, i, untreeify(t.first));
                                }
                            }
                        }
                    }
                }
                if (validated) {
                    if (oldVal != null) {
                        if (value == null)
                            addCount(-1L, -1);
                        return oldVal;
                    }
                    break;
                }
            }
        }
        return null;
    }

从中不难看出,其操作流程与put方法极为类似。

 

  1. 通过key计算hashCode,继而计算index;
  2. 根据index,定位到具体的分桶,如果分桶中没有数据,返回,说明要删除的key并不存在;
  3. 如果定位到的分桶头节点的hash值为MOVED(-1),说明该分桶正处于迁移过程中,则调用helpTransfer协助迁移;
  4. 2,3都不满足,则通过synchronized锁住节点,根据该分桶中存储的节点是采用红黑树进行存储还是链表进行存储,进行相应的删除处理操作。

Set

Set与List的区别在于,Set中存储的元素是经过了去重的(即如果a.equals(b),则Set中只可能存在一个)。

Set的典型实现是HashSet,其主要方法add,remove,contains,均是通过内置的HashMap来进行实现的。

比如add方法,本质上是调用了HashMap的put方法,以传入的object为Key,并以dummy value(private static final Object PRESENT = new Object();)为Value。

remove方法,也是在内部调用了HashMap的remove方法,将传入的object作为key,从而对HashSet中保存的object进行删除。
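下面给出一个简化的示意,说明"以HashMap实现Set"的思路(并非HashSet的完整源码,仅保留最核心的委托关系):

    import java.util.HashMap;

    public class SimpleHashSet<E> {
        // 与HashSet类似:用一个HashMap存储元素,value统一为同一个哑对象
        private static final Object PRESENT = new Object();
        private final HashMap<E, Object> map = new HashMap<>();

        public boolean add(E e)      { return map.put(e, PRESENT) == null; }
        public boolean remove(E e)   { return map.remove(e) == PRESENT; }
        public boolean contains(E e) { return map.containsKey(e); }
        public int size()            { return map.size(); }
    }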
