Recently I was chatting with a classmate, A, about whether ArrayMap or HashMap is better. He flatly insisted that ArrayMap is more efficient: it comes from Google, everyone says it's great, strongly recommended. That left me stumped; I had never tested it and had no good rebuttal, so let's dig in and see!
A first look
A quick look at ArrayMap's class documentation:
ArrayMap is a generic key->value mapping data structure that is designed to be more memory efficient than a traditional {@link java.util.HashMap}...This allows it to avoid having to create an extra object for every entry put in to the map, and it also tries to control the growth of the size of these arrays more aggressively (since growing them only requires copying the entries in the array, not rebuilding a hash map).
...
Note that this implementation is not intended to be appropriate for data structures that may contain large numbers of items. It is generally slower than a traditional HashMap, since lookups require a binary search and adds and removes require inserting and deleting entries in the array. For containers holding up to hundreds of items, the performance difference is not significant, less than 50%.
...
The two excerpts boil down to:
- ArrayMap is a key->value data structure designed to use memory more efficiently than HashMap: it avoids creating an extra object for every entry, and it grows its backing arrays conservatively (growth only copies array entries; it never rebuilds a hash table).
- It is not intended for large data sets. It is generally slower than HashMap, since a lookup needs a binary search, and an add or remove has to insert into or delete from the array. For containers holding up to hundreds of items, the difference is not significant: less than 50%.
So by its own documentation, ArrayMap is not faster than HashMap; it only uses memory better. That actually makes sense: in Android development, the data we keep in a map is usually small, and in that scenario the more memory-frugal ArrayMap has the edge.
Let's verify this guess with a simple benchmark:
public static void main(String[] args) {
    int max = 10;
    for (int i = 0; i < 4; i++) {
        max *= 10;
        test(max);
    }
}

private static void test(int max) {
    ArrayMap<Integer, Integer> arrayMap = new ArrayMap<>();
    HashMap<Integer, Integer> hashMap = new HashMap<>();
    Random random = new Random();
    // fill
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < max; i++) {
        arrayMap.put(random.nextInt(max), random.nextInt(max));
    }
    long endTime = System.currentTimeMillis();
    System.out.println("arrayMap init " + max + " time:" + (endTime - startTime));
    startTime = System.currentTimeMillis();
    for (int i = 0; i < max; i++) {
        hashMap.put(random.nextInt(max), random.nextInt(max));
    }
    endTime = System.currentTimeMillis();
    System.out.println("hashMap init " + max + " time:" + (endTime - startTime));
    // lookups
    startTime = System.currentTimeMillis();
    for (int i = 0; i < max; i++) {
        arrayMap.get(random.nextInt(max));
    }
    endTime = System.currentTimeMillis();
    System.out.println("arrayMap get " + max + " time:" + (endTime - startTime));
    startTime = System.currentTimeMillis();
    for (int i = 0; i < max; i++) {
        hashMap.get(random.nextInt(max));
    }
    endTime = System.currentTimeMillis();
    System.out.println("hashMap get " + max + " time:" + (endTime - startTime));
    System.out.println("------------");
}
The results largely confirm the guess: at 1,000 entries the two are basically on par, and the larger the data set, the worse ArrayMap fares:
arrayMap init 100 time:1
hashMap init 100 time:1
arrayMap get 100 time:1
hashMap get 100 time:0
------------
arrayMap init 1000 time:2
hashMap init 1000 time:1
arrayMap get 1000 time:3
hashMap get 1000 time:2
------------
arrayMap init 10000 time:12
hashMap init 10000 time:9
arrayMap get 10000 time:11
hashMap get 10000 time:3
------------
arrayMap init 100000 time:642
hashMap init 100000 time:36
arrayMap get 100000 time:28
hashMap get 100000 time:30
------------
Analysis
- Backing data structure
- The put/get path
- Resizing
Backing data structure
ArrayMap
//Sorted array of the keys' hash codes
int[] mHashes = new int[size];
//Twice the logical size, because keys and values are interleaved: key=index*2, value=index*2+1
Object[] mArray = new Object[size<<1];
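To make the interleaving concrete, here is a minimal sketch in plain Java (illustrative local arrays and values, not the Android source): slot i of the hash array corresponds to mArray[2*i] for the key and mArray[2*i+1] for the value.

```java
// Sketch of ArrayMap's interleaved storage (illustrative, not the real class).
int[] hashes = new int[4];            // sorted hashes, one slot per entry
Object[] array = new Object[4 << 1];  // twice the size: key/value interleaved

int index = 0;                        // slot picked by binary search on hashes
String key = "name";
hashes[index] = key.hashCode();
array[index << 1] = key;              // key lives at index*2
array[(index << 1) + 1] = "foo";      // value lives at index*2+1

Object value = array[(index << 1) + 1];
```

One flat Object[] for all keys and values is exactly what saves the per-entry wrapper object that HashMap pays for.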
HashMap
//hash(key) locates the bucket index, which leads to the value (on a collision you walk a linked list or a red-black tree)
transient Node<K,V>[] table;
//A stored node; it may head a linked list, which is converted to a red-black tree once its length reaches TREEIFY_THRESHOLD(8)-1
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}
The put path
public V put(K key, V value) {}
ArrayMap
-
Use the key's hash to look for an existing index ==> binary search in a sorted array, O(log N)
//A hit returns a non-negative index; a miss returns a negative value
int indexOf(Object key, int hash) {
    final int N = mSize;
    //Empty array: return straight away
    if (N == 0) {
        //~0 = -1
        return ~0;
    }
    //Binary search, O(log N)
    int index = binarySearchHashes(mHashes, N, hash);
    //Hash not found: propagate the (negative) insertion point
    if (index < 0) {
        return index;
    }
    //If the slot holds exactly this key, return its index directly
    if (key.equals(mArray[index<<1])) {
        return index;
    }
    //Different keys can share a hash, so the slot found by hash may not hold our key
    int end;
    //Scan forward from index
    for (end = index + 1; end < N && mHashes[end] == hash; end++) {
        if (key.equals(mArray[end << 1])) return end;
    }
    //Not found going forward; scan backward from index
    for (int i = index - 1; i >= 0 && mHashes[i] == hash; i--) {
        if (key.equals(mArray[i << 1])) return i;
    }
    //Still nothing: the key is absent, but the hash exists, so return the slot just past its run
    return ~end;
}
-
If the key exists, update the value and return the old one
//A hit is non-negative; a miss is negative
if (index >= 0) {
    //exists
    index = (index<<1) + 1;
    final V old = (V)mArray[index];
    mArray[index] = value;
    return old;
}
-
For a key that wasn't found, flip the bits of the returned value to recover the real insertion index
index = ~index;
-
If the hash array is full, grow it first
if (osize >= mHashes.length) {
    //BASE_SIZE = 4
    //1. osize >= 8 ? grow to osize + osize/2
    //2. osize >= 4 ? grow to 8
    //3. osize < 4 ? grow to 4
    final int n = osize >= (BASE_SIZE*2) ? (osize+(osize>>1))
            : (osize >= BASE_SIZE ? (BASE_SIZE*2) : BASE_SIZE);
    if (DEBUG) Log.d(TAG, "put: grow from " + mHashes.length + " to " + n);
    //Keep the old hash and entry arrays
    final int[] ohashes = mHashes;
    final Object[] oarray = mArray;
    //Allocate n slots; the implementation is analyzed below
    allocArrays(n);
    if (CONCURRENT_MODIFICATION_EXCEPTIONS && osize != mSize) {
        throw new ConcurrentModificationException();
    }
    //Copy the old arrays into the grown ones
    if (mHashes.length > 0) {
        if (DEBUG) Log.d(TAG, "put: copy 0-" + osize + " to 0");
        System.arraycopy(ohashes, 0, mHashes, 0, ohashes.length);
        System.arraycopy(oarray, 0, mArray, 0, oarray.length);
    }
    //Release? Cache? Analyzed below
    freeArrays(ohashes, oarray, osize);
}
-
If index falls in the middle of the array, first shift everything from index onward one slot to the right with an array copy
if (index < osize) {
    if (DEBUG) Log.d(TAG, "put: move " + index + "-" + (osize-index) + " to " + (index+1));
    System.arraycopy(mHashes, index, mHashes, index + 1, osize - index);
    System.arraycopy(mArray, index << 1, mArray, (index + 1) << 1, (mSize - index) << 1);
}
-
Then write into the slot at index
//Store the key's hash
mHashes[index] = hash;
//Store the key and value
mArray[index<<1] = key;
mArray[(index<<1)+1] = value;
//Bump the size
mSize++;
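The whole search-then-shift dance can be sketched on a plain sorted int[] with java.util.Arrays.binarySearch, which uses the same ~insertionPoint convention for misses (the values here are made up for illustration; this is not the Android code):

```java
import java.util.Arrays;

int[] hashes = {10, 20, 40};                  // sorted, like mHashes
int target = 30;

int index = Arrays.binarySearch(hashes, 0, 3, target);
if (index < 0) {                              // miss: negative ~insertionPoint
    index = ~index;                           // flip back to the real slot (2)
    int[] grown = new int[4];                 // pretend we had to grow
    System.arraycopy(hashes, 0, grown, 0, index);                 // keep prefix
    System.arraycopy(hashes, index, grown, index + 1, 3 - index); // shift tail
    grown[index] = target;                    // write the new entry
    hashes = grown;
}
```

Every insert in the middle pays for an arraycopy of the tail, which is exactly why ArrayMap's put degrades as the map grows.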
HashMap
putVal(hash(key), key, value, false, true);
-
Compute the key's hash()
static final int hash(Object key) {
    int h;
    //XOR the high 16 bits into the low 16 bits so both halves take part in the computation, further reducing collisions
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
//AND the hash with the table size n to get the bucket index
int index = (n - 1) & hash;
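A tiny worked example of why the spread matters (the hash value is made up for illustration): a hashCode whose low 16 bits are all zero would otherwise always land in bucket 0, because (n - 1) only looks at the low bits.

```java
int n = 16;                                    // table size, a power of two
int h = 0x7A350000;                            // hashCode with empty low 16 bits
int plainIndex  = (n - 1) & h;                 // 0: the low bits carry nothing
int spreadIndex = (n - 1) & (h ^ (h >>> 16));  // high bits now take part
```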
-
If the table doesn't exist yet, resize
if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;
-
If the bucket is empty, create a node and drop it in
//(n - 1) & hash locates the bucket in tab
if ((p = tab[i = (n - 1) & hash]) == null){
    tab[i] = newNode(hash, key, value, null);
}
-
If the bucket is occupied, there are several cases:
If the head node's key == key, return that node directly
If it is a red-black tree, walk it to find the node (O(log N))
If it is a linked list, walk it to find the node (O(N))
If it is a linked list and the node isn't there, append a new one; once size >= TREEIFY_THRESHOLD-1, the list is converted to a red-black tree
Node<K,V> e; K k;
//The head of the bucket is already the target
if (p.hash == hash &&
        ((k = p.key) == key || (key != null && key.equals(k))))
    e = p;
else if (p instanceof TreeNode)
    //Red-black tree: search it
    e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
    //Linked list
    for (int binCount = 0; ; ++binCount) {
        if ((e = p.next) == null) {
            p.next = newNode(hash, key, value, null);
            //Past 7 nodes, convert the list to a red-black tree
            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                treeifyBin(tab, hash);
            break;
        }
        //Found it while walking the list
        if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
            break;
        p = e;
    }
}
-
Update the node's value
if (e != null) { // existing mapping for key
    V oldValue = e.value;
    if (!onlyIfAbsent || oldValue == null)
        e.value = value;
    afterNodeAccess(e);
    return oldValue;
}
-
If the capacity is exceeded, resize
if (++size > threshold)
    resize(); //grow
The get path
V get(Object key);
ArrayMap
public V get(Object key) {
//Find the hash's index ==> O(log N)
final int index = indexOfKey(key);
//index*2+1 yields the value
return index >= 0 ? (V)mArray[(index<<1)+1] : null;
}
HashMap
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    //Locate the bucket index from the hash
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // head node matches: return it directly
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                //red-black tree search
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            //linked list search
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
Resizing
ArrayMap
-
When it grows:
osize >= mHashes.length
That is, it only grows once the stored data would exceed the current capacity. -
Growth policy:
- BASE_SIZE defaults to 4
- osize >= 8 ==> grow to 1.5x osize
- osize >= 4 ==> grow to 8
- osize < 4 ==> grow to 4
final int n = osize >= (BASE_SIZE*2) ? (osize+(osize>>1)) : (osize >= BASE_SIZE ? (BASE_SIZE*2) : BASE_SIZE);
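That ternary is easier to check by hand once it is pulled out into a standalone helper (a sketch; the class and method names are mine, not the framework's):

```java
// ArrayMap's growth sizing (BASE_SIZE = 4), extracted so the three branches are obvious.
class ArrayMapGrowth {
    static final int BASE_SIZE = 4;

    static int grownSize(int osize) {
        return osize >= (BASE_SIZE * 2) ? (osize + (osize >> 1))   // >= 8: 1.5x
                : (osize >= BASE_SIZE ? (BASE_SIZE * 2)            // >= 4: 8
                                      : BASE_SIZE);                // < 4: 4
    }
}
```

So a full 100-entry map grows to 150 slots, versus HashMap, which would jump from 128 to 256 buckets.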
Growth steps:
-
Allocate the new arrays ==> allocArrays
private void allocArrays(final int size) {
    //Reuse cached arrays to cut down on allocations
    if (size == (BASE_SIZE*2)) {
        synchronized (ArrayMap.class) {
            if (mTwiceBaseCache != null) {
                final Object[] array = mTwiceBaseCache;
                mArray = array;
                mTwiceBaseCache = (Object[])array[0];
                mHashes = (int[])array[1];
                array[0] = array[1] = null;
                mTwiceBaseCacheSize--;
                if (DEBUG) Log.d(TAG, "Retrieving 2x cache " + mHashes
                        + " now have " + mTwiceBaseCacheSize + " entries");
                return;
            }
        }
    } else if (size == BASE_SIZE) {
        synchronized (ArrayMap.class) {
            if (mBaseCache != null) {
                final Object[] array = mBaseCache;
                mArray = array;
                mBaseCache = (Object[])array[0];
                mHashes = (int[])array[1];
                array[0] = array[1] = null;
                mBaseCacheSize--;
                if (DEBUG) Log.d(TAG, "Retrieving 1x cache " + mHashes
                        + " now have " + mBaseCacheSize + " entries");
                return;
            }
        }
    }
    //No cache hit (or a size other than 4/8): allocate fresh arrays
    mHashes = new int[size];
    mArray = new Object[size<<1];
}
-
Copy the data over
//Migrate via array copies
System.arraycopy(ohashes, 0, mHashes, 0, ohashes.length);
System.arraycopy(oarray, 0, mArray, 0, oarray.length);
-
Recycle the old arrays ==> freeArrays
static Object[] mBaseCache;       //cache of size=4 arrays
static int mBaseCacheSize;
static Object[] mTwiceBaseCache;  //cache of size=8 arrays
static int mTwiceBaseCacheSize;
//mBaseCache/mTwiceBaseCache cache the key&value array; array[1] holds the cached hash array
//Maximum number of cached arrays
private static final int CACHE_SIZE = 10;

//Hand the old arrays over for reuse
freeArrays(ohashes, oarray, osize);

private static void freeArrays(final int[] hashes, final Object[] array, final int size) {
    if (hashes.length == (BASE_SIZE*2)) { //cache size=8 arrays to avoid churning small objects
        synchronized (ArrayMap.class) {
            if (mTwiceBaseCacheSize < CACHE_SIZE) {
                array[0] = mTwiceBaseCache;
                array[1] = hashes;
                for (int i=(size<<1)-1; i>=2; i--) {
                    //clear everything from slot 2 on
                    array[i] = null;
                }
                mTwiceBaseCache = array;
                mTwiceBaseCacheSize++;
                if (DEBUG) Log.d(TAG, "Storing 2x cache " + array
                        + " now have " + mTwiceBaseCacheSize + " entries");
            }
        }
    } else if (hashes.length == BASE_SIZE) { //cache size=4 arrays to avoid churning small objects
        synchronized (ArrayMap.class) {
            if (mBaseCacheSize < CACHE_SIZE) {
                array[0] = mBaseCache;
                array[1] = hashes;
                for (int i=(size<<1)-1; i>=2; i--) {
                    array[i] = null;
                }
                mBaseCache = array;
                mBaseCacheSize++;
                if (DEBUG) Log.d(TAG, "Storing 1x cache " + array
                        + " now have " + mBaseCacheSize + " entries");
            }
        }
    }
}
HashMap
-
When it grows:
// static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; //default initial capacity: 16
// static final float DEFAULT_LOAD_FACTOR = 0.75f;     //default load factor
// int threshold;                  //resize threshold = capacity * loadFactor
// final float loadFactor = DEFAULT_LOAD_FACTOR;       //current load factor
if (++size > threshold)
    resize();
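With the defaults plugged in, the first resize fires on the 13th insertion:

```java
// HashMap defaults: capacity 16, load factor 0.75 => threshold 12,
// so the table doubles as soon as the 13th entry goes in.
int capacity = 1 << 4;                          // DEFAULT_INITIAL_CAPACITY
float loadFactor = 0.75f;                       // DEFAULT_LOAD_FACTOR
int threshold = (int) (capacity * loadFactor);  // 12
```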
-
Growth policy:
Until the maximum is hit, the table doubles
if (oldCap > 0) {
    // Past the maximum (MAXIMUM_CAPACITY = 1 << 30) we stop growing and just live with collisions
    if (oldCap >= MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return oldTab;
    }
    // Not at the maximum yet: double the capacity
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
             oldCap >= DEFAULT_INITIAL_CAPACITY)
        newThr = oldThr << 1; // double threshold
}
-
Growth steps:
When the threshold is exceeded the table resizes, and because capacities are powers of two (each resize doubles the length), every element either stays at its original index or moves to the original index plus the old capacity.
How so? Take growing from 16 to 32: once n doubles, the (n - 1) mask gains exactly one extra high bit, so when an element's hash is re-masked, only that one newly exposed bit can change the index.
So when growing a HashMap there is no need to recompute any hashes: just check whether that newly exposed bit of the hash is 0 or 1. If it's 0 the index is unchanged; if it's 1 the new index is "old index + oldCap".
This design is genuinely clever: it skips recomputing hashes, and since the newly exposed bit is effectively random, a resize also spreads previously colliding nodes evenly across the new buckets.
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // Past the maximum (MAXIMUM_CAPACITY = 1 << 30) we stop growing and just live with collisions
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Not at the maximum yet: double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new resize threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Move every bucket over to the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // chain keeping its index goes back into bucket j
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // chain moving to index + oldCap goes into its bucket
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
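The bit trick in isolation (the hash value is made up for illustration): with a power-of-two table, (hash & oldCap) reads exactly the one extra bit that the new mask exposes.

```java
int oldCap = 16, newCap = 32;
int hash = 0b10101;                    // 21, an illustrative hash

int oldIndex = hash & (oldCap - 1);    // 21 & 15 = 5
int newIndex = hash & (newCap - 1);    // 21 & 31 = 21
boolean moved = (hash & oldCap) != 0;  // the newly exposed bit is 1
// moved is true, and indeed newIndex == oldIndex + oldCap
```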
Summary
- Data structure
- ArrayMap uses two parallel arrays (with the hashes kept sorted)
- HashMap uses an array plus linked lists plus red-black trees
- Memory
- ArrayMap is more frugal: compact storage, it resizes only when it literally cannot fit another entry, it grows by only 1.5x, and arrays of size 4 or 8 are recycled through a cache
- HashMap pays for an extra entry object per mapping, resizes as early as 0.75 occupancy, doubles on every resize, and has no recycling cache
- Performance
- ArrayMap's put and get average O(log N); the cost is dominated by the (binary) search
- HashMap's lookups and updates are O(1), degrading toward O(log N) only under severe hash collisions
Usage advice:
- When storing fewer than about 1,000 entries, ArrayMap is strongly recommended: reads and lookups are practically as fast, and it saves memory
- The source-level analysis matches the benchmark results