SparseArray 是 android.util 包下一的一个类
SparseArray中文直译过来就是:稀疏数组。顾名思义,这个数组和我们平时用到的Array不太一样。平时我们用到的Array都是按顺序排列的。而SparseArray可以实现间隔排列,也就是说一个大小为10的SparseArray,有可能是这样的[null, null, 1, null ,2, null, null, null, 3, null]。是不是有点神奇?
问题 1: 为什么要实现这样一个数组呢?它的优势是什么?
源码里面是这么说的:
* <code>SparseArray</code> maps integers to Objects and, unlike a normal array of Objects, * its indices can contain gaps. <code>SparseArray</code> is intended to be more memory-efficient * than a * <a href="/reference/java/util/HashMap"><code>HashMap</code></a>, because it avoids * auto-boxing keys and its data structure doesn't rely on an extra entry object * for each mapping. * * <p>Note that this container keeps its mappings in an array data structure, * using a binary search to find keys. The implementation is not intended to be appropriate for * data structures * that may contain large numbers of items. It is generally slower than a * <code>HashMap</code> because lookups require a binary search, * and adds and removes require inserting * and deleting entries in the array. For containers holding up to hundreds of items, * the performance difference is less than 50%. * * <p>To help with performance, the container includes an optimization when removing * keys: instead of compacting its array immediately, it leaves the removed entry marked * as deleted. The entry can then be re-used for the same key or compacted later in * a single garbage collection of all removed entries. This garbage collection * must be performed whenever the array needs to be grown, or when the map size or * entry values are retrieved. * * <p>It is possible to iterate over the items in this container using * {@link #keyAt(int)} and {@link #valueAt(int)}. Iterating over the keys using * <code>keyAt(int)</code> with ascending values of the index returns the * keys in ascending order. In the case of <code>valueAt(int)</code>, the * values corresponding to the keys are returned in ascending order.
翻译过来就是这么几点:
源码里面的注释已经写得比较清楚了,原来SparseArray虽然说是数组,但是他的对手是HashMap啊。关于性能方面的对比我们最后会通过例子来比较,现在让我们继续看看SparseArray的其他有意思的源码。
SparseArray提供了两个构造方法
public SparseArray() {
this(10);
}
/**
* Creates a new SparseArray containing no mappings that will not
* require any additional memory allocation to store the specified
* number of mappings. If you supply an initial capacity of 0, the
* sparse array will be initialized with a light-weight representation
* not requiring any additional array allocations.
*/
public SparseArray(int initialCapacity) {
if (initialCapacity == 0) {
mKeys = EmptyArray.INT;
mValues = EmptyArray.OBJECT;
} else {
mValues = ArrayUtils.newUnpaddedObjectArray(initialCapacity);
mKeys = new int[mValues.length];
}
mSize = 0;
}
由于SparseArray的源码都挺简单的,所以我们就粗略的说一下常用方法以及对应的源码。
public void put(int key, E value) {
// 1
int i = ContainerHelpers.binarySearch(mKeys, mSize, key);
if (i >= 0) {
mValues[i] = value;
} else {
i = ~i; // 2
if (i < mSize && mValues[i] == DELETED) {
mKeys[i] = key;
mValues[i] = value;
return;
}
// 3
if (mGarbage && mSize >= mKeys.length) {
gc();
// Search again because indices may have changed.
i = ~ContainerHelpers.binarySearch(mKeys, mSize, key);
}
mKeys = GrowingArrayUtils.insert(mKeys, mSize, i, key);
mValues = GrowingArrayUtils.insert(mValues, mSize, i, value);
mSize++;
}
}
static int binarySearch(int[] array, int size, int value) {
int lo = 0;
int hi = size - 1;
while (lo <= hi) {
final int mid = (lo + hi) >>> 1;
final int midVal = array[mid];
if (midVal < value) {
lo = mid + 1;
} else if (midVal > value) {
hi = mid - 1;
} else {
return mid; // value found
}
}
return ~lo; // value not present
}
~
操作。其实按位补这个操作在binarySearch()
里面也用到了,如果找不到对应数据,那么在最后也会对i进行按位补的操作,所以返回的是一个负数。 public void append(int key, E value) {
if (mSize != 0 && key <= mKeys[mSize - 1]) {
put(key, value);
return;
}
if (mGarbage && mSize >= mKeys.length) {
gc();
}
mKeys = GrowingArrayUtils.append(mKeys, mSize, key);
mValues = GrowingArrayUtils.append(mValues, mSize, value);
mSize++;
}
append()和put()最大的区别就是这两行,append是在末尾插入元素,put是在计算出来的index处插入元素。
mKeys = GrowingArrayUtils.append(mKeys, mSize, key);
mValues = GrowingArrayUtils.append(mValues, mSize, value);
问题2: SparseArray是如何实现key升序排列的呢?
private void gc() {
// Log.e("SparseArray", "gc start with " + mSize);
int n = mSize;
int o = 0;
int[] keys = mKeys;
Object[] values = mValues;
for (int i = 0; i < n; i++) {
Object val = values[i];
if (val != DELETED) {
if (i != o) {
keys[o] = keys[i];
values[o] = val;
values[i] = null;
}
o++;
}
}
mGarbage = false;
mSize = o;
// Log.e("SparseArray", "gc end with " + mSize);
}
在SparseArray中,gc()
方法随处可见。只要涉及到元素修改,无论是添加还是删除,都会调用这个方法。
其实在SparseArray中,我们通过一个标志位和一个DELETED
对象来进行gc()操作。
private static final Object DELETED = new Object();
private boolean mGarbage = false;
在我们调用remove()或者delete()方法的时候,都会修改mGarbage
为true
,并且把对应的value设置为DELETED
,在真正调用gc()
的时候,再把对应的value设置为null
这样做的好处就是不用每次立马执行gc操作,而是在有需要的时候才执行,节省了性能开销。
我们可以通过:
因为key和value都是存放在数组中,而且key是按照升序排序的。因此所有关于key的查找方法,都是调用了二分查找。而value相关的查找方法则是通过for循环遍历values数组。
我们进行三组测试:
第一组:顺序插入
首先是SparseArray:
@Test
fun test_sparse_array() {
val array = SparseArrayCompat<String>(1000000)
val current = System.currentTimeMillis()
println("BlueLzy - test_sparse_array: ---- start: 0")
for (i in 0..1000000) {
array.put(i, i.toString())
}
println("BlueLzy - test_sparse_array: ---- end:" + (System.currentTimeMillis() - current))
}
100万条数据的耗时:
BlueLzy - test_sparse_array: ---- start: 0
BlueLzy - test_sparse_array: ---- end:69
然后是HashMap:
@Test
fun test_hash_map() {
val hashMap = HashMap<Int, String>()
val current = System.currentTimeMillis()
println("BlueLzy - test_hash_map: ---- start: 0")
for (i in 0..1000000) {
hashMap[i] = i.toString()
}
println("BlueLzy - test_hash_map: ---- end: " + (System.currentTimeMillis() - current))
}
100万条数据的耗时:
BlueLzy - test_hash_map: ---- start: 0
BlueLzy - test_hash_map: ---- end: 170
第二组:倒序插入
SparseArray:
@Test
fun test_sparse_array2() {
val array = SparseArrayCompat<String>(1000000)
val current = System.currentTimeMillis()
println("BlueLzy - test_sparse_array: ---- start: 0")
for (i in 1000000 downTo 0) {
array.put(i, i.toString())
}
println("BlueLzy - test_sparse_array: ---- end:" + (System.currentTimeMillis() - current))
}
耗时:
BlueLzy - test_sparse_array: ---- start: 0
BlueLzy - test_sparse_array: ---- end:200705
HashMap
@Test
fun test_hash_map2() {
val hashMap = HashMap<Int, String>()
val current = System.currentTimeMillis()
println("BlueLzy - test_hash_map: ---- start: 0")
for (i in 1000000 downTo 0) {
hashMap[i] = i.toString()
}
println("BlueLzy - test_hash_map: ---- end: " + (System.currentTimeMillis() - current))
}
耗时:
BlueLzy - test_hash_map: ---- start: 0
BlueLzy - test_hash_map: ---- end: 200
这一组的对比就有点夸张了,SparseArray花了200秒,而HashMap只需要200毫秒
第三组:随机插入
SparseArray:
@Test
fun test_sparse_array3() {
val array = SparseArrayCompat<String>(1000000)
val current = System.currentTimeMillis()
println("BlueLzy - test_sparse_array: ---- start: 0")
val random = Random()
for (i in 0..1000000) {
array.put(random.nextInt() *1000000, i.toString())
}
println("BlueLzy - test_sparse_array: ---- end:" + (System.currentTimeMillis() - current))
}
耗时:
BlueLzy - test_sparse_array: ---- start: 0
BlueLzy - test_sparse_array: ---- end:85158
HashMap:
@Test
fun test_hash_map3() {
val hashMap = HashMap<Int, String>()
val current = System.currentTimeMillis()
println("BlueLzy - test_hash_map: ---- start: 0")
val random = Random()
for (i in 0..1000000) {
hashMap[random.nextInt() * 1000000] = i.toString()
}
println("BlueLzy - test_hash_map: ---- end: " + (System.currentTimeMillis() - current))
}
耗时
BlueLzy - test_hash_map: ---- start: 0
BlueLzy - test_hash_map: ---- end: 650
类似的结果,随机插入100W条数据,HashMap时间也比SparseArray少很多。
我们可以得出这样的结论:
首先我们回答一下问题 2
关于SparseArray是如何实现key升序排列这个问题,如果有认真看他的put方法,会发现他传入的最后一个参数是key
,但是在二分查找里面这个参数被当成value来比较了,也就是说二分查找排序使用的元素是key。
举个例子:
当前数组为空数组,那么我们传入key = 3,二分查找返回负数,就会在index = 3处插入元素。
传入key = 2,这个时候二分查找返回2,所以会在第2个位置插入元素,
现在keys = [Object, Object, 2,3 …]
我们二分查找的时候其实就是比较传入的key值,并在对应的位置插入元素,从而实现了数组升序排列这一特性。
其实我们看到二分查找就应该知道,因为只有有序排列的数组才能通过使用二分查找快速定位到元素的位置。也正因为把对应的key存放到对应的数组下标,所以才会有稀疏数组
这个名称。
在Android开发的时候,我们经常都会用到HashMap,但是好像很少会在项目里用到SparseArray。而Google官方给我们提供的这个工具类,在某些场景下会是更优解。特别是移动端很少需要一次性插入或者移除上千条数据,这个时候可以使用SparseArray来替代HashMap,节约一点内存空间。