This post takes a look at java.util.IdentityHashMap. As the class name suggests, it is still essentially a hash table; the Identity prefix marks it as a special HashMap-like map.
Simply put, the difference between IdentityHashMap and HashMap lies in how keys are compared.
HashMap calls the key's hashCode method, which a class may have overridden, and compares keys with the equals method, which may also have been overridden.
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
// call the key's hashCode method
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
// compare: k1 == k2, or k1.equals(k2)
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
IdentityHashMap instead calls System.identityHashCode(x) to obtain an object's hash code (the value the object's hashCode method would return had it not been overridden), and compares keys only with ==.
public V put(K key, V value) {
Object k = maskNull(key);
Object[] tab = table;
int len = tab.length;
// the hash method calls System.identityHashCode(x)
int i = hash(k, len);
Object item;
while ( (item = tab[i]) != null) {
// keys are compared with == only
if (item == k) {
V oldValue = (V) tab[i + 1];
tab[i + 1] = value;
return oldValue;
}
i = nextKeyIndex(i, len);
}
modCount++;
tab[i] = k;
tab[i + 1] = value;
if (++size >= threshold)
resize(len); // len == 2 * current capacity.
return null;
}
private static int hash(Object x, int length) {
int h = System.identityHashCode(x);
// Multiply by -127, and left-shift to use least bit as part of hash
return ((h << 1) - (h << 8)) & (length - 1);
}
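Before the example, here is a quick illustration of the difference between the two hash values for equal but distinct String objects (a small demo of my own; the printed identity hash codes vary from run to run):
public class IdentityHashCodeDemo {
    public static void main(String[] args) {
        String k1 = new String("a");
        String k2 = new String("a");
        // String overrides hashCode/equals, so the two objects look identical to HashMap
        System.out.println(k1.hashCode() == k2.hashCode());   // true
        System.out.println(k1.equals(k2));                     // true
        // identityHashCode ignores the override; two distinct objects get
        // (almost certainly) different values, and == is false
        System.out.println(System.identityHashCode(k1) == System.identityHashCode(k2));
        System.out.println(k1 == k2);                          // false
    }
}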
Here is a concrete example:
public static void main(String[] args) {
String k1 = new String("a");
String v1 = new String("A");
String k2 = new String("a");
String v2 = new String("A");
HashMap<String, String> hashMap = new HashMap<String, String>();
hashMap.put(k1, v1);
hashMap.put(k2, v2);
System.out.println("hashMap:"+hashMap);
IdentityHashMap<String, String> identityHashMap = new IdentityHashMap<String, String>();
identityHashMap.put(k1, v1);
identityHashMap.put(k2, v2);
System.out.println("identityHashMap:"+identityHashMap);
}
Output:
hashMap:{a=A}
identityHashMap:{a=A, a=A}
Both IdentityHashMap and HashMap implement a hash table internally, but they handle hash collisions differently. HashMap resolves collisions with separate chaining: entries that hash to the same bucket are linked into a list. IdentityHashMap resolves collisions with open addressing: when a collision occurs, the entry is placed into the next free slot (linear probing).
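To make the contrast concrete, here is a deliberately tiny sketch of the two strategies (my own illustration, not JDK code; names are made up, and there is no resizing or null handling):
final class CollisionStrategiesSketch {
    // Separate chaining (HashMap's approach): colliding entries share a bucket as a linked list.
    static final class Node {
        final Object key, value;
        final Node next;
        Node(Object key, Object value, Node next) { this.key = key; this.value = value; this.next = next; }
    }

    static void chainPut(Node[] buckets, Object key, Object value) {
        int i = key.hashCode() & (buckets.length - 1);
        buckets[i] = new Node(key, value, buckets[i]);   // prepend to this bucket's list
    }

    // Open addressing with linear probing (IdentityHashMap's approach):
    // on a collision, step forward until a free slot is found.
    static void probePut(Object[] keys, Object[] values, Object key, Object value) {
        int i = System.identityHashCode(key) & (keys.length - 1);
        while (keys[i] != null && keys[i] != key) {
            i = (i + 1) % keys.length;                   // collision: probe the next slot
        }
        keys[i] = key;
        values[i] = value;
    }

    public static void main(String[] args) {
        Node[] buckets = new Node[8];
        chainPut(buckets, new Object(), "v1");
        chainPut(buckets, new Object(), "v2");

        Object[] keys = new Object[8], values = new Object[8];
        probePut(keys, values, new Object(), "v1");
        probePut(keys, values, new Object(), "v2");
    }
}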
So the implementation details of IdentityHashMap differ considerably from HashMap's; let's take a look.
public class IdentityHashMap<K,V>
extends AbstractMap<K,V>
implements Map<K,V>, java.io.Serializable, Cloneable
{
/**
* The initial capacity used by the no-args constructor.
* MUST be a power of two. The value 32 corresponds to the
* (specified) expected maximum size of 21, given a load factor
* of 2/3.
*/
private static final int DEFAULT_CAPACITY = 32;
/**
* The minimum capacity, used if a lower value is implicitly specified
* by either of the constructors with arguments. The value 4 corresponds
* to an expected maximum size of 2, given a load factor of 2/3.
* MUST be a power of two.
*/
private static final int MINIMUM_CAPACITY = 4;
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<29.
*/
private static final int MAXIMUM_CAPACITY = 1 << 29;
/**
* The table, resized as necessary. Length MUST always be a power of two.
*/
private transient Object[] table;
/**
* The number of key-value mappings contained in this identity hash map.
*
* @serial
*/
private int size;
/**
* The number of modifications, to support fast-fail iterators
*/
private transient volatile int modCount;
/**
* The next size value at which to resize (capacity * load factor).
*/
private transient int threshold;
/**
* Value representing null keys inside tables.
*/
private static final Object NULL_KEY = new Object();
A few things are worth noting. MAXIMUM_CAPACITY is 2^29, whereas HashMap's is 2^30; why the factor-of-two difference? Keep that question in mind. Also, the table is a plain Object[] in which each slot holds a single object (so it appears at this point), and NULL_KEY is introduced as a sentinel representing a null key, so that a null key can be distinguished from an empty (null) slot. So where do the values go? Read on.
/**
* Constructs a new, empty identity hash map with a default expected
* maximum size (21).
*/
public IdentityHashMap() {
init(DEFAULT_CAPACITY);
}
/**
* Constructs a new, empty map with the specified expected maximum size.
* Putting more than the expected number of key-value mappings into
* the map may cause the internal data structure to grow, which may be
* somewhat time-consuming.
*
* @param expectedMaxSize the expected maximum size of the map
* @throws IllegalArgumentException if <tt>expectedMaxSize</tt> is negative
*/
public IdentityHashMap(int expectedMaxSize) {
if (expectedMaxSize < 0)
throw new IllegalArgumentException("expectedMaxSize is negative: "
+ expectedMaxSize);
init(capacity(expectedMaxSize));
}
/**
* Returns the appropriate capacity for the specified expected maximum
* size. Returns the smallest power of two between MINIMUM_CAPACITY
* and MAXIMUM_CAPACITY, inclusive, that is greater than
* (3 * expectedMaxSize)/2, if such a number exists. Otherwise
* returns MAXIMUM_CAPACITY. If (3 * expectedMaxSize)/2 is negative, it
* is assumed that overflow has occurred, and MAXIMUM_CAPACITY is returned.
*/
private int capacity(int expectedMaxSize) {
// Compute min capacity for expectedMaxSize given a load factor of 2/3
int minCapacity = (3 * expectedMaxSize)/2;
// Compute the appropriate capacity
int result;
if (minCapacity > MAXIMUM_CAPACITY || minCapacity < 0) {
result = MAXIMUM_CAPACITY;
} else {
result = MINIMUM_CAPACITY;
while (result < minCapacity)
result <<= 1;
}
return result;
}
/**
* Initializes object to be an empty map with the specified initial
* capacity, which is assumed to be a power of two between
* MINIMUM_CAPACITY and MAXIMUM_CAPACITY inclusive.
*/
private void init(int initCapacity) {
// assert (initCapacity & -initCapacity) == initCapacity; // power of 2
// assert initCapacity >= MINIMUM_CAPACITY;
// assert initCapacity <= MAXIMUM_CAPACITY;
threshold = (initCapacity * 2)/3;
table = new Object[2 * initCapacity];
}
/**
* Constructs a new identity hash map containing the keys-value mappings
* in the specified map.
*
* @param m the map whose mappings are to be placed into this map
* @throws NullPointerException if the specified map is null
*/
public IdentityHashMap(Map<? extends K, ? extends V> m) {
// Allow for a bit of growth
this((int) ((1 + m.size()) * 1.1));
putAll(m);
}
Recall that HashMap has a load-factor field; there is no such field here, the load factor is hard-coded to 2/3. Also note table = new Object[2 * initCapacity] in init: why is the array twice the capacity? A quick worked example of the sizing arithmetic is sketched below; after that, let's look at the put method.
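Here is that worked example (a standalone copy of the same arithmetic using the constants quoted above, since the real fields are private):
public class SizingDemo {
    public static void main(String[] args) {
        int expectedMaxSize = 21;                        // the documented default expected maximum size
        int minCapacity = (3 * expectedMaxSize) / 2;     // 31: expected size divided by the 2/3 load factor
        int capacity = 4;                                // MINIMUM_CAPACITY
        while (capacity < minCapacity)
            capacity <<= 1;                              // smallest power of two >= minCapacity -> 32
        int threshold = (capacity * 2) / 3;              // 21: resize once size reaches this
        int tableLength = 2 * capacity;                  // 64: one slot for each key, one for each value
        System.out.println(capacity + " " + threshold + " " + tableLength);   // 32 21 64
    }
}
This matches the no-args constructor: DEFAULT_CAPACITY of 32 corresponds to the expected maximum size of 21 mentioned in the Javadoc.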
/**
* Associates the specified value with the specified key in this identity
* hash map. If the map previously contained a mapping for the key, the
* old value is replaced.
*
* @param key the key with which the specified value is to be associated
* @param value the value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
* @see Object#equals(Object)
* @see #get(Object)
* @see #containsKey(Object)
*/
public V put(K key, V value) {
Object k = maskNull(key);
Object[] tab = table;
int len = tab.length;
int i = hash(k, len);
Object item;
while ( (item = tab[i]) != null) {
if (item == k) {
V oldValue = (V) tab[i + 1];
tab[i + 1] = value;
return oldValue;
}
i = nextKeyIndex(i, len);
}
modCount++;
tab[i] = k;
tab[i + 1] = value;
if (++size >= threshold)
resize(len); // len == 2 * current capacity.
return null;
}
First comes the handling of a null key: it is replaced with the NULL_KEY object introduced earlier, so that a hash code can be computed and a null key can be distinguished from a genuinely empty (null) slot.
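maskNull simply swaps a null key for the NULL_KEY sentinel (and unmaskNull reverses it); in the JDK source the pair looks essentially like this:
private static Object maskNull(Object key) {
    return (key == null ? NULL_KEY : key);
}
static final Object unmaskNull(Object key) {
    return (key == NULL_KEY ? null : key);
}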
Next, the index is computed by a hash function:
/**
* Returns index for Object x.
*/
private static int hash(Object x, int length) {
int h = System.identityHashCode(x);
// Multiply by -127, and left-shift to use least bit as part of hash
return ((h << 1) - (h << 8)) & (length - 1);
}
In this hash function, the second half of ((h << 1) - (h << 8)) & (length - 1) is the familiar trick we have seen before: taking the remainder modulo a power-of-two length. The first half, (h << 1) - (h << 8), equals -254 * h, i.e. the identity hash code multiplied by -127 and then shifted left by one bit; this scatters the bits, and the left shift forces the lowest bit to 0. Since & (length - 1) only masks off high bits, the whole hash function therefore always produces an even number. Why even? Read on.
The computed hash is then used as an index into the internal array. If that slot is empty (no collision), the key is stored there and the value is stored in the next slot (index + 1). In other words, all keys live at the even indices of the internal array and all values at the odd indices. Now it is clear why the array was allocated at twice the capacity and why the hash function always yields an even index; it also answers the earlier question about MAXIMUM_CAPACITY being 2^29 rather than HashMap's 2^30: every mapping occupies two array slots, so the table length is twice the capacity. When a collision does occur, the search moves on to the next available slot:
/**
* Circularly traverses table of size len.
*/
private static int nextKeyIndex(int i, int len) {
return (i + 2 < len ? i + 2 : 0);
}
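A quick standalone check of the even-index property (this just copies the hash expression into a demo of my own; it does not call the JDK's private method):
public class EvenIndexDemo {
    public static void main(String[] args) {
        int length = 64;                                   // table length is always a power of two
        for (int trial = 0; trial < 5; trial++) {
            int h = System.identityHashCode(new Object());
            int index = ((h << 1) - (h << 8)) & (length - 1);
            // (h << 1) - (h << 8) == -254 * h, whose lowest bit is always 0,
            // so the masked index is always even: a key slot, never a value slot.
            System.out.println(index + " even? " + (index % 2 == 0));
        }
    }
}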
With put analyzed, the get method is straightforward.
/**
* Returns the value to which the specified key is mapped,
* or {@code null} if this map contains no mapping for the key.
*
* <p>More formally, if this map contains a mapping from a key
* {@code k} to a value {@code v} such that {@code (key == k)},
* then this method returns {@code v}; otherwise it returns
* {@code null}. (There can be at most one such mapping.)
*
* <p>A return value of {@code null} does not <i>necessarily</i>
* indicate that the map contains no mapping for the key; it's also
* possible that the map explicitly maps the key to {@code null}.
* The {@link #containsKey containsKey} operation may be used to
* distinguish these two cases.
*
* @see #put(Object, Object)
*/
public V get(Object key) {
Object k = maskNull(key);
Object[] tab = table;
int len = tab.length;
int i = hash(k, len);
while (true) {
Object item = tab[i];
if (item == k)
return (V) tab[i + 1];
if (item == null)
return null;
i = nextKeyIndex(i, len);
}
}
Insertion and lookup are fairly easy to implement, but removal is the painful part. Collisions have to be taken into account: if the slot being removed sits inside a run of entries that collided earlier, it is not enough to clear that slot; the entries displaced past it must (loosely speaking) be shifted back, otherwise the linear-probe chain is broken and other operations such as get would no longer work correctly. Here is the implementation:
/**
* Removes the mapping for this key from this map if present.
*
* @param key key whose mapping is to be removed from the map
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*/
public V remove(Object key) {
Object k = maskNull(key);
Object[] tab = table;
int len = tab.length;
int i = hash(k, len);
while (true) {
Object item = tab[i];
if (item == k) {
modCount++;
size--;
V oldValue = (V) tab[i + 1];
tab[i + 1] = null;
tab[i] = null;
closeDeletion(i);
return oldValue;
}
if (item == null)
return null;
i = nextKeyIndex(i, len);
}
}
As you can see, the first part of the code is similar to put; the difference is that after the slot has been cleared, a closeDeletion method is called.
/**
* Rehash all possibly-colliding entries following a
* deletion. This preserves the linear-probe
* collision properties required by get, put, etc.
*
* @param d the index of a newly empty deleted slot
*/
private void closeDeletion(int d) {
// Adapted from Knuth Section 6.4 Algorithm R
Object[] tab = table;
int len = tab.length;
// Look for items to swap into newly vacated slot
// starting at index immediately following deletion,
// and continuing until a null slot is seen, indicating
// the end of a run of possibly-colliding keys.
Object item;
for (int i = nextKeyIndex(d, len); (item = tab[i]) != null;
i = nextKeyIndex(i, len) ) {
// The following test triggers if the item at slot i (which
// hashes to be at slot r) should take the spot vacated by d.
// If so, we swap it in, and then continue with d now at the
// newly vacated i. This process will terminate when we hit
// the null slot at the end of this run.
// The test is messy because we are using a circular table.
int r = hash(item, len);
if ((i < r && (r <= d || d <= i)) || (r <= d && d <= i)) {
tab[d] = item;
tab[d + 1] = tab[i + 1];
tab[i] = null;
tab[i + 1] = null;
d = i;
}
}
}
Roughly, the process scans forward from the vacated slot via nextKeyIndex. For each non-null key it encounters, it recomputes that key's home slot r and checks whether the vacated slot d lies (cyclically) between r and the key's current slot i; if so, the key can only be reached by probing through d, so the key and its value are moved into (d, d + 1), the old slots (i, i + 1) are cleared, and d becomes i. The scan continues until a null slot is reached. A toy illustration of why this back-shift matters is sketched below.
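The sketch is a toy probing table of my own (not JDK code) whose remove simply clears the slot; every key is forced into the same home slot to provoke collisions:
public class NaiveDeleteDemo {
    static final Object[] tab = new Object[8];              // at most 4 key/value pairs, no resizing

    static int home(Object key) { return 0; }               // hypothetical worst-case hash: everything collides

    static void put(Object key, Object value) {
        int i = home(key);
        while (tab[i] != null && tab[i] != key)
            i = (i + 2) % tab.length;                        // linear probing, two slots per entry
        tab[i] = key;
        tab[i + 1] = value;
    }

    static Object get(Object key) {
        for (int i = home(key); tab[i] != null; i = (i + 2) % tab.length)
            if (tab[i] == key)
                return tab[i + 1];
        return null;                                         // probe hit an empty slot: "not present"
    }

    static void naiveRemove(Object key) {                    // clears the slot, no closeDeletion-style fix-up
        for (int i = home(key); tab[i] != null; i = (i + 2) % tab.length)
            if (tab[i] == key) { tab[i] = null; tab[i + 1] = null; return; }
    }

    public static void main(String[] args) {
        Object a = new Object(), b = new Object();
        put(a, "A");                                         // a lands in slot 0
        put(b, "B");                                         // b collides with a and probes to slot 2
        naiveRemove(a);                                      // clears slot 0 and breaks the probe chain
        System.out.println(get(b));                          // prints null: b is "lost" although still stored
    }
}
A closeDeletion-style pass would move b and "B" back into slots 0 and 1 right after the removal, so the later get(b) would still succeed.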
With the analysis above, the rest of the code is easy to follow. That concludes this summary of java.util.IdentityHashMap.