import java.io.Serializable;
import java.util.AbstractMap;
import java.util.Map;
import java.util.HashMap.Entry;

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {

    /**
     * The default initial capacity - MUST be a power of two.
     * Default size of the backing array.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 16;

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     * Maximum size of the backing array.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     * Default load factor: once the number of entries reaches
     * capacity * load factor, the HashMap grows its capacity.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     * The backing array of buckets.
     */
    transient Entry[] table;

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

    /**
     * The next size value at which to resize (capacity * load factor).
     * Once the number of entries exceeds threshold, the HashMap grows
     * its capacity; threshold = capacity * load factor.
     * @serial
     */
    int threshold;

    /**
     * The load factor for the hash table.
     * @serial
     */
    final float loadFactor;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash). This field is used to make iterators on Collection-views of
     * the HashMap fail-fast. (See ConcurrentModificationException).
     * Modification count, kept so that concurrent modification can be detected.
     */
    transient volatile int modCount;
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)(capacity * loadFactor);
        table = new Entry[capacity];
        init();
    }

    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }
    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }
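From the code we can see that HashMap grows its array dynamically, so the situation we worried about above cannot happen. But if you think that settles everything, think again. Even when the array is large enough, different keys can produce the same hashCode. After hash() they are still identical, so the array index computed from them is identical too, and we have a collision again. For example, 枣庄 (Zaozhuang) and 无锡 (Wuxi) have the same hash, and so do Aa and BB. Suppose I am an attacker: I combine these equal-hash strings into AaAa, BBBB, AaBB, BBAa and so on, then submit them as form fields to your system. Every key lands in the same bucket, each insert has to walk an ever longer list, and within minutes your CPU is pinned at 100%. This is the well-known hash DoS attack.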
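To make the collision trick concrete, here is a small sketch. The class name HashCollisionDemo and the helper collidingKeys are made up for illustration and are not part of the JDK. It relies only on how String.hashCode is defined: if two blocks have the same length and the same hash, every concatenation of those blocks has the same hash as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the JDK.
class HashCollisionDemo {

    /** Returns every string made of `repeat` blocks, each taken from `blocks`. */
    static List<String> collidingKeys(String[] blocks, int repeat) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int round = 0; round < repeat; round++) {
            List<String> next = new ArrayList<>();
            for (String prefix : keys) {
                for (String block : blocks) {
                    next.add(prefix + block);
                }
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" have equal length and equal hashCode, so all
        // 2^repeat combinations share a single hashCode.
        for (String key : collidingKeys(new String[] {"Aa", "BB"}, 2)) {
            System.out.println(key + " -> " + key.hashCode());
        }

        // The two city names from the text, written as Unicode escapes
        // (Zaozhuang and Wuxi) so the file compiles under any source encoding.
        String zaozhuang = "\u67a3\u5e84";
        String wuxi = "\u65e0\u9521";
        System.out.println(zaozhuang.hashCode() == wuxi.hashCode());
    }
}
```

With repeat set to 16 the same helper produces 65,536 keys with one shared hashCode, which is the kind of input a hash DoS request would carry.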
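h & (length-1)
In this expression, whatever h is, a bit ANDed with 0 is still 0. So if length is not a power of two, for instance a length of 15, where length-1 is 14 (binary 1110), the last bit of the index is always 0, and slots such as 00001, 00011, 00101, 00111, 01001, ... can never hold data. That wastes space, and since fewer slots are actually usable, the probability of a collision goes up. When the array length is a power of two such as 16, 2^n - 1 is all 1s in binary (16 - 1 = 1111), so the AND simply keeps the low bits of the hash. Combined with hash(int h), which further mixes the high bits of the key's hashCode into the low ones, this means two entries end up chained in the same slot only when their mixed hash values agree in those low bits.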
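The effect of the mask is easy to check by brute force. The sketch below copies indexFor from the source above; the class name IndexForDemo and the helper reachableSlots are hypothetical names for this illustration. It collects every index that can come out for a given table length.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the JDK.
class IndexForDemo {

    /** Same computation as HashMap.indexFor in the source above. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** Collects every slot index that hashes 0..1023 can map to. */
    static Set<Integer> reachableSlots(int length) {
        Set<Integer> slots = new TreeSet<>();
        for (int h = 0; h < 1024; h++) {
            slots.add(indexFor(h, length));
        }
        return slots;
    }

    public static void main(String[] args) {
        // length 16: mask is 1111, so all 16 slots are reachable.
        System.out.println("length 16: " + reachableSlots(16));
        // length 15: mask is 1110, so only the 8 even slots are reachable.
        System.out.println("length 15: " + reachableSlots(15));
    }
}
```

With a length of 15, half of the usable index range is dead, which is why the constructor rounds the requested capacity up to a power of two.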
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }
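The source makes the lookup clear: indexFor(hash, table.length) finds the array slot, and then the linked list in that slot is walked, comparing keys for equality, until the matching value is found.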
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable);
        table = newTable;
        threshold = (int)(newCapacity * loadFactor);
    }

    /**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }
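Advice: if the amount of data is known in advance, pass a size when constructing the HashMap so it does not have to resize repeatedly. And if the map is accessed concurrently, do not use HashMap; use a ConcurrentMap implementation such as ConcurrentHashMap. According to Sun, concurrent use of a HashMap can create a cycle in a bucket's linked list, which ends in an infinite loop and 100% CPU. For the details of how concurrency produces that cycle, this article explains it very well: <疫苗:Java HashMap的死循环>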
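Both pieces of advice can be sketched in a few lines. The class name SizingDemo and its helpers are hypothetical. The sizing rule (expected entries divided by the load factor, plus one) is one common convention, assumed here rather than taken from the JDK, and tableSizeFor just repeats the power-of-two loop from the constructor shown earlier.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo class, not part of the JDK.
class SizingDemo {

    /** Initial capacity to request so `expectedEntries` fit without a resize (assumed rule). */
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    /** Same rounding loop as the HashMap constructor above: next power of two. */
    static int tableSizeFor(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    /** Fills one shared ConcurrentHashMap from several threads and returns its final size. */
    static int fillConcurrently(int threads, int perThread) {
        ConcurrentMap<String, Integer> map =
                new ConcurrentHashMap<>(capacityFor(threads * perThread, 0.75f));
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(id + ":" + i, i); // safe without external locking
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        int requested = capacityFor(1000, 0.75f);
        int table = tableSizeFor(requested);
        System.out.println("requested=" + requested + " table=" + table
                + " threshold=" + (int) (table * 0.75f));
        System.out.println("entries after concurrent fill: " + fillConcurrently(4, 1000));
    }
}
```

For 1000 expected entries this requests a capacity of 1334, which the constructor rounds up to a table of 2048 with a threshold of 1536, so no resize happens while the data is loaded.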