Advanced Java: How HashMap Works

Original article: How hashmap works in java

Most of you will agree that HashMap is one of the most popular interview topics these days. I have discussed it with colleagues from time to time, and those discussions really helped. Now I am continuing the discussion with all of you.

I am assuming that if you are interested in the internal workings of HashMap, you already know the basics of using it, so I am skipping that part. If you are new to the concept, start with the official Java docs.

Before moving forward, I highly recommend reading my previous post: Working with hashCode and equals methods in java

Sections in this post

  1. Single statement answer
  2. What is Hashing
  3. A little about Entry class
  4. What put() method actually does
  5. How get() method works internally
  6. Key Notes

Single statement answer

If anybody asks me to describe “How does HashMap work?”, I simply answer: “On the principle of hashing.” It is as simple as that. Before giving that answer, though, one must be sure to know at least the basics of hashing. Right?

What is Hashing

Hashing, in its simplest form, is a way of assigning a code to any variable or object by applying a formula or algorithm to its properties. The code is used to tell objects apart quickly, but it is not guaranteed to be unique: two different objects may end up with the same code. A true hashing function must follow this rule:

A hash function should return the same hash code each and every time it is applied to the same or equal objects. In other words, two equal objects must consistently produce the same hash code.

Note: All objects in Java inherit a default implementation of the hashCode() function defined in the Object class. This function typically produces the hash code by converting the internal address of the object into an integer, so distinct objects usually, though not necessarily always, get different hash codes.
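The rule above can be illustrated with a small sketch. The Point class and the HashContractDemo class are hypothetical names made up for this example; the only point being shown is that hashCode() is derived from the same fields that equals() compares, so equal objects report equal hash codes.

```java
import java.util.Objects;

class HashContractDemo {
    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}

// Hypothetical value class, used only for illustration.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    // Built from the same fields as equals(), so two equal points
    // always return the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

If hashCode() were left at the default inherited from Object, a and b would still be equal according to equals() but would most likely report different hash codes, which breaks the rule.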

A little about Entry class

A map by definition is: “An object that maps keys to values”. Very easy, right?

So, there must be some mechanism in HashMap to store these key-value pairs. The answer is yes. In the JDK source this post walks through (JDK 8 and later restructure these internals), HashMap has an inner class Entry, which looks like this:

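static class Entry<K,V> implements Map.Entry<K,V>
{
    final K key;        // never changes once the entry is created
    V value;            // replaced in place when the same key is put again
    Entry<K,V> next;    // next entry in the same bucket, or null
    final int hash;     // hash value computed for the key
    ...//More code goes here
}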

As expected, the Entry class stores the key and value mapping as attributes. The key is marked final, and there are two more fields: next and hash. We will see why these fields are needed as we go forward.

What put() method actually does

Before going into the implementation of the put() method, it is important to know that instances of the Entry class are stored in an array. The HashMap class defines this variable as:

transient Entry[] table;

Now look at the implementation of the put() method:

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    // Walk the chain in bucket i, looking for an entry with an equal key.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            // Same key found: replace the value and return the old one.
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // No equal key in the bucket: add a new entry.
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

Let's note down the steps one by one:

- First of all, the key object is checked for null. If the key is null, the value is stored at position table[0], because the hash code for null is always 0.

- Then, in the next step, a hash value is calculated from the key’s hash code, obtained by calling its hashCode() method. This hash value is used to calculate the index in the array at which the Entry object is stored. The JDK designers assumed that there might be some poorly written hashCode() functions whose values differ only in bits that the index calculation would ignore, which would cause many collisions. To reduce this problem, they introduced another hash() function and pass the object’s hash code through it to spread those bits before the index is computed.

- Now the indexFor(hash, table.length) function is called to calculate the exact index position for storing the Entry object. This is the step that brings the hash value into the range of the array.

- Here comes the main part. As we know, two unequal objects can have the same hash code value, so how are two different objects stored in the same array location (called a bucket)?
The answer is a linked list. If you remember, the Entry class has an attribute “next”. This attribute always points to the next object in the chain, which is exactly the behavior of a linked list.
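The hash() and indexFor() steps described in the list above can be sketched as follows. The two helper bodies are modelled on the older JDK source that the put() listing comes from; treat them as an approximation for illustration, since other JDK versions use different bit mixing. The class name HashSketch and the sample values are assumptions made for this example.

```java
class HashSketch {
    // Supplemental hash: XORs higher bits into the lower ones so that
    // they influence the bucket index.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Masks the hash down to the range [0, length).
    // Assumes length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        // Two raw hash codes that differ only in their upper bits.
        int h1 = 0x10000;
        int h2 = 0x20000;
        System.out.println(indexFor(h1, length));       // 0
        System.out.println(indexFor(h2, length));       // 0
        System.out.println(indexFor(hash(h1), length)); // 1
        System.out.println(indexFor(hash(h2), length)); // 2
    }
}
```

Without the supplemental hash, both sample values land in bucket 0 because the mask only looks at the lowest four bits. After mixing, they land in different buckets.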

So, in case of a collision, Entry objects are stored in linked-list form. When an Entry object needs to be stored at a particular index, HashMap checks whether there is already an entry there. If there is none, the Entry object is stored at that location.

If there is already an entry at the calculated index, the existing chain is walked through the next attribute until next is null, comparing keys along the way. If no entry with an equal key is found, the new Entry object is linked into the chain. In the JDK version whose put() is shown above, addEntry() links it in at the head of the bucket, with its next pointing at the previous first entry; later JDK versions append at the tail instead.

What if we add another value object with the same key as one entered before? Logically, it should replace the old value. How is that done? After determining the index position, HashMap iterates over the linked list at that index and, for each Entry, compares the stored hash and then calls equals() on the key. All the Entry objects in that list map to the same bucket, and may even share the same hash code, but equals() tests for true equality. If key.equals(k) is true, both keys are treated as the same key, and only the value object inside the existing Entry is replaced.

In this way, HashMap ensures the uniqueness of keys.
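To make the chaining and replacement behavior concrete, here is a deliberately simplified sketch. TinyChainedMap and its Node class are hypothetical names; the sketch assumes a fixed table of 16 buckets, no resizing, no null keys, and no supplemental hash, so it is not how java.util.HashMap is actually written.

```java
// Hypothetical, simplified map used only for illustration.
class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private int indexFor(int hash) {
        return hash & (table.length - 1);
    }

    V put(K key, V value) {
        int hash = key.hashCode();
        int i = indexFor(hash);
        // Walk the chain: an equal key means replace, not add.
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // No equal key: link the new node in at the head of the bucket.
        table[i] = new Node<>(hash, key, value, table[i]);
        return null;
    }

    V get(K key) {
        int hash = key.hashCode();
        for (Node<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> map = new TinyChainedMap<>();
        // "Aa" and "BB" have the same String hash code, so they collide.
        System.out.println(map.put("Aa", 1)); // null
        System.out.println(map.put("BB", 2)); // null
        System.out.println(map.put("Aa", 3)); // 1 (old value replaced)
        System.out.println(map.get("Aa"));    // 3
        System.out.println(map.get("BB"));    // 2
    }
}
```

The two colliding keys share one bucket, yet each keeps its own value, and putting "Aa" a second time replaces the value without adding a second node.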

How get() method works internally

Now we have an idea of how key-value pairs are stored in HashMap. The next big question is: what happens when an object is passed to the get() method of HashMap? How is the value object determined?

As we should already know, the same logic that determines key uniqueness in put() is applied in get() as well. The moment HashMap identifies an exact match for the key object passed as the argument, it simply returns the value object stored in that Entry.

If no match is found, get() returns null.

Let's have a look at the code:

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    // Walk the chain in the key's bucket until an equal key is found.
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

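The code above follows the same path as put() up to the check if (e.hash == hash && ((k = e.key) == key || key.equals(k))); once that check matches, get() simply returns the stored value.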
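The lookup behavior can be observed from the outside with the real java.util.HashMap. In this small sketch the class name GetDemo, the sample() helper, and the sample keys are assumptions chosen for illustration; "Aa" and "BB" are used because their String hash codes are equal.

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {
    // Builds a small map containing two keys with the same hash code.
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
        System.out.println(map.get("CC")); // null (no matching key)
    }
}
```

Even though the two keys collide, equals() tells them apart, so each lookup returns its own value, and a key that was never stored yields null.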

Key Notes

1) The data structure that stores Entry objects is an array named table, of type Entry.

2) A particular index location in the array is referred to as a bucket, because it can hold the first element of a linked list of Entry objects.

3) The key object’s hashCode() is required to calculate the index location of the Entry object.

4) The key object’s equals() method is used to maintain the uniqueness of keys in the map.

5) The value object’s hashCode() and equals() methods are not used in HashMap’s get() and put() methods.

6) The hash code for a null key is always zero, and such an Entry object is always stored at index zero of Entry[].
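A couple of these notes can be checked against java.util.HashMap directly. This is a minimal sketch; the class name KeyNotesDemo, the build() helper, and the sample values are assumptions made for the example. It only shows behavior visible through the public API (a single null key is allowed, and a repeated key replaces the value), not where the entry sits internally.

```java
import java.util.HashMap;
import java.util.Map;

class KeyNotesDemo {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // same (null) key: the value is replaced
        map.put("k", "v");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get(null)); // second
        System.out.println(map.get("k"));  // v
    }
}
```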

 

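Finally, to sum up in one sentence:

Equal hash codes do not guarantee that equals() returns true, but objects for which equals() returns true must have equal hash codes.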
