1. Both hashCode() and equals() are defined in Object:
public native int hashCode();
public boolean equals(Object obj) { return (this == obj); }
If our customized object doesn't override hashCode(), then hashCode will be generated according to the object's address.
If our customized object doesn't override equals(), then equals() will compare the addresses of the two objects.
Thus we can conclude that if two objects are equals, then their hashCode must be the same, because their address are the same.
If we use the default hashCode() method in Object, then we can also guarantee that different object will produce different hashcode.
"This function(hashCode) produces hashcode by typically converting the internal address of the object into and integer, thus produce different hashcodes for all different objects."
Remember that hashCode is useful only when the Object is intented to be put in a hash based container, such as HashMap or HashSet.
2. String.hashCode() & String.equals()
String has override the default hashCode() and equals() method.
public boolean equals(Object anObject) { if (this == anObject) { return true; } if (anObject instanceof String) { String anotherString = (String) anObject; int n = value.length; if (n == anotherString.value.length) { char v1[] = value; char v2[] = anotherString.value; int i = 0; while (n-- != 0) { if (v1[i] != v2[i]) return false; i++; } return true; } } return false; }
public int hashCode() { int h = hash; if (h == 0 && value.length > 0) { char val[] = value; for (int i = 0; i < value.length; i++) { h = 31 * h + val[i]; } hash = h; } return h; }
We can find in method above that the equals() will compare the literal value of the String,
and the hashCode() will calculate hashCode based on the literal value of the String.
Thus we can conclude that if two String are equals, then their hashCode must be the same
as they have the same literal value and hash code is calculated based on literal value,
3. HashMap & hashCode() & equals()
The structure of HashMap is:
public class HashMap<K,V> { /** * The default initial capacity - MUST be a power of two. */ static final int DEFAULT_INITIAL_CAPACITY = 16; static final int MAXIMUM_CAPACITY = 1 << 30; /** * The load factor used when none specified in constructor. */ static final float DEFAULT_LOAD_FACTOR = 0.75f; /** * The table, resized as necessary. Length MUST Always be a power of two. */ transient Entry<K,V>[] table; /** * The number of key-value mappings contained in this map. */ transient int size; /** * The next size value at which to resize (capacity * load factor). * @serial */ int threshold; /** * The load factor for the hash table. * * @serial */ final float loadFactor; public V put(K key, V value) { if (key == null) return putForNullKey(value); int hash = hash(key); int i = indexFor(hash, table.length); for (Entry<K,V> e = table[i]; e != null; e = e.next) { Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) { V oldValue = e.value; e.value = value; e.recordAccess(this); return oldValue; } } modCount++; addEntry(hash, key, value, i); return null; } static int indexFor(int h, int length) { return h & (length-1); } void addEntry(int hash, K key, V value, int bucketIndex) { if ((size >= threshold) && (null != table[bucketIndex])) { resize(2 * table.length); hash = (null != key) ? hash(key) : 0; bucketIndex = indexFor(hash, table.length); } createEntry(hash, key, value, bucketIndex); } void createEntry(int hash, K key, V value, int bucketIndex) { Entry<K,V> e = table[bucketIndex]; table[bucketIndex] = new Entry<>(hash, key, value, e); size++; } static class Entry<K,V> implements Map.Entry<K,V> { final K key; V value; Entry<K,V> next; int hash; Entry(int h, K k, V v, Entry<K,V> n) { value = v; next = n; key = k; hash = h; } //.... } }
HashMap is backed by a bucket: Entry<K, V>[] table.
When we put key&value into the table.
1> key is checked for null. If key is null, the value is stored in table[0] position, because hashcode for null is always 0.
2> Calculate the hashcode of key using hash(),
the reason JDK introduced hash() instead of simply using key.hashCode() is that
there might be some poorly written hashCode() function that can return very high or low hash code value
and thus make the hashCode out of table size range.
3> Find the index in table according to hashCodeOfKey.
4> Find if we have the key in the table/bucket already, if true, then we will put the new key/value in the position then return.
5> Add the new entry into the head of LinkedList in table[index] position.
Eg1.
package edu.xmu.java.core; import static org.junit.Assert.assertEquals; import java.util.HashMap; import java.util.Map; import org.junit.Test; public class HashMapTest { @Test public void putTest() { Map<Student, String> map = new HashMap<Student, String>(); Student s = new Student("AAA", 12); Student s2 = new Student("BBB", 14); Student s3 = new Student("AAA", 12); map.put(s, "DUMMY_VAL"); map.put(s2, "DUMMY_VAL"); map.put(s3, "DUMMY_VAL"); assertEquals(3, map.size()); } public static class Student { private String name; private int age; public Student(String name, int age) { super(); this.name = name; this.age = age; } @Override public int hashCode() { return 1; } } }
The three objects: s, s2, s3 sits in the same position of the bucket because they have the same hashCode, and s3.next=s2; s2.next=s1.
If we assume that s, s3 is the same object, then we should override equals() for Student, because it is what we use to judge whether they are the same objects.
The hashCode() method is only for index of the bucket, it is just for the purpose of quick search.
Eg2:
package edu.xmu.java.core; import static org.junit.Assert.assertEquals; import java.util.HashMap; import java.util.Map; import org.junit.Test; public class HashMapTest { @Test public void putTest() { Map<Student, String> map = new HashMap<Student, String>(); Student s = new Student("AAA", 12); Student s2 = new Student("BBB", 14); Student s3 = new Student("AAA", 12); map.put(s, "DUMMY_VAL"); map.put(s2, "DUMMY_VAL"); map.put(s3, "DUMMY_VAL"); assertEquals(3, map.size()); } public static class Student { private String name; private int age; public Student(String name, int age) { super(); this.name = name; this.age = age; } @Override public boolean equals(Object obj) { if (this == obj) return true; if (obj == null) return false; if (getClass() != obj.getClass()) return false; Student other = (Student) obj; if (age != other.age) return false; if (name == null) { if (other.name != null) return false; } else if (!name.equals(other.name)) return false; return true; } } }
Assumption1: If we override the equals but didn't override hashcode, thus hashcode is calculated based on the object's address.
And thus in the HashMap, s and s3 have different index in the bucket, they never have the chance to compare with each other.
// s.hash != s3.hash
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) { V oldValue = e.value; e.value = value; e.recordAccess(this); return oldValue; }
It would make no sence, thus the best practice is that if we override equals() method, that means our customized object will not compare based on address and might compare based on some of its properties, then we should override hashCode accordingly to avoid that we have duplilcate objects in hash based container, like HashMap or HashSet.
Assumption2: If we override hashCode, say return the same hashCode for every instances, and didn't override equals, all instances would be stored in the hash based container, and they will be stored in the same index of the bucket-table, it would be degenerated into an LinkedList. And thus the performance of hash container would be largely affected.
The best practice is that if we want to override any of hashCode() and equals(), then we should override the other one at the same time.
Reference Links:
1) http://stackoverflow.com/questions/17803775/why-can-index-of-backing-table-in-java-util-hashmap-be-the-same-for-two-differen
2) http://howtodoinjava.com/2012/10/09/how-hashmap-works-in-java/