Consistent hashing has become hot since Amazon's Dynamo became popular, along with Cassandra, the open-source system that borrows Dynamo's partitioning design.
For a quick introduction that doesn't involve too much theory, I suggest these two posts:
- http://www.java.net/blog/2007/11/27/consistent-hashing
This post, written by Tom White, has a simple implementation of consistent
hashing in Java. I have added the hash function that the post leaves out.
Here is the source code.
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

// Maps an object to a position on a circle of 2^32 points using MD5.
class HashFunction {
    private static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(32); // 2^32

    public int hash(Object o) {
        try {
            MessageDigest m = MessageDigest.getInstance("MD5");
            byte[] digest = m.digest(o.toString().getBytes(StandardCharsets.UTF_8));
            BigInteger bigInt = new BigInteger(1, digest); // non-negative 128-bit value
            // Keep the low 32 bits; intValue() may be negative, which is fine as a TreeMap key.
            return bigInt.mod(RING_SIZE).intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }
}

public class ConsistentHash<T> {
    private final HashFunction hashFunction;
    private final int numberOfReplicas; // virtual points per node on the circle
    private final SortedMap<Integer, T> circle = new TreeMap<Integer, T>();

    public ConsistentHash(HashFunction hashFunction, int numberOfReplicas, Collection<T> nodes) {
        this.hashFunction = hashFunction;
        this.numberOfReplicas = numberOfReplicas;
        for (T node : nodes) {
            add(node);
        }
    }

    // Places numberOfReplicas points for the node on the circle.
    public void add(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.put(hashFunction.hash(node.toString() + i), node);
        }
    }

    // Removes all of the node's points from the circle.
    public void remove(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove(hashFunction.hash(node.toString() + i));
        }
    }

    /*
     * Returns the node for the given key: the first point at or after the
     * key's hash, wrapping around to the start of the circle if needed.
     */
    public T get(Object key) {
        if (circle.isEmpty()) {
            return null;
        }
        int hash = hashFunction.hash(key);
        if (!circle.containsKey(hash)) {
            SortedMap<Integer, T> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        return circle.get(hash);
    }

    @Override
    public String toString() {
        return circle.toString();
    }

    public static void main(String[] args) {
        HashFunction hashFunc = new HashFunction();
        List<String> ls = new ArrayList<String>();
        ls.add("A");
        ls.add("B");
        ls.add("C");
        // Replicas per node from the command line; defaults to 3 (an arbitrary choice) if omitted.
        int num = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        ConsistentHash<String> ch = new ConsistentHash<String>(hashFunc, num, ls);
        System.out.println(ch);
        ch.remove("A");
        System.out.println(ch);
    }
}
- http://michaelnielsen.org/blog/consistent-hashing
This post explains well why, with ordinary modulo hashing, most key-value
pairs have to move to other nodes when a node is added. Before reading it, I
spent a long time trying to work out the explanation myself. Here is a
concrete example that helped me understand it.
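The property both posts rely on is that adding a node to the ring only moves the keys the new node takes over; no key moves between the old nodes. A minimal sketch to check this, assuming a hypothetical class RingSketch (not from Tom White's post) that keeps his TreeMap-based circle but hashes with the first 4 bytes of MD5 and uses TreeMap.ceilingEntry for the clockwise lookup. The key names "key0", "key1", ... and the choice of 100 replicas are also arbitrary assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal consistent-hash ring used only to illustrate remapping.
public class RingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int replicas;

    public RingSketch(int replicas) {
        this.replicas = replicas;
    }

    // Hashes a string to a 32-bit int built from the first 4 bytes of its MD5 digest.
    static int hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16) | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public void add(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(node + i), node);
        }
    }

    // First point at or after the key's hash, wrapping to the lowest point.
    public String get(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Adds newNode and returns {keys whose owner changed, changed keys not owned by newNode}.
    public static int[] movedAfterAdd(RingSketch ring, String newNode, int n) {
        String[] before = new String[n];
        for (int k = 0; k < n; k++) {
            before[k] = ring.get("key" + k);
        }
        ring.add(newNode);
        int moved = 0;
        int movedElsewhere = 0;
        for (int k = 0; k < n; k++) {
            String after = ring.get("key" + k);
            if (!after.equals(before[k])) {
                moved++;
                if (!after.equals(newNode)) {
                    movedElsewhere++;
                }
            }
        }
        return new int[] {moved, movedElsewhere};
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch(100);
        for (String node : new String[] {"A", "B", "C"}) {
            ring.add(node);
        }
        int[] r = movedAfterAdd(ring, "D", 10000);
        System.out.println("moved: " + r[0] + " of 10000, moved to a node other than D: " + r[1]);
    }
}
```

The second number is always 0: a key's owner can only change if one of D's new points now comes first clockwise from it, so every key that moves lands on D. Roughly the share of the circle that D's points claim moves, rather than most of the keys.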
Imagine 3 machines used as a web cache for key-value pairs (k, v). A pair
(k, v) is stored on machine hash(k) mod 3, which gives this allocation of hash
values:
+-----------+---+---+---+----+----+----+----+
| machine-0 | 0 | 3 | 6 |  9 | 12 | 15 | 18 |
+-----------+---+---+---+----+----+----+----+
| machine-1 | 1 | 4 | 7 | 10 | 13 | 16 | 19 |
+-----------+---+---+---+----+----+----+----+
| machine-2 | 2 | 5 | 8 | 11 | 14 | 17 | 20 |
+-----------+---+---+---+----+----+----+----+
Now a fourth machine is added, and the function becomes hash(k) mod 4. The new
function gives the following allocation of key-value pairs among the machines.
+-----------+---+---+----+----+----+
| machine-0 | 0 | 4 |  8 | 12 | 16 |
+-----------+---+---+----+----+----+
| machine-1 | 1 | 5 |  9 | 13 | 17 |
+-----------+---+---+----+----+----+
| machine-2 | 2 | 6 | 10 | 14 | 18 |
+-----------+---+---+----+----+----+
| machine-3 | 3 | 7 | 11 | 15 | 19 |
+-----------+---+---+----+----+----+
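Comparing the two tables, only hash values 0, 1, 2, 12, 13 and 14 stay on the same machine. A small sketch that counts this, assuming hash values 0 to 19 as in the mod 4 table (the class name ModuloRemap is hypothetical):

```java
// Counts how many hash values in 0..n-1 change machine when the
// allocation function changes from h mod oldCount to h mod newCount.
public class ModuloRemap {
    public static int countMoved(int n, int oldCount, int newCount) {
        int moved = 0;
        for (int h = 0; h < n; h++) {
            if (h % oldCount != h % newCount) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Hash values 0..19, as in the mod 4 table.
        System.out.println(countMoved(20, 3, 4) + " of 20 moved"); // prints "14 of 20 moved"
    }
}
```

Since h mod 3 equals h mod 4 only when h mod 12 is 0, 1 or 2, about a quarter of the keys stay put and about three quarters have to move to another machine.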