Consistent Hashing

Consistent hashing has been a hot topic since Dynamo became popular, along with
Cassandra, an open source system that borrows Dynamo's design.

For a quick introduction that does not involve too much theory, I suggest:

- http://www.java.net/blog/2007/11/27/consistent-hashing
This post is written by Tom White and has a simple implementation of consistent
hashing in Java. I have added the hash function that is missing from the post.
Here is the source code.

import java.util.*;
import java.security.*;
import java.math.*;

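// Maps any object to a position on the circle by hashing its string form.
class HashFunction {

  // Number of positions on the circle: 2^31, so that every position fits
  // in a non-negative int.
  private static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(31);

  // Hashes o.toString() with MD5 and reduces the digest to a ring position.
  public int hash(Object o) {
    try {
      MessageDigest m = MessageDigest.getInstance("MD5");
      byte[] digest = m.digest(o.toString().getBytes("UTF-8"));
      // Interpret the 16-byte digest as an unsigned integer.
      BigInteger bigInt = new BigInteger(1, digest);
      return bigInt.mod(RING_SIZE).intValue();
    } catch (Exception e) {
      // Wrap the checked exceptions (unknown algorithm or encoding).
      throw new RuntimeException(e);
    }
  }
}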

public class ConsistentHash<T> {
  private final HashFunction hashFunction;
  private final int numberOfReplicas;
  private final SortedMap<Integer, T> circle = new TreeMap<Integer, T>();

  public ConsistentHash(HashFunction hashFunction, 
      int numberOfReplicas, Collection<T> nodes) {
    this.hashFunction = hashFunction;
    this.numberOfReplicas = numberOfReplicas;

    for (T node : nodes) 
      add(node);
  }

  public void add(T node) {
    for (int i = 0; i < numberOfReplicas; i++) 
      circle.put(hashFunction.hash(node.toString() + i), node);
  }

  public void remove(T node) {
    for (int i = 0; i < numberOfReplicas; i++)
      circle.remove(hashFunction.hash(node.toString() + i));
  }

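  /*
   * Returns the node for the given key: the node at the first position at or
   * after the key's hash, wrapping around to the start of the circle.
   */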
  public T get(Object key) {
    if (circle.isEmpty())
      return null;

    int hash = hashFunction.hash(key);
    if (!circle.containsKey(hash)) {
      SortedMap<Integer, T> tailMap = circle.tailMap(hash);
      hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
    }

    return circle.get(hash);
  }

  public String toString() {
    return circle.toString();
  }

  public static void main (String [] args)  {
    HashFunction hashFunc = new HashFunction();

    List<String> ls = new ArrayList<String>();
    ls.add("A");
    ls.add("B");
    ls.add("C");

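    // args[0] is the number of points each node gets on the circle.
    if (args.length != 1) {
      System.err.println("usage: java ConsistentHash <numberOfReplicas>");
      return;
    }
    int num = Integer.parseInt(args[0]);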
    ConsistentHash<String> ch = new ConsistentHash<String>(hashFunc, 
        num, 
        ls);
    System.out.println(ch);  

    ch.remove("A");
    System.out.println(ch);  
  }

}
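
The property that makes this structure useful is that removing a node only reassigns the keys that node owned. Below is a minimal, self-contained sketch that checks this. It is not Tom White's code: the class name `RingRemovalDemo`, the helper `unexpectedMoves`, the key names `key-0`, `key-1`, ... and the choice of 100 points per node are all assumptions made for illustration, and the ring position is simply taken from the first 4 bytes of an MD5 digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: names and parameters are assumptions.
class RingRemovalDemo {
  // Ring position -> node name.
  private final TreeMap<Integer, String> circle = new TreeMap<>();
  private final int replicas;

  RingRemovalDemo(int replicas) {
    this.replicas = replicas;
  }

  // Uses the first 4 bytes of the MD5 digest as a 32-bit ring position.
  static int hash(String s) {
    try {
      byte[] d = MessageDigest.getInstance("MD5")
          .digest(s.getBytes(StandardCharsets.UTF_8));
      return ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
          | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  void add(String node) {
    for (int i = 0; i < replicas; i++) {
      circle.put(hash(node + i), node);
    }
  }

  void remove(String node) {
    for (int i = 0; i < replicas; i++) {
      // Two-argument remove: deletes the point only if it still maps to
      // this node, so a colliding point of another node is left alone.
      circle.remove(hash(node + i), node);
    }
  }

  // First node at or after the key's position, wrapping to the start.
  String get(String key) {
    if (circle.isEmpty()) {
      return null;
    }
    Map.Entry<Integer, String> e = circle.ceilingEntry(hash(key));
    return (e != null ? e : circle.firstEntry()).getValue();
  }

  // Counts keys that change owner after `removed` leaves the ring even
  // though they were not stored on `removed` beforehand.
  static int unexpectedMoves(int replicas, int keyCount, String removed) {
    RingRemovalDemo ring = new RingRemovalDemo(replicas);
    for (String n : new String[] {"A", "B", "C"}) {
      ring.add(n);
    }
    List<String> before = new ArrayList<>();
    for (int k = 0; k < keyCount; k++) {
      before.add(ring.get("key-" + k));
    }
    ring.remove(removed);
    int unexpected = 0;
    for (int k = 0; k < keyCount; k++) {
      String old = before.get(k);
      if (!old.equals(removed) && !old.equals(ring.get("key-" + k))) {
        unexpected++;
      }
    }
    return unexpected;
  }

  public static void main(String[] args) {
    System.out.println(unexpectedMoves(100, 1000, "A"));
  }
}
```

The count stays at zero because a key's owner is the first point at or after its hash, and taking away the points of A cannot change which point comes first for a key that already landed on B or C.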

 

- http://michaelnielsen.org/blog/consistent-hashing
This post has a good explanation of why so many key-value pairs need to move
to other nodes when a machine is added under plain modulo hashing. Before
reading it, I spent a long time trying to work out the explanation by myself.
Here I give a concrete example to help me understand it.

Imagine there are 3 machines used as a web cache for key-value pairs (k, v).
The function for allocating (k, v) to a machine is hash(k) mod 3. The table
below lists which hash values land on each machine.

+-----------+---+---+---+----+----+----+----+
| machine-0 | 0 | 3 | 6 | 9  | 12 | 15 | 18 |
+-----------+---+---+---+----+----+----+----+
| machine-1 | 1 | 4 | 7 | 10 | 13 | 16 | 19 |
+-----------+---+---+---+----+----+----+----+
| machine-2 | 2 | 5 | 8 | 11 | 14 | 17 | 20 |
+-----------+---+---+---+----+----+----+----+
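
The table above can be reproduced with a few lines of Java. This is only a sketch: the class name `ModuloPlacement`, the method `place`, and the use of the integers 0 through 20 as stand-ins for hash(k) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: plain integers stand in for hash(k) values.
class ModuloPlacement {
  // Returns one list per machine holding the hash values in [0, maxHash]
  // that "hash mod machines" assigns to it.
  static List<List<Integer>> place(int maxHash, int machines) {
    List<List<Integer>> buckets = new ArrayList<>();
    for (int m = 0; m < machines; m++) {
      buckets.add(new ArrayList<>());
    }
    for (int h = 0; h <= maxHash; h++) {
      buckets.get(h % machines).add(h);
    }
    return buckets;
  }

  public static void main(String[] args) {
    List<List<Integer>> buckets = place(20, 3);
    for (int m = 0; m < buckets.size(); m++) {
      System.out.println("machine-" + m + " " + buckets.get(m));
    }
  }
}
```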

 

Now a new machine is added, so the function becomes hash(k) mod 4. The new
function gives the following allocation of key-value pairs among the machines.

+-----------+---+---+----+----+----+----+
| machine-0 | 0 | 4 | 8  | 12 | 16 | 20 |
+-----------+---+---+----+----+----+----+
| machine-1 | 1 | 5 | 9  | 13 | 17 |    |
+-----------+---+---+----+----+----+----+
| machine-2 | 2 | 6 | 10 | 14 | 18 |    |
+-----------+---+---+----+----+----+----+
| machine-3 | 3 | 7 | 11 | 15 | 19 |    |
+-----------+---+---+----+----+----+----+
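
Comparing the two allocations by hand is tedious, so here is a small sketch that lists the hash values whose machine changes when the divisor goes from 3 to 4. The class name `ModuloRemap`, the method `moved`, and the range 0 through 20 (the values in the first table) are assumptions for illustration. For that range, 15 of the 21 values move; only 0, 1, 2, 12, 13 and 14 stay where they were.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: plain integers stand in for hash(k) values.
class ModuloRemap {
  // Hash values in [0, maxHash] that land on a different machine when the
  // machine count changes from `before` to `after`.
  static List<Integer> moved(int maxHash, int before, int after) {
    List<Integer> result = new ArrayList<>();
    for (int h = 0; h <= maxHash; h++) {
      if (h % before != h % after) {
        result.add(h);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> m = moved(20, 3, 4);
    System.out.println(m.size() + " of 21 values move: " + m);
  }
}
```

This is the cost that consistent hashing avoids: on a hash circle, a fourth machine would take over only the keys that fall just before its own points, which is roughly a quarter of the keys on average.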

 

 
