For completeness here is a simple implementation in Java. In order for consistent hashing to be effective it is important to have a hash function that mixes well. Most implementations of Object's hashCode do not mix well - for example, they typically produce a restricted number of small integer values - so we have a HashFunction interface to allow a custom hash function to be used. MD5 hashes are recommended here.
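The HashFunction interface itself is not shown in this post. As a minimal sketch, assuming it has a single hash(Object) method returning int (which is what the calls in the ConsistentHash class imply), it might look like the following. The Md5HashFunction class is a hypothetical name, and folding the first four digest bytes into an int is just one possible choice, not necessarily what the original author used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Assumed shape of the HashFunction interface used by ConsistentHash;
// the original post does not show its definition.
interface HashFunction {
    int hash(Object key);
}

// Hypothetical MD5-backed implementation: hashes the key's string form
// and folds the first four bytes of the 16-byte digest into an int.
class Md5HashFunction implements HashFunction {
    @Override
    public int hash(Object key) {
        try {
            // MessageDigest instances are not thread-safe, so create one per call.
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] d = md5.digest(String.valueOf(key).getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFF) << 24)
                 | ((d[1] & 0xFF) << 16)
                 | ((d[2] & 0xFF) << 8)
                 |  (d[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JRE, which ships an MD5 provider.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

An instance could then be passed to the constructor, for example new ConsistentHash<String>(new Md5HashFunction(), 150, nodes).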
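import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash<T> {

    private final HashFunction hashFunction;
    // Number of virtual nodes placed on the ring for each physical node.
    private final int numberOfReplicas;
    // The ring: hash position -> node, kept sorted by position.
    private final SortedMap<Integer, T> circle = new TreeMap<Integer, T>();

    public ConsistentHash(HashFunction hashFunction, int numberOfReplicas, Collection<T> nodes) {
        this.hashFunction = hashFunction;
        this.numberOfReplicas = numberOfReplicas;
        for (T node : nodes) {
            add(node);
        }
    }

    // Places numberOfReplicas virtual nodes for this node on the ring.
    public void add(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.put(hashFunction.hash(node.toString() + i), node);
        }
    }

    // Removes every virtual node belonging to this node.
    public void remove(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove(hashFunction.hash(node.toString() + i));
        }
    }

    // Returns the node responsible for the key, or null if the ring is empty.
    public T get(Object key) {
        if (circle.isEmpty()) {
            return null;
        }
        int hash = hashFunction.hash(key);
        if (!circle.containsKey(hash)) {
            // No node at this exact position: take the next node clockwise,
            // wrapping around to the first node on the ring if necessary.
            SortedMap<Integer, T> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        // This lookup could be optimized considerably: finding the smallest
        // value >= hash among fewer than ten thousand integers is a simple
        // algorithm and does not need the TreeMap implementation.
        return circle.get(hash);
    }
}

In practice, numberOfReplicas is usually set between 100 and 200. It is the number of virtual nodes that each physical node maps to. If we straighten the ring out into a line, each node is simply a position in an array. When there are few physical nodes, say 10, spread evenly over Integer.MIN_VALUE to Integer.MAX_VALUE, the interval between adjacent nodes is about 2^32 / 10, roughly 4.3 × 10^8 (a little under 2^29). If the hashes of the keys seen during some period happen to fall inside one such interval, those keys all pile up on a single physical node. With virtual nodes, the virtual nodes of each physical node are interleaved evenly with those of the other physical nodes, which greatly reduces the clustering caused by large intervals between nodes. The following is a test of the simple implementation: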
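Before the test of the class itself, the clustering argument above can be made concrete with a small, self-contained sketch. It is an illustration under stated assumptions, not part of the original implementation: the VirtualNodeDemo class, the node names "node-0" to "node-9", the synthetic keys "key-0", "key-1", and so on, and the MD5-to-int folding are all made up for the example. It builds one ring with 1 virtual node per physical node and one with 150, then reports the ratio between the busiest and the least busy node.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class VirtualNodeDemo {

    // Assumed hash: first four bytes of the MD5 digest folded into an int.
    static int md5Int(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                 | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Places `replicas` virtual nodes per physical node on the ring,
    // using the same node.toString() + i naming as ConsistentHash.add.
    static TreeMap<Integer, String> buildRing(List<String> nodes, int replicas) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        for (String node : nodes) {
            for (int i = 0; i < replicas; i++) {
                ring.put(md5Int(node + i), node);
            }
        }
        return ring;
    }

    // Counts how many of the synthetic keys land on each physical node.
    static Map<String, Integer> countKeys(TreeMap<Integer, String> ring, int numKeys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String node : ring.values()) {
            counts.putIfAbsent(node, 0);
        }
        for (int k = 0; k < numKeys; k++) {
            // First virtual node at or after the key's position...
            Map.Entry<Integer, String> e = ring.ceilingEntry(md5Int("key-" + k));
            if (e == null) {
                e = ring.firstEntry(); // ...wrapping around the ring.
            }
            counts.merge(e.getValue(), 1, Integer::sum);
        }
        return counts;
    }

    // Busiest node's load divided by the least busy node's load;
    // values close to 1.0 mean an even distribution.
    static double spread(Map<String, Integer> counts) {
        int max = Collections.max(counts.values());
        int min = Collections.min(counts.values());
        return (double) max / Math.max(min, 1);
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>();
        for (int n = 0; n < 10; n++) {
            nodes.add("node-" + n);
        }
        for (int replicas : new int[] {1, 150}) {
            Map<String, Integer> counts = countKeys(buildRing(nodes, replicas), 100_000);
            System.out.printf("replicas=%d spread=%.2f%n", replicas, spread(counts));
        }
    }
}
```

The exact numbers depend on the hash function and the node names, but the spread with 150 virtual nodes per physical node should come out much closer to 1 than the spread with a single position per node.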