1. Implement the consistent hashing algorithm in a programming language you are familiar with.
Consistent hashing solves the problem that when a node goes down, or a node is added on the fly, the sharding rule changes and data has to be migrated. The implementation below places virtual nodes on a hash ring. [Diagram of the virtual-node consistent hashing ring]
The code is as follows:
package com;

import com.alibaba.fastjson.JSON;
import org.apache.commons.collections.MapUtils;

import java.util.*;

public class UniformityHashWithVirtualNode {

    // Server nodes placed on the big hash ring
    private static String[] nodeServerArr = {"192.168.1.1", "192.168.1.2", "192.168.1.3", "192.168.1.4",
            "192.168.1.5", "192.168.1.6", "192.168.1.7", "192.168.1.8", "192.168.1.9", "192.168.1.10"};

    // List of real nodes
    private static List<String> realNodeList = new LinkedList<>();

    // Number of virtual nodes per real node (each virtual node name is the real node name plus a suffix,
    // and the virtual nodes are what actually get placed on the ring)
    private static final int VIRTUAL_NODE_NUM = 1;

    // Virtual nodes on the ring (key: virtual node hash, value: virtual node name)
    private static SortedMap<Integer, String> virtualNodeMap = new TreeMap<>();
    static {
        // Register every server as a real node
        for (String nodeServer : nodeServerArr) {
            realNodeList.add(nodeServer);
        }
        // For each real node, create VIRTUAL_NODE_NUM virtual nodes and place them on the ring
        for (int i = 0; i < VIRTUAL_NODE_NUM; i++) {
            for (String nodeServer : realNodeList) {
                String virtualNodeServer = nodeServer + "_VIRTUAL_NODE_" + i;
                int hashCode = Math.abs(virtualNodeServer.hashCode());
                virtualNodeMap.put(hashCode, virtualNodeServer);
            }
        }
        System.out.println("Virtual node list: " + JSON.toJSONString(virtualNodeMap));
    }
    public static void main(String[] args) {
        // Build 1,000,000 test keys
        List<String> keyList = new ArrayList<>();
        for (int i = 0; i < 1000000; i++) {
            String key = i + "_" + "KV";
            keyList.add(key);
        }
        // Route every key and count how many keys land on each node
        Map<String, Integer> nodeServerCountMap = new HashMap<>();
        keyList.forEach(key -> {
            String serverNode = getServerNode(key);
            System.out.println(key + " (hash " + key.hashCode() + ") was routed to node [" + serverNode + "]");
            if (nodeServerCountMap.containsKey(serverNode)) {
                nodeServerCountMap.put(serverNode, nodeServerCountMap.get(serverNode) + 1);
            } else {
                nodeServerCountMap.put(serverNode, 1);
            }
        });
        System.out.println("Keys per virtual node: [nodeServerCountMap] = " + JSON.toJSONString(nodeServerCountMap));
    }
    private static String getServerNode(String hashKey) {
        // All virtual nodes whose hash is >= the key's hash, i.e. clockwise from the key on the ring
        SortedMap<Integer, String> nodeMap = virtualNodeMap.tailMap(Math.abs(hashKey.hashCode()));
        if (MapUtils.isEmpty(nodeMap)) {
            // Wrap around: no node lies clockwise, so route to the first node on the ring
            int firstNodeIndex = virtualNodeMap.firstKey();
            return virtualNodeMap.get(firstNodeIndex);
        }
        // Otherwise route to the nearest node clockwise
        int firstNodeIndex = nodeMap.firstKey();
        return nodeMap.get(firstNodeIndex);
    }
}
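Note that getServerNode returns the name of a virtual node, which is why the distribution output below is keyed by virtual node names. A minimal sketch of a helper that maps a virtual node name back to its real server, assuming the "_VIRTUAL_NODE_" naming convention used above (this helper is not part of the original code), could be added to the class:

    // Hypothetical helper: strip the "_VIRTUAL_NODE_<i>" suffix to recover the real server address
    private static String getRealNode(String virtualNodeName) {
        int idx = virtualNodeName.indexOf("_VIRTUAL_NODE_");
        return idx >= 0 ? virtualNodeName.substring(0, idx) : virtualNodeName;
    }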
2. Write a test case for this algorithm: with 1,000,000 KV entries and 10 server nodes, compute the standard deviation of the number of KV entries stored on each server, to evaluate how unevenly the algorithm distributes the storage load.
After running the test, the 1,000,000 KV entries were distributed as follows:
Keys per virtual node: [nodeServerCountMap] = {"192.168.1.3_VIRTUAL_NODE_0":154174,"192.168.1.8_VIRTUAL_NODE_0":30856,"192.168.1.10_VIRTUAL_NODE_0":53507,"192.168.1.9_VIRTUAL_NODE_0":30418,"192.168.1.6_VIRTUAL_NODE_0":30772,"192.168.1.7_VIRTUAL_NODE_0":30509,"192.168.1.1_VIRTUAL_NODE_0":49726,"192.168.1.2_VIRTUAL_NODE_0":205338,"192.168.1.5_VIRTUAL_NODE_0":208867,"192.168.1.4_VIRTUAL_NODE_0":205833}
The per-node counts range from roughly 30,000 to over 200,000, so with only one virtual node per server the load is clearly not well balanced.
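Task 2 also asks for the standard deviation of the per-node counts, which the run above does not print. A minimal sketch, assuming the counts have been aggregated in nodeServerCountMap as in main() above, could look like this:

    // Could be called at the end of main(): population standard deviation of the per-node key counts
    private static double standardDeviation(Map<String, Integer> nodeServerCountMap) {
        Collection<Integer> counts = nodeServerCountMap.values();
        double mean = counts.stream().mapToInt(Integer::intValue).average().orElse(0);
        double variance = counts.stream()
                .mapToDouble(c -> (c - mean) * (c - mean))
                .average()
                .orElse(0);
        return Math.sqrt(variance);
    }

For the counts shown above, the mean is 100,000 keys per node and the standard deviation works out to roughly 78,000, which quantifies the imbalance; raising VIRTUAL_NODE_NUM above 1 would typically spread the keys more evenly.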