libconhash is a consistent hashing library that can be compiled on both Windows and Linux.
First, consider the common way to balance load across cache machines. The number of the machine chosen to cache object o is:
hash(o) mod n
Here, n is the total number of cache machines. This works well until you add or remove a cache machine, at which point the mapping becomes one of:
hash(o) mod (n+1)
hash(o) mod (n-1)
As you can see, almost all objects will be hashed to a new location. This is a disaster, since the originating content servers are swamped with requests from the cache machines. This is why you need consistent hashing.
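To put a number on "almost all", here is a minimal C sketch that counts how many hash values land on a different machine when the machine count changes. It is illustrative only: `count_remapped` and `report_remapped` are hypothetical helpers, not part of libconhash, and the loop walks the integers 0..num_hashes-1 as stand-ins for hash(o) values.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of libconhash): counts how many hash values
 * in [0, num_hashes) map to a different machine when the machine count
 * changes from n_old to n_new under "hash mod n" placement.
 * Both n_old and n_new must be non-zero. */
uint32_t count_remapped(uint32_t num_hashes, uint32_t n_old, uint32_t n_new)
{
    uint32_t moved = 0;
    uint32_t h;

    for (h = 0; h < num_hashes; h++) {
        if (h % n_old != h % n_new)
            moved++;
    }
    return moved;
}

/* Prints how many values are remapped for a given change in machine count. */
void report_remapped(uint32_t num_hashes, uint32_t n_old, uint32_t n_new)
{
    uint32_t moved = count_remapped(num_hashes, n_old, n_new);

    printf("%u -> %u machines: %u of %u values remapped\n",
           (unsigned)n_old, (unsigned)n_new,
           (unsigned)moved, (unsigned)num_hashes);
}
```

For example, going from 4 to 5 machines remaps 800 of the first 1000 values, that is, 80% of them.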
Consistent hashing guarantees that when a cache machine is removed, only the objects cached on it are rehashed; when a new cache machine is added, only a small fraction of the objects are rehashed.
Now we will go into consistent hashing step by step.
Commonly, a hash function maps a value to a 32-bit key in the range 0~2^32-1. Now imagine bending that range into a circle so that the keys wrap around: 2^32-1 is followed by 0, as illustrated in figure 1.
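The wrap-around of the circle corresponds directly to unsigned 32-bit arithmetic. As a small sketch (the helper name `ring_distance` is an assumption made for illustration, not a libconhash function), the clockwise distance between two positions can be computed like this:

```c
#include <stdint.h>

/* Clockwise distance from position `from` to position `to` on a 32-bit ring.
 * Unsigned subtraction wraps modulo 2^32, which matches the circle:
 * stepping clockwise from 2^32-1 lands on 0. */
uint32_t ring_distance(uint32_t from, uint32_t to)
{
    return (uint32_t)(to - from);
}
```

With this definition, the distance from 2^32-1 to 0 is 1, while the distance from 0 to 2^32-1 is almost the whole circle.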
Now consider four objects: object1~object4. We use a hash function to get their key values and map them onto the circle, as illustrated in figure 2.
hash(object1) = key1; ..... hash(object4) = key4;
The basic idea of consistent hashing is to map both the caches and the objects into the same hash space using the same hash function.
Now suppose we have three caches, A, B and C; the mapping result then looks like figure 3.
hash(cache A) = key A; .... hash(cache C) = key C;
Now all the caches and objects are hashed into the same space, so we can determine how to map objects to caches. Take object obj for example: just start from where obj is and head clockwise on the ring until you find a server. If that server is down, you go to the next one, and so forth. See figure 3 above.
According to this method, object1 will be cached in cache A; object2 and object3 will be cached in cache C; and object4 will be cached in cache B.
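The clockwise walk can be sketched in C as a binary search over cache positions sorted by key. This is only an illustration under stated assumptions: `struct ring_point`, `ring_lookup` and the key values in `example_ring` are made up for the example, and libconhash manages its nodes with its own internal structures.

```c
#include <stddef.h>
#include <stdint.h>

/* One cache position on the ring. Hypothetical type for illustration. */
struct ring_point {
    uint32_t key;      /* hash of the cache */
    const char *name;  /* cache identifier */
};

/* Returns the first cache at or clockwise after `key`.
 * `ring` must be sorted by ascending key; returns NULL if the ring is empty. */
const struct ring_point *ring_lookup(const struct ring_point *ring,
                                     size_t n, uint32_t key)
{
    size_t lo = 0, hi = n;

    if (n == 0)
        return NULL;

    /* Binary search for the first point whose key is >= the object's key. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (ring[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* Past the last point: wrap around to the first point on the ring. */
    return &ring[lo == n ? 0 : lo];
}

/* Example ring with made-up key values for caches A, B and C. */
const struct ring_point example_ring[] = {
    { 100u, "A" }, { 200u, "B" }, { 300u, "C" }
};
```

With these made-up positions, object keys 50, 150, 220 and 260 reproduce the mapping above (A, B, C and C respectively), and a key of 350 wraps around to cache A.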
Now consider two scenarios: a cache goes down and is removed, and a new cache is added.
If cache B is removed, then only the objects that were cached in B will be rehashed and moved to C; in the example, see object4, illustrated in figure 4.
If a new cache D is added, and D is hashed between object2 and object3 on the ring, then only the objects between B and D (that is, those whose next cache clockwise is now D) will be rehashed; in the example, see object2, illustrated in figure 5.
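Both scenarios can be tried out with a toy ring. The sketch below is an assumption-laden illustration, not libconhash code: `struct toy_ring` and the `toy_*` functions are hypothetical, caches have single-letter names, and lookup is a plain linear scan for the cache with the smallest clockwise distance.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CACHES 16

/* Hypothetical toy ring (not libconhash's data structure): an unsorted
 * array of cache positions, scanned linearly on lookup. */
struct toy_ring {
    uint32_t keys[MAX_CACHES];
    char names[MAX_CACHES];   /* single-letter cache names */
    size_t count;
};

/* Adds a cache at position `key`; returns 0 on success, -1 if full. */
int toy_add(struct toy_ring *r, char name, uint32_t key)
{
    if (r->count == MAX_CACHES)
        return -1;
    r->keys[r->count] = key;
    r->names[r->count] = name;
    r->count++;
    return 0;
}

/* Removes the cache called `name`; returns 0 on success, -1 if not found. */
int toy_remove(struct toy_ring *r, char name)
{
    size_t i;

    for (i = 0; i < r->count; i++) {
        if (r->names[i] == name) {
            /* Overwrite the removed slot with the last entry. */
            r->count--;
            r->keys[i] = r->keys[r->count];
            r->names[i] = r->names[r->count];
            return 0;
        }
    }
    return -1;
}

/* Returns the name of the first cache at or clockwise after `key`,
 * or '\0' if the ring is empty. */
char toy_lookup(const struct toy_ring *r, uint32_t key)
{
    size_t i, best = 0;

    if (r->count == 0)
        return '\0';
    for (i = 1; i < r->count; i++) {
        /* Unsigned subtraction gives the clockwise distance, with wrap. */
        if ((uint32_t)(r->keys[i] - key) < (uint32_t)(r->keys[best] - key))
            best = i;
    }
    return r->names[best];
}
```

Placing A, B and C at the made-up positions 100, 200 and 300, and the objects at 50 (object1), 150 (object4), 220 (object2) and 260 (object3), removing B changes only the result for key 150 (it moves to C), and adding D at 240 changes only the result for key 220 (it moves to D).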
It is possible to have a very non-uniform distribution of objects between caches if you don't deploy enough caches. The solution is to introduce the idea of "virtual nodes".
Virtual nodes are replicas of cache points on the circle: each real cache corresponds to several virtual nodes. Whenever we add a cache, we actually create a number of virtual nodes on the circle for it; when a cache is removed, we remove all of its virtual nodes from the circle.
Consider the above example. There are two caches, A and C, in the system. If we introduce virtual nodes with a replica count of 2, there will be 4 virtual nodes: cache A1 and cache A2 represent cache A; cache C1 and cache C2 represent cache C, as illustrated in figure 6.
Then, the mapping from objects to virtual nodes will be:
object1->cache A2; object2->cache A1; object3->cache C1; object4->cache C2
Once you have the virtual node, you have the cache, as in the above figure.
So object1 and object2 are cached in cache A, and object3 and object4 are cached in cache C. The result is more balanced now.
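A rough sketch of virtual nodes in C follows. Everything here is an assumption made for illustration: the `vring_*` names, the "<cache>-<index>" labelling of replicas, and the use of 32-bit FNV-1a (chosen only because it is short; libconhash defaults to MD5). Lookup is a linear scan for the nearest virtual node clockwise.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_VNODES 64

/* Hypothetical virtual-node ring; not libconhash's internal layout. */
struct vnode {
    uint32_t key;        /* position of this virtual node on the ring */
    const char *cache;   /* real cache that this virtual node stands for */
};

struct vring {
    struct vnode nodes[MAX_VNODES];
    size_t count;
};

/* 32-bit FNV-1a, used only to keep the sketch short. */
uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Adds `replicas` virtual nodes for `cache`, hashing "<cache>-<index>".
 * Returns 0 on success, -1 if the ring has no room. */
int vring_add(struct vring *r, const char *cache, unsigned replicas)
{
    char label[64];
    unsigned i;

    if (replicas > MAX_VNODES - r->count)
        return -1;
    for (i = 0; i < replicas; i++) {
        snprintf(label, sizeof label, "%s-%u", cache, i);
        r->nodes[r->count].key = fnv1a(label);
        r->nodes[r->count].cache = cache;
        r->count++;
    }
    return 0;
}

/* Removes every virtual node that belongs to `cache`. */
void vring_remove(struct vring *r, const char *cache)
{
    size_t i = 0;

    while (i < r->count) {
        if (strcmp(r->nodes[i].cache, cache) == 0) {
            /* Swap in the last entry, then re-check the same slot. */
            r->count--;
            r->nodes[i] = r->nodes[r->count];
        } else {
            i++;
        }
    }
}

/* Maps an object to a real cache via the nearest virtual node clockwise.
 * Returns NULL if the ring is empty. */
const char *vring_lookup(const struct vring *r, const char *object)
{
    uint32_t key = fnv1a(object);
    size_t i, best = 0;

    if (r->count == 0)
        return NULL;
    for (i = 1; i < r->count; i++) {
        /* Unsigned subtraction yields the clockwise distance, with wrap. */
        if ((uint32_t)(r->nodes[i].key - key) <
            (uint32_t)(r->nodes[best].key - key))
            best = i;
    }
    return r->nodes[best].cache;
}
```

Adding caches "A" and "C" with 2 replicas each produces 4 virtual nodes; a lookup always returns the name of a real cache, and removing "A" drops both of its virtual nodes at once so that every object then maps to "C".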
So now you know what consistent hashing is.
    /* initialize conhash library
     * @pfhash : hash function, NULL to use default MD5 method
     * return a conhash_s instance
     */
    CONHASH_API struct conhash_s* conhash_init(conhash_cb_hashfunc pfhash);

    /* finalize lib */
    CONHASH_API void conhash_fini(struct conhash_s *conhash);

    /* set node */
    CONHASH_API void conhash_set_node(struct node_s *node, const char *iden, u_int replica);

    /*
     * add a new node
     * @node: the node to add
     */
    CONHASH_API int conhash_add_node(struct conhash_s *conhash, struct node_s *node);

    /* remove a node */
    CONHASH_API int conhash_del_node(struct conhash_s *conhash, struct node_s *node);
    ...

    /*
     * lookup a server which object belongs to
     * @object: the input string which indicates an object
     * return the server_s structure, do not modify the value,
     * or it will cause a disaster
     */
    CONHASH_API const struct node_s* conhash_lookup(const struct conhash_s *conhash, const char *object);
Libconhash is very easy to use. There is a sample in the project that shows how to use the library.
First, create a conhash instance. Then you can add nodes to the instance, remove nodes from it, and look up objects.
Note that the function for updating a node's replica count is not implemented yet.
    /* g_nodes is assumed to be an array of struct node_s declared elsewhere */
    const struct node_s *node;
    const char *str = "James.km";   /* the object to look up */

    /* init conhash instance */
    struct conhash_s *conhash = conhash_init(NULL);
    if(conhash)
    {
        /* set nodes */
        conhash_set_node(&g_nodes[0], "titanic", 32);
        /* ... */

        /* add nodes */
        conhash_add_node(conhash, &g_nodes[0]);
        /* ... */

        printf("virtual nodes number %d\n", conhash_get_vnodes_num(conhash));
        printf("the hashing results--------------------------------------:\n");

        /* lookup object */
        node = conhash_lookup(conhash, str);
        if(node) printf("[%16s] is in node: [%16s]\n", str, node->iden);

        /* release the instance when done */
        conhash_fini(conhash);
    }