CHAPTER 5: DESIGN CONSISTENT HASHING

The rehashing problem

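A common way to spread keys evenly across N cache servers is to compute the server index with a modulo:

serverIndex = hash(key) % N

where N is the size of the server pool.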
However, problems arise when new servers are added, or existing servers are
removed.
This means that when server 1 goes offline, most cache clients will
connect to the wrong servers to fetch data. This causes a storm of cache misses.
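The effect of changing N can be checked with a short experiment. This is a minimal sketch, not part of the original text: the SHA-1 based `stable_hash` helper, the `key0`, `key1`, ... key names, and the pool sizes 4 and 3 are assumptions chosen for illustration. It models the pool shrinking from 4 to 3 servers as a change of the modulus.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic hash of a key (SHA-1 is an assumption; any uniform hash works)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


def server_index(key: str, n: int) -> int:
    """serverIndex = hash(key) % N"""
    return stable_hash(key) % n


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose server index changes when the pool size changes."""
    moved = sum(1 for k in keys if server_index(k, old_n) != server_index(k, new_n))
    return moved / len(keys)


if __name__ == "__main__":
    sample_keys = [f"key{i}" for i in range(10_000)]
    print(f"{moved_fraction(sample_keys, 4, 3):.1%} of keys changed server index")
```

With a uniform hash, a key keeps its index only when hash % 4 equals hash % 3, which holds for 3 of every 12 residues, so roughly three quarters of the keys are expected to move.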

Consistent hashing

Quoted from Wikipedia: "Consistent hashing is a special kind of hashing such that when a hash table is re-sized and consistent hashing is used, only k/n keys need to be remapped on average, where k is the number of keys, and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped" [1].

Hash space and hash ring


Hash servers


Hash keys


Add a server

Remove a server

Two issues in the basic approach

The basic approach has two steps:
• Map servers and keys onto the ring using a uniformly distributed hash function.
• To find out which server a key is mapped to, go clockwise from the key position until the first server on the ring is found.
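The clockwise lookup can be sketched with a sorted list and a binary search. This is an illustrative sketch under stated assumptions, not a reference implementation: the `HashRing` class, the `ring_hash` helper, the choice of SHA-1, and the server names are all hypothetical.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Position on the ring (SHA-1 is an assumption; any uniform hash works)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Basic ring: one point per server, no virtual nodes."""

    def __init__(self, servers):
        # Keep the (position, server) pairs sorted so lookups can binary-search.
        self._ring = sorted((ring_hash(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Go clockwise from the key position to the first server found."""
        if not self._ring:
            raise LookupError("the ring has no servers")
        i = bisect.bisect_left(self._positions, ring_hash(key))
        # Walking past the last server wraps around to the first one.
        return self._ring[i % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["server0", "server1", "server2", "server3"])
    for key in ("key0", "key1", "key2", "key3"):
        print(key, "->", ring.lookup(key))
```

Keeping the positions sorted makes each lookup O(log n) in the number of servers, and the modulo on the index handles the wrap-around from the end of the hash space back to the start.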

Two problems arise with this approach. First, it is impossible to keep partitions the same size for all servers on the ring, because servers can be added or removed. A partition is the hash space between adjacent servers.
Second, it is possible to have a non-uniform key distribution on the ring. For instance, if servers are mapped to positions listed in Figure 5-11, most of the keys are stored on server 2. However, server 1 and server 3 have no data.

Virtual nodes

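A virtual node refers to a real node, and each server is represented by multiple virtual nodes on the ring. As the number of virtual nodes increases, the distribution of keys becomes more balanced.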

However, more space is needed to store data about virtual nodes.
This is a tradeoff, and we can tune the number of virtual nodes to fit our system requirements.
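To get a feel for this tradeoff, the share of keys per server can be measured for different numbers of virtual nodes. This is a rough sketch under assumptions that are not in the original text: SHA-1 as the hash, the `server#i` naming scheme for virtual nodes, and the helper names `build_ring` and `key_shares`.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Position on the ring (SHA-1 is an assumption; any uniform hash works)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


def build_ring(servers, vnodes: int):
    """Sorted (position, server) pairs with `vnodes` points per server.

    The "server#i" label for a virtual node is a hypothetical naming scheme.
    """
    return sorted(
        (ring_hash(f"{server}#{i}"), server)
        for server in servers
        for i in range(vnodes)
    )


def key_shares(servers, vnodes: int, keys):
    """Fraction of `keys` that lands on each server."""
    ring = build_ring(servers, vnodes)
    positions = [pos for pos, _ in ring]
    counts = {server: 0 for server in servers}
    for key in keys:
        # First virtual node clockwise from the key, wrapping at the end.
        i = bisect.bisect_left(positions, ring_hash(key)) % len(ring)
        counts[ring[i][1]] += 1
    return {server: count / len(keys) for server, count in counts.items()}


if __name__ == "__main__":
    servers = ["s0", "s1", "s2", "s3"]
    keys = [f"key{i}" for i in range(10_000)]
    for vnodes in (1, 10, 200):
        shares = key_shares(servers, vnodes, keys)
        print(vnodes, {s: f"{share:.1%}" for s, share in shares.items()})
```

The ring stores servers times vnodes entries, which is the extra space the text refers to. The shares are expected to move closer to an even split as `vnodes` grows.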

Find affected keys

When server s4 is added to the ring, keys located between s3 and s4 need to be redistributed to s4.
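The affected range for an added server can be computed by looking for the nearest existing server anticlockwise from the new position. The following is a small sketch with made-up integer positions (10, 30, 50, 70, 85); the function name `affected_range_on_add` and those positions are assumptions for illustration only.

```python
import bisect


def affected_range_on_add(positions: dict, new_pos: int):
    """Describe the keys that move when a server is added at `new_pos`.

    positions: server name -> ring position for servers already on the ring
               (assumed non-empty, with no server already at `new_pos`).
    Returns (predecessor, previous_owner, (start, end)): keys whose position p
    satisfies start < p <= end move from previous_owner to the new server.
    """
    ring = sorted((pos, name) for name, pos in positions.items())
    points = [pos for pos, _ in ring]
    i = bisect.bisect_left(points, new_pos)
    pred_pos, pred_name = ring[i - 1]         # i == 0 wraps to the last server
    _, owner_name = ring[i % len(ring)]       # next server clockwise
    return pred_name, owner_name, (pred_pos, new_pos)


if __name__ == "__main__":
    # Hypothetical positions on a small ring.
    ring_positions = {"s0": 10, "s1": 30, "s2": 50, "s3": 70}
    print(affected_range_on_add(ring_positions, 85))
    # prints ('s3', 's0', (70, 85))
```

In this example the new server s4 at position 85 takes over the keys between s3 and itself, which were previously served by s0. All other keys stay where they are.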


When server s1 is removed, keys located between s0 and s1 must be redistributed to s2.
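Removal is the mirror case: the range owned by the removed server is handed to the next server clockwise. A minimal sketch, again with hypothetical integer positions and a hypothetical function name `affected_range_on_remove`:

```python
def affected_range_on_remove(positions: dict, removed: str):
    """Describe the keys that move when server `removed` leaves the ring.

    positions: server name -> ring position (assumed to hold at least two servers).
    Returns (predecessor, new_owner, (start, end)): keys whose position p
    satisfies start < p <= end move from the removed server to new_owner.
    """
    ring = sorted((pos, name) for name, pos in positions.items())
    names = [name for _, name in ring]
    i = names.index(removed)
    pred_pos, pred_name = ring[i - 1]            # i == 0 wraps to the last server
    _, new_owner = ring[(i + 1) % len(ring)]     # next server clockwise
    return pred_name, new_owner, (pred_pos, ring[i][0])


if __name__ == "__main__":
    # Hypothetical positions on a small ring.
    ring_positions = {"s0": 10, "s1": 30, "s2": 50, "s3": 70}
    print(affected_range_on_remove(ring_positions, "s1"))
    # prints ('s0', 's2', (10, 30))
```

Only the keys between s0 and s1 change owner here; the ranges of s0, s2's original keys, and s3 are untouched.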

The benefits of consistent hashing include:
• Only a minimal number of keys are redistributed when servers are added or removed.
• It is easy to scale horizontally because data are more evenly distributed.
• It mitigates the hotspot key problem. Excessive access to a specific shard could cause server overload. Imagine data for Katy Perry, Justin Bieber, and Lady Gaga all ending up on the same shard. Consistent hashing helps to mitigate the problem by distributing the data more evenly.
