Distributed Services: Partitioning

1. Two ways to partition:

Key range partitioning:

1. Range queries are easy to support, because keys are kept in sorted order.

2. Risk of hot spots if the application often accesses keys that are close together in the sorted order.
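A minimal sketch of key range partitioning, to make both points concrete. The split points in `BOUNDARIES` and the function names are made up for illustration; a real system would choose and adjust its boundaries based on the data.

```python
import bisect

# Hypothetical split points. Partition i holds keys in
# [BOUNDARIES[i-1], BOUNDARIES[i]); the first and last ranges are open-ended.
BOUNDARIES = ["g", "n", "t"]  # 4 partitions: [..g), [g..n), [n..t), [t..)


def partition_for(key: str) -> int:
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(BOUNDARIES, key)


def partitions_for_range(start: str, end: str) -> list[int]:
    """Partitions that must be scanned for keys in [start, end]."""
    return list(range(partition_for(start), partition_for(end) + 1))


# A range query only touches the contiguous partitions that overlap it.
print(partitions_for_range("h", "p"))

# Keys that sort close together (here, timestamps) all land in one partition,
# which is how a hot spot forms.
print({partition_for(ts) for ts in ["2024-01-01", "2024-01-02", "2024-01-03"]})
```

With timestamp-like keys, every write for the current day goes to the same partition while the others sit idle.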

 

Hash partitioning:

1. Range queries become inefficient because adjacent keys are scattered across partitions, but load tends to be distributed more evenly.

2. When partitioning by hash, it is common to create a fixed number of partitions in advance, to assign several partitions to each node, and to move entire partitions from one node to another when nodes are added or removed. Dynamic partitioning can also be used.
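The fixed-partition scheme can be sketched roughly as follows. The partition count, the node names, and the choice of MD5 are assumptions for the example, not a recommendation; the hash is used only to spread keys, not for security.

```python
import hashlib

NUM_PARTITIONS = 16  # fixed in advance; hypothetical value


def partition_for(key: str) -> int:
    """Map a key to a partition with a hash that is stable across processes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# Several partitions are assigned to each node (hypothetical node names).
assignment = {p: f"node-{p % 4}" for p in range(NUM_PARTITIONS)}


def node_for(key: str) -> str:
    """Find the node that currently owns the partition for `key`."""
    return assignment[partition_for(key)]


print(partition_for("user:42"), node_for("user:42"))
```

Python's built-in `hash()` is deliberately avoided here: for strings it is randomized per process, so different processes would disagree on where a key lives.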

 

2. How to relieve hot spots?

1. A simple technique is to add a random number (one of n values) to the beginning or the end of the hot key, so that writes to it are spread across n different keys.

2. Having split the writes across different keys, reads now have to do additional work: they must read the data from all n keys and combine it.

3. You also need some way of keeping track of which keys are being split.
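The three points above can be sketched with a counter workload. Everything here is illustrative: `N_SPLITS`, `HOT_KEYS`, and the in-memory `store` are stand-ins for a real partitioned store and its bookkeeping.

```python
import random
from collections import defaultdict

N_SPLITS = 4                 # hypothetical: how many sub-keys a hot key is split into
HOT_KEYS = {"celebrity:1"}   # bookkeeping of which keys are being split
store: dict[str, int] = defaultdict(int)  # stand-in for the partitioned store


def write_increment(key: str, amount: int = 1) -> None:
    """Writes to a hot key go to a randomly chosen sub-key."""
    if key in HOT_KEYS:
        key = f"{key}#{random.randrange(N_SPLITS)}"
    store[key] += amount


def read_total(key: str) -> int:
    """Reads of a hot key must fetch all sub-keys and combine them."""
    if key in HOT_KEYS:
        return sum(store.get(f"{key}#{i}", 0) for i in range(N_SPLITS))
    return store.get(key, 0)


for _ in range(100):
    write_increment("celebrity:1")
write_increment("regular:7")

print(read_total("celebrity:1"), read_total("regular:7"))
```

Only keys listed in `HOT_KEYS` pay the extra read cost, which is why the split is usually applied to a small set of known hot keys and not to every key.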

 

3. Secondary indexes:

Two main approaches: document-based partitioning (local index) and term-based partitioning (global index).

Local index:

1. Writes are easy: you only need to deal with the partition that contains the document ID you are writing.

2. Reads are hard: a query must be sent to all partitions and the results combined (scatter/gather).

Global index:

1. Writes are hard: a write to a single document may now affect multiple partitions of the index.

2. Reads are easy: a client only needs to make a request to the partition containing the term that it wants.
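To compare the two approaches side by side, here is a toy sketch that maintains both kinds of index in memory. The partition functions are deliberately simplistic and all names are hypothetical; a real system would not keep both indexes at once.

```python
from collections import defaultdict

NUM_PARTITIONS = 3


def doc_partition(doc_id: int) -> int:
    """Partition that stores the document (and its local index entries)."""
    return doc_id % NUM_PARTITIONS


def term_partition(term: str) -> int:
    """Partition that owns a term in the global index (toy stable hash)."""
    return sum(term.encode("utf-8")) % NUM_PARTITIONS


local_index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]
global_index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]


def write(doc_id: int, terms: list[str]) -> set[int]:
    """Index a document both ways; return the global-index partitions touched."""
    p = doc_partition(doc_id)
    touched = set()
    for term in terms:
        local_index[p][term].add(doc_id)    # local: always the document's own partition
        tp = term_partition(term)
        global_index[tp][term].add(doc_id)  # global: one partition per term
        touched.add(tp)
    return touched


def read_local(term: str) -> set[int]:
    """Scatter/gather: ask every partition and merge the results."""
    result: set[int] = set()
    for index in local_index:
        result |= index.get(term, set())
    return result


def read_global(term: str) -> set[int]:
    """Single request to the partition that owns the term."""
    return set(global_index[term_partition(term)].get(term, set()))


print(write(1, ["red", "suv"]))  # a single document write may touch several index partitions
write(5, ["red"])
print(read_local("red"), read_global("red"))
```

Both reads return the same document IDs; the difference is how many partitions each one has to contact.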

 

4. Rebalancing partitions

Fixed number of partitions: operationally simpler, but it is difficult to choose the right number up front.

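Dynamic number of partitions: partitions are split when they grow too large and merged when they shrink, so the number of partitions adapts to the data volume.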
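For the fixed-partition case, rebalancing on node addition can be sketched as moving whole partitions to the new node until it holds a fair share. The function `add_node` and the node names are hypothetical, and the donor-selection rule is just one simple choice.

```python
from collections import Counter


def add_node(assignment: dict[int, str], new_node: str) -> dict[int, str]:
    """Return a new assignment in which `new_node` takes whole partitions
    from the currently busiest nodes until it holds its fair share."""
    result = dict(assignment)
    counts = Counter(result.values())
    counts[new_node] = 0
    target = len(result) // len(counts)  # fair share per node, rounded down
    while counts[new_node] < target:
        donor = max((n for n in counts if n != new_node), key=lambda n: counts[n])
        partition = next(p for p, n in sorted(result.items()) if n == donor)
        result[partition] = new_node     # the entire partition moves
        counts[donor] -= 1
        counts[new_node] += 1
    return result


before = {p: f"node-{p % 3}" for p in range(12)}
after = add_node(before, "node-3")
moved = [p for p in before if before[p] != after[p]]
print(moved, Counter(after.values()))
```

The key-to-partition mapping never changes in this scheme; only the partition-to-node assignment does, so only the moved partitions need to be copied over the network.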

 

5. Service discovery (request routing)

1. Allow clients to contact any node; the node forwards the request to the owner if needed. (Proxy on the server side.)

2. Send all requests from clients to a routing tier first. (Standalone proxy.)

3. Require clients to be aware of the partitioning and the assignment of partitions to nodes. (Client SDK side.)
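All three options rely on the same lookup from key to partition to node; they differ in where that lookup runs. A rough in-memory sketch, with hypothetical `Node` and `Cluster` classes and a toy hash:

```python
NUM_PARTITIONS = 4


def partition_for(key: str) -> int:
    """Toy stable hash, for illustration only."""
    return sum(key.encode("utf-8")) % NUM_PARTITIONS


class Node:
    def __init__(self, name: str, cluster: "Cluster"):
        self.name = name
        self.cluster = cluster  # shared view of the partition map
        self.data: dict[str, str] = {}

    def handle_get(self, key: str):
        """Option 1: any node accepts the request and forwards it to the owner."""
        owner = self.cluster.owner_of(key)
        if owner is not self:
            return owner.handle_get(key)
        return self.data.get(key)


class Cluster:
    def __init__(self, node_names: list[str]):
        self.nodes = [Node(n, self) for n in node_names]
        self.partition_map = {
            p: self.nodes[p % len(self.nodes)] for p in range(NUM_PARTITIONS)
        }

    def owner_of(self, key: str) -> Node:
        return self.partition_map[partition_for(key)]


def routing_tier_get(cluster: Cluster, key: str):
    """Option 2: a separate routing tier does the lookup and forwards.
    Option 3 is the same lookup, performed inside the client library."""
    return cluster.owner_of(key).handle_get(key)


cluster = Cluster(["a", "b"])
cluster.owner_of("k1").data["k1"] = "v"
print([n.handle_get("k1") for n in cluster.nodes], routing_tier_get(cluster, "k1"))
```

Whichever component holds the partition map has to learn about changes when partitions are rebalanced; the sketch sidesteps that by sharing one in-memory map.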

 

 

 

 
