Redis Cluster tutorial

This document is a gentle introduction to Redis Cluster that does not rely on distributed systems concepts that are hard to understand. It provides instructions on how to set up a cluster, test it, and operate it, without going into the details covered in the Redis Cluster specification, describing only how the system behaves from the point of view of the user.

However, this tutorial also tries to provide information about the availability and consistency characteristics of Redis Cluster from the point of view of the final user, stated in a simple way.


Note that this tutorial requires Redis version 3.0 or higher.

If you plan to run a serious Redis Cluster deployment, the more formal specification is suggested reading, even if it is not strictly required. However, it is a good idea to start from this document, play with Redis Cluster for some time, and only later read the specification.

Redis Cluster 101

Redis Cluster provides a way to run a Redis installation where data is automatically sharded across multiple Redis nodes.

Redis Cluster also provides some degree of availability during partitions, that is, in practical terms, the ability to continue operating when some nodes fail or are not able to communicate. However, the cluster stops operating in the event of larger failures (for example when the majority of masters are unavailable).

So in practical terms, what do you get with Redis Cluster?

  • The ability to automatically split your dataset among multiple nodes.
  • The ability to continue operations when a subset of the nodes are experiencing failures or are unable to communicate with the rest of the cluster.

Redis Cluster TCP ports

Every Redis Cluster node requires two open TCP ports: the normal Redis TCP port used to serve clients, for example 6379, plus the port obtained by adding 10000 to the data port, so 16379 in the example.

This second, high port is used for the Cluster bus, a node-to-node communication channel that uses a binary protocol. Nodes use the Cluster bus for failure detection, configuration updates, failover authorization, and so forth. Clients should never try to communicate with the cluster bus port, but always with the normal Redis command port. However, make sure you open both ports in your firewall, otherwise Redis Cluster nodes will not be able to communicate.

The offset between the command port and the cluster bus port is fixed and is always 10000.


Note that for a Redis Cluster to work properly you need, for each node:

  1. The normal client communication port (usually 6379) to be open to all the clients that need to reach the cluster, plus all the other cluster nodes (which use the client port for key migrations).
  2. The cluster bus port (the client port + 10000) to be reachable from all the other cluster nodes.

If you don't open both TCP ports, your cluster will not work as expected.


The cluster bus uses a different, binary protocol for node-to-node data exchange, which is better suited to exchanging information between nodes using little bandwidth and processing time.
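As a small illustration of the port arithmetic described in this section, the following Ruby sketch lists the two ports that must be reachable for each node. The helper name required_ports and the choice of ports 7000 to 7005 are only assumptions made for the example; they are not part of Redis.

```ruby
# Fixed offset between the client port and the cluster bus port,
# as stated in this section.
CLUSTER_BUS_OFFSET = 10_000

# Hypothetical helper: returns both ports that must be reachable for a node.
def required_ports(client_port)
  { client: client_port, cluster_bus: client_port + CLUSTER_BUS_OFFSET }
end

# Example node ports, chosen only for illustration.
(7000..7005).each do |port|
  ports = required_ports(port)
  puts "node #{port}: open #{ports[:client]} (clients) and #{ports[:cluster_bus]} (cluster bus)"
end
```

A list like this can be handy when writing firewall rules, since both ports of every node have to be open.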


Redis Cluster data sharding

Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what we call a hash slot.

There are 16384 hash slots in Redis Cluster, and to compute the hash slot of a given key, we simply take the CRC16 of the key modulo 16384.
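The computation can be sketched in a few lines of Ruby. The text above only says "CRC16", so the exact variant is an assumption here: the sketch uses the CCITT/XMODEM variant (polynomial 0x1021, initial value 0). The function names crc16 and hash_slot are made up for the example, and hash tags (explained later) are ignored.

```ruby
# CRC16, assuming the CCITT/XMODEM variant (polynomial 0x1021, initial value 0).
def crc16(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      # Shift left by one bit; apply the polynomial when the top bit was set.
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF
    end
  end
  crc
end

HASH_SLOTS = 16_384

# Hash slot of a key: CRC16 of the key modulo 16384.
def hash_slot(key)
  crc16(key) % HASH_SLOTS
end

puts hash_slot("foo")
puts hash_slot("hello")
```

With this variant the script prints 12182 and 866, the same slots that redis-cli reports for foo and hello in the session shown later in this tutorial.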


Every node in a Redis Cluster is responsible for a subset of the hash slots, so for example you may have a cluster with 3 nodes, where:

  • Node A contains hash slots from 0 to 5500.
  • Node B contains hash slots from 5501 to 11000.
  • Node C contains hash slots from 11001 to 16383.
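To make the example layout concrete, here is a minimal Ruby sketch that maps a hash slot to the node serving it. The table SLOT_RANGES and the helper node_for_slot are hypothetical names used only for this illustration; a real cluster keeps this information in its own configuration.

```ruby
# Slot ranges from the example layout (slots are numbered 0 to 16383).
SLOT_RANGES = {
  "A" => (0..5500),
  "B" => (5501..11000),
  "C" => (11001..16383)
}

# Hypothetical helper: returns the name of the node serving a given slot.
def node_for_slot(slot)
  name, _range = SLOT_RANGES.find { |_name, range| range.cover?(slot) }
  name or raise ArgumentError, "slot #{slot} is outside 0..16383"
end

puts node_for_slot(866)    # A
puts node_for_slot(12182)  # C
```

Moving a hash slot from one node to another then amounts to changing which range it belongs to, which is why nodes can be added and removed without touching the hashing itself.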

This makes it easy to add and remove nodes in the cluster. For example, if I want to add a new node D, I need to move some hash slots from nodes A, B, C to D. Similarly, if I want to remove node A from the cluster I can just move the hash slots served by A to B and C. When node A is empty I can remove it from the cluster completely.

Because moving hash slots from one node to another does not require stopping operations, adding and removing nodes, or changing the percentage of hash slots held by nodes, does not require any downtime.


Redis Cluster supports multiple key operations as long as all the keys involved in a single command execution (or whole transaction, or Lua script execution) belong to the same hash slot. The user can force multiple keys to be part of the same hash slot by using a concept called hash tags.

Hash tags are documented in the Redis Cluster specification, but the gist is that if there is a substring between {} brackets in a key, only what is inside the brackets is hashed, so for example this{foo}key and another{foo}key are guaranteed to be in the same hash slot, and can be used together in a command with multiple keys as arguments.
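The hash tag rule can be sketched as follows. This is only an illustration under stated assumptions: the text between the first "{" and the first "}" after it is used, and only when it is not empty; otherwise the whole key is hashed. The helper name hashed_part is hypothetical.

```ruby
# Returns the part of the key that gets hashed.
# Assumed rule: use the text between the first "{" and the first "}" that
# follows it, but only when that text is not empty; otherwise use the whole key.
def hashed_part(key)
  left = key.index("{")
  return key if left.nil?
  right = key.index("}", left + 1)
  return key if right.nil? || right == left + 1
  key[(left + 1)...right]
end

puts hashed_part("this{foo}key")     # foo
puts hashed_part("another{foo}key")  # foo
puts hashed_part("plainkey")         # plainkey
```

Since both example keys reduce to foo before hashing, they end up in the same hash slot.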

Redis Cluster master-slave model

In order to remain available when a subset of master nodes are failing or are not able to communicate with the majority of nodes, Redis Cluster uses a master-slave model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional slave nodes).

In our example cluster with nodes A, B, C, if node B fails the cluster is not able to continue, since we no longer have a way to serve hash slots in the range 5501-11000.

However, if when the cluster is created (or at a later time) we add a slave node to every master, so that the final cluster is composed of A, B, C that are master nodes, and A1, B1, C1 that are slave nodes, the system is able to continue if node B fails.

Node B1 replicates B, so if B fails, the cluster will promote node B1 as the new master and will continue to operate correctly.

However, note that if nodes B and B1 fail at the same time Redis Cluster is not able to continue to operate.


Redis Cluster consistency guarantees

Redis Cluster is not able to guarantee strong consistency. In practical terms this means that under certain conditions it is possible that Redis Cluster will lose writes that were acknowledged by the system to the client.

The first reason why Redis Cluster can lose writes is that it uses asynchronous replication. This means that during writes the following happens:

  • Your client writes to the master B.
  • The master B replies OK to your client.
  • The master B propagates the write to its slaves B1, B2 and B3.

As you can see, B does not wait for an acknowledgment from B1, B2, B3 before replying to the client, since this would be a prohibitive latency penalty for Redis. So if your client writes something, and B acknowledges the write but crashes before being able to send the write to its slaves, one of the slaves (that did not receive the write) can be promoted to master, losing the write forever.

This is very similar to what happens with most databases that are configured to flush data to disk every second, so it is a scenario you are already able to reason about because of past experience with traditional database systems not involving distributed systems. Similarly, you can improve consistency by forcing the database to flush data to disk before replying to the client, but this usually results in prohibitively low performance. That would be the equivalent of synchronous replication in the case of Redis Cluster.

Basically, there is a trade-off to be made between performance and consistency.

Redis Cluster has support for synchronous writes when absolutely needed, implemented via the WAIT command. This makes losing writes a lot less likely. However, note that Redis Cluster does not implement strong consistency even when synchronous replication is used: it is always possible, under more complex failure scenarios, that a slave that was not able to receive the write is elected as master.


There is another notable scenario where Redis Cluster will lose writes, which happens during a network partition where a client is isolated with a minority of instances including at least a master.

Take as an example our 6-node cluster composed of A, B, C, A1, B1, C1, with 3 masters and 3 slaves. There is also a client, which we will call Z1.

After a partition occurs, it is possible that on one side of the partition we have A, C, A1, B1, C1, and on the other side we have B and Z1.

Z1 is still able to write to B, which will accept its writes. If the partition heals in a very short time, the cluster will continue normally. However, if the partition lasts long enough for B1 to be promoted to master on the majority side of the partition, the writes that Z1 is sending to B will be lost.

Note that there is a maximum window to the amount of writes Z1 will be able to send to B: if enough time has elapsed for the majority side of the partition to elect a slave as master, every master node in the minority side stops accepting writes.

This amount of time is a very important configuration directive of Redis Cluster, and is called the node timeout.

After node timeout has elapsed, a master node is considered to be failing, and can be replaced by one of its replicas. Similarly, after node timeout has elapsed without a master node being able to sense the majority of the other master nodes, it enters an error state and stops accepting writes.

Redis Cluster configuration parameters

We are about to create an example cluster deployment. Before continuing, let's introduce the configuration parameters that Redis Cluster introduces in the redis.conf file. Some will be obvious, others will become clearer as you continue reading.


  • cluster-enabled: If set to yes, enables Redis Cluster support in a specific Redis instance. Otherwise the instance starts as a standalone instance as usual.
  • cluster-config-file: Note that despite the name of this option, this is not a user-editable configuration file, but the file where a Redis Cluster node automatically persists the cluster configuration (the state, basically) every time there is a change, in order to be able to re-read it at startup. The file lists things like the other nodes in the cluster, their state, persistent variables, and so forth. Often this file is rewritten and flushed to disk as a result of some message reception.
  • cluster-node-timeout: The maximum amount of time a Redis Cluster node can be unavailable without being considered as failing. If a master node is not reachable for more than the specified amount of time, it will be failed over by its slaves. This parameter controls other important things in Redis Cluster. Notably, every node that can't reach the majority of master nodes for the specified amount of time will stop accepting queries.
  • cluster-slave-validity-factor: If set to zero, a slave will always try to fail over a master, regardless of the amount of time the link between the master and the slave remained disconnected. If the value is positive, a maximum disconnection time is calculated as the node timeout value multiplied by the factor provided with this option, and if the node is a slave, it will not try to start a failover if the master link was disconnected for more than the specified amount of time. For example, if the node timeout is set to 5 seconds and the validity factor is set to 10, a slave disconnected from the master for more than 50 seconds will not try to fail over its master. Note that any value different from zero may result in Redis Cluster being unavailable after a master failure if there is no slave able to fail it over. In that case the cluster will become available again only when the original master rejoins the cluster.
  • cluster-migration-barrier: Minimum number of slaves a master will remain connected with, for another slave to migrate to a master which is no longer covered by any slave. See the appropriate section about replica migration in this tutorial for more information.
  • cluster-require-full-coverage: If this is set to yes, as it is by default, the cluster stops accepting writes if some percentage of the key space is not covered by any node. If the option is set to no, the cluster will still serve queries even if only requests about a subset of keys can be processed.
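The arithmetic behind cluster-slave-validity-factor can be sketched as below. The sketch follows only the rule as described in this list (node timeout multiplied by the factor); the real server may take further details into account, and the helper names max_disconnection_ms and slave_may_failover? are hypothetical.

```ruby
# Maximum time (in milliseconds) a slave may have been disconnected from its
# master and still attempt a failover, following the rule described in the list.
# Returns nil when the factor is zero, meaning "no limit".
def max_disconnection_ms(node_timeout_ms, validity_factor)
  return nil if validity_factor.zero?
  node_timeout_ms * validity_factor
end

# True when a slave disconnected for disconnected_ms may still try a failover.
def slave_may_failover?(disconnected_ms, node_timeout_ms, validity_factor)
  limit = max_disconnection_ms(node_timeout_ms, validity_factor)
  limit.nil? || disconnected_ms <= limit
end

puts max_disconnection_ms(5000, 10)         # 50000
puts slave_may_failover?(60_000, 5000, 10)  # false
puts slave_may_failover?(60_000, 5000, 0)   # true
```

The first line reproduces the example from the list: a node timeout of 5 seconds and a factor of 10 give a limit of 50 seconds.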


Note: deploying a Redis Cluster manually is very important for learning certain operational aspects of it. However, if you want to get a cluster up and running ASAP, skip this section and the next one and go directly to Creating a Redis Cluster using the create-cluster script.

To create a cluster, the first thing we need is to have a few empty Redis instances running in cluster mode. This basically means that clusters are not created using normal Redis instances, but a special mode needs to be configured so that the Redis instance will enable the Cluster-specific features and commands.

The following is a minimal Redis Cluster configuration file:


port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes

As you can see, what enables cluster mode is simply the cluster-enabled directive. Every instance also contains the path of a file where the configuration for this node is stored, which by default is nodes.conf. This file is never touched by humans; it is simply generated at startup by the Redis Cluster instances, and updated every time it is needed.

Note that the minimal cluster that works as expected requires at least three master nodes. For your first tests it is strongly suggested to start a six-node cluster with three masters and three slaves.

To do so, enter a new directory, and create the following directories, named after the port number of the instance we'll run inside any given directory.


Something like:

mkdir cluster-test
cd cluster-test
mkdir 7000 7001 7002 7003 7004 7005

Create a redis.conf file inside each of the directories, from 7000 to 7005. As a template for your configuration file just use the small example above, but make sure to replace the port number 7000 with the right port number according to the directory name.
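If you prefer not to edit six files by hand, this step can be scripted. The following Ruby sketch is one possible way to do it, assuming it is run from inside the cluster-test directory; the helper name write_node_configs and the PORT placeholder are hypothetical, while the configuration lines are the ones from the minimal example above.

```ruby
require "fileutils"

# Minimal configuration from the example above; PORT is substituted per node.
TEMPLATE = <<~CONF
  port PORT
  cluster-enabled yes
  cluster-config-file nodes.conf
  cluster-node-timeout 5000
  appendonly yes
CONF

# Hypothetical helper: writes <base_dir>/<port>/redis.conf for every port
# and returns the list of paths written.
def write_node_configs(base_dir, ports)
  ports.map do |port|
    dir = File.join(base_dir, port.to_s)
    FileUtils.mkdir_p(dir)
    path = File.join(dir, "redis.conf")
    File.write(path, TEMPLATE.sub("PORT", port.to_s))
    path
  end
end

# When run as a script, generate the six configuration files in place.
if __FILE__ == $PROGRAM_NAME
  write_node_configs(".", 7000..7005).each { |path| puts "wrote #{path}" }
end
```

The script also creates the per-port directories if they do not exist yet, so it is safe to run before or after the mkdir step.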


Now copy your redis-server executable, compiled from the latest sources in the unstable branch at GitHub, into the cluster-test directory, and finally open 6 terminal tabs in your favorite terminal application.

Start every instance like this, one per tab:


cd 7000
../redis-server ./redis.conf

As you can see from the logs of every instance, since no nodes.conf file existed, every node assigns itself a new ID.


[82462] 26 Nov 11:56:55.329 * No cluster configuration found, I'm 97a3a64667477371c4479320d683e4c8db5858b1

This ID will be used forever by this specific instance in order for the instance to have a unique name in the context of the cluster. Every node remembers every other node using this ID, and not by IP or port. IP addresses and ports may change, but the unique node identifier will never change for the whole life of the node. We call this identifier simply the Node ID.

Now that we have a number of instances running, we need to create our cluster by writing some meaningful configuration to the nodes.

This is very easy to accomplish as we are helped by the Redis Cluster command line utility called redis-trib, a Ruby program that executes special commands on the instances in order to create new clusters, check or reshard an existing cluster, and so forth.

The redis-trib utility is in the src directory of the Redis source code distribution. To create your cluster simply type:


./redis-trib.rb create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 \
127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005

The command used here is create, since we want to create a new cluster. The option --replicas 1 means that we want a slave for every master created. The other arguments are the list of addresses of the instances I want to use to create the new cluster.

Obviously the only setup that meets our requirements is a cluster with 3 masters and 3 slaves.

Redis-trib will propose a configuration. Accept it by typing yes. The cluster will be configured and joined, which means the instances will be bootstrapped into talking with each other. Finally, if everything went well, you'll see a message like this:


[OK] All 16384 slots covered

This means that there is at least one master instance serving each of the 16384 slots available.

If you don't want to create a Redis Cluster by configuring and executing individual instances manually as explained above, there is a much simpler system (but you'll not learn the same amount of operational details).

Just check the utils/create-cluster directory in the Redis distribution. Inside there is a script called create-cluster (same name as the directory that contains it); it's a simple bash script. In order to start a 6-node cluster with 3 masters and 3 slaves just type the following commands:


  1. create-cluster start
  2. create-cluster create

Reply yes in step 2 when the redis-trib utility wants you to accept the cluster layout.

You can now interact with the cluster; the first node will start at port 30001 by default. When you are done, stop the cluster with:

  1. create-cluster stop

Please read the README inside this directory for more information on how to run the script.



At this stage one of the problems with Redis Cluster is the lack of client library implementations.

I'm aware of the following implementations:

  • redis-rb-cluster is a Ruby implementation written by me (@antirez) as a reference for other languages. It is a simple wrapper around the original redis-rb, implementing the minimal semantics to talk with the cluster efficiently.
  • redis-py-cluster is a port of redis-rb-cluster to Python. It supports the majority of redis-py functionality and is in active development.
  • The popular Predis has support for Redis Cluster; the support was recently updated and is in active development.
  • The most used Java client, Jedis, recently added support for Redis Cluster; see the Jedis Cluster section in the project README.
  • StackExchange.Redis offers support for C# (and should work fine with most .NET languages: VB, F#, etc.).
  • thunk-redis offers support for Node.js and io.js; it is a thunk/promise-based Redis client with pipelining and cluster support.
  • The redis-cli utility in the unstable branch of the Redis repository at GitHub implements very basic cluster support when started with the -c switch.

An easy way to test Redis Cluster is either to try any of the above clients or simply the redis-cli command line utility. The following is an example of interaction using the latter:


$ redis-cli -c -p 7000
redis> set foo bar
-> Redirected to slot [12182] located at
redis> set hello world
-> Redirected to slot [866] located at
redis> get foo
-> Redirected to slot [12182] located at
redis> get hello
-> Redirected to slot [866] located at

Note: if you created the cluster using the script, your nodes may listen on different ports, starting from 30001 by default.

The redis-cli cluster support is very basic, so it always relies on the fact that Redis Cluster nodes are able to redirect a client to the right node. A serious client is able to do better than that, and cache the map between hash slots and node addresses, to directly use the right connection to the right node. The map is refreshed only when something changes in the cluster configuration, for example after a failover or after the system administrator changes the cluster layout by adding or removing nodes.
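The caching idea can be sketched with a toy class. This is not how any particular client is implemented: the class name SlotMap and its methods are hypothetical, the addresses are made up, and the sketch assumes that a redirection reaches the client as an error message of the form "MOVED <slot> <host>:<port>".

```ruby
# Toy cache of the slot -> node address map kept by a hypothetical client.
class SlotMap
  def initialize
    @nodes = {} # slot number => "host:port"
  end

  # Address cached for a slot, or nil if the slot has not been seen yet.
  def lookup(slot)
    @nodes[slot]
  end

  # Updates the cache from a redirection error, assumed to look like
  # "MOVED <slot> <host>:<port>". Returns the new address, or nil if the
  # message is not a redirection.
  def learn_from_error(message)
    match = /\AMOVED (\d+) (\S+)\z/.match(message)
    return nil unless match
    @nodes[match[1].to_i] = match[2]
  end
end

map = SlotMap.new
map.learn_from_error("MOVED 12182 127.0.0.1:7002") # example address
puts map.lookup(12182).inspect
puts map.lookup(866).inspect
```

After the first redirection for a slot, later commands for keys in that slot can go straight to the cached address; slots not seen yet still return nil and need a first round trip.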



Before going forward and showing how to operate the Redis Cluster, doing things like a failover or a resharding, we need to create some example application, or at least be able to understand the semantics of a simple Redis Cluster client interaction.

In this way we can run an example and at the same time try to make nodes fail, or start a resharding, to see how Redis Cluster behaves under real-world conditions. It is not very helpful to see what happens while nobody is writing to the cluster.

This section explains some basic usage of redis-rb-cluster, showing two examples. The first is the following, and is the example.rb file inside the redis-rb-cluster distribution:


     1  require './cluster'
     3  startup_nodes = [
     4      {:host => "127.0.0.1", :port => 7000},
     5      {:host => "127.0.0.1", :port => 7001}
     6  ]
     7  rc = RedisCluster.new(startup_nodes,32,:timeout => 0.1)
     9  last = false
    11  while not last
    12      begin
    13          last = rc.get("__last__")
    14          last = 0 if !last
    15      rescue => e
    16          puts "error #{e.to_s}"
    17          sleep 1
    18      end
    19  end
    21  ((last.to_i+1)..1000000000).each{|x|
    22      begin
    23          rc.set("foo#{x}",x)
    24          puts rc.get("foo#{x}")
    25          rc.set("__last__",x)
    26      rescue => e
    27          puts "error #{e.to_s}"
    28      end
    29      sleep 0.1
    30  }

The application does a very simple thing: it sets keys in the form foo<number> to number, one after the other. So if you run the program the result is the following stream of commands:

  • SET foo0 0
  • SET foo1 1
  • SET foo2 2
  • And so forth...

The program looks more complex than it usually should, as it is designed to show errors on the screen instead of exiting with an exception, so every operation performed with the cluster is wrapped in begin rescue blocks.


Line 7 is the first interesting line in the program. It creates the Redis Cluster object, using as arguments a list of startup nodes, the maximum number of connections this object is allowed to take against different nodes, and finally the timeout after which a given operation is considered to have failed.

The startup nodes don't need to be all the nodes of the cluster. The important thing is that at least one node is reachable. Also note that redis-rb-cluster updates this list of startup nodes as soon as it is able to connect with the first node. You should expect such behavior from any other serious client.

Now that we have the Redis Cluster object instance stored in the rc variable, we are ready to use the object as if it were a normal Redis object instance.

This is exactly what happens in lines 11 to 19: when we restart the example we don't want to start again with foo0, so we store the counter inside Redis itself. The code above is designed to read this counter, or if the counter does not exist, to assign it the value of zero.

However, note how it is a while loop, as we want to try again and again even if the cluster is down and is returning errors. Normal applications don't need to be so careful.

Lines 21 to 30 start the main loop where the keys are set or an error is displayed.

Note the sleep call at the end of the loop. In your tests you can remove the sleep if you want to write to the cluster as fast as possible (relative to the fact that this is a busy loop without real parallelism of course, so you'll usually get 10k ops/second in the best of conditions).

Normally writes are slowed down in order for the example application to be easier to follow by humans.


Starting the application produces the following output:


ruby ./example.rb
^C (stop the application)

This is not a very interesting program and we'll use a better one in a moment, but we can already see what happens during a resharding when the program is running.



Now we are ready to try a cluster resharding. To do this please keep the example.rb program running, so that you can see if there is some impact on the running program. Also, you may want to comment out the sleep call in order to have some more serious write load during resharding.

Resharding basically means moving hash slots from a set of nodes to another set of nodes, and like cluster creation it is accomplished using the redis-trib utility.


To start a resharding just type:


./redis-trib.rb reshard 127.0.0.1:7000

You only need to specify a single node; redis-trib will find the other nodes automatically.

Currently redis-trib is only able to reshard with the administrator's support: you can't just say move 5% of slots from this node to the other one (but this is pretty trivial to implement). So it starts with questions. The first is how big a resharding you want to do:


How many slots do you want to move (from 1 to 16384)?

We can try to reshard 1000 hash slots, which should already contain a non-trivial number of keys if the example is still running without the sleep call.

Then redis-trib needs to know what the target of the resharding is, that is, the node that will receive the hash slots. I'll use the first master node, that is, 127.0.0.1:7000, but I need to specify the Node ID of the instance. This was already printed in a list by redis-trib, but I can always find the ID of a node with the following command if I need to:

$ redis-cli -p 7000 cluster nodes | grep myself
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5460

OK, so my target node is 97a3a64667477371c4479320d683e4c8db5858b1.
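If you need the Node ID inside a script rather than at the prompt, the same lookup can be done by parsing the command output. The following Ruby sketch assumes the line layout of the sample shown above (the Node ID is the first field and one of the later fields is a comma-separated list of flags); the helper name own_node_id is hypothetical.

```ruby
# Extracts the ID of the node flagged "myself" from `cluster nodes` output.
# Assumed layout, based on the sample line above: the first field is the
# Node ID and a later field is a comma-separated list of flags.
def own_node_id(cluster_nodes_output)
  cluster_nodes_output.each_line do |line|
    fields = line.split
    next if fields.empty?
    return fields.first if fields.any? { |field| field.split(",").include?("myself") }
  end
  nil
end

sample = "97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5460\n"
puts own_node_id(sample)
```

The function returns nil when no line carries the myself flag, so the caller can tell a missing node apart from a parsing success.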


Now you'll be asked from which nodes you want to take those keys. I'll just type all in order to take a few hash slots from all the other master nodes.

After the final confirmation you'll see a message for every slot that redis-trib is going to move from one node to another, and a dot will be printed for every actual key moved from one side to the other.

While the resharding is in progress you should be able to see your example program running unaffected. You can stop and restart it multiple times during the resharding if you want.

At the end of the resharding, you can test the health of the cluster with the following command:


./redis-trib.rb check 127.0.0.1:7000

All the slots will be covered as usual, but this time the master at 127.0.0.1:7000 will have more hash slots, something around 6461.



Reshardings can be performed automatically without the need to manually enter the parameters in an interactive way. This is possible using a command line like the following:


./redis-trib.rb reshard <host>:<port> --from <node-id> --to <node-id> --slots <number of slots> --yes

This allows you to build some automation if you are likely to reshard often. However, currently there is no way for redis-trib to automatically rebalance the cluster by checking the distribution of keys across the cluster nodes and intelligently moving slots as needed. This feature will be added in the future.



The example application we wrote earlier is not very good. It writes to the cluster in a simple way without even checking if what was written is the right thing.

From our point of view the cluster receiving the writes could just always write the key foo to 42 on every operation, and we would not notice at all.

So in the redis-rb-cluster repository, there is a more interesting application called consistency-test.rb. It uses a set of counters, by default 1000, and sends INCR commands in order to increment the counters.


However, instead of just writing, the application does two additional things:

  • When a counter is updated using INCR, the application remembers the write.
  • It also reads a random counter before every write, and checks if the value is what we expected it to be, comparing it with the value it has in memory.

What this means is that this application is a simple consistency checker, and is able to tell you if the cluster lost some write, or if it accepted a write that we did not receive acknowledgment for. In the first case we'll see a counter having a value that is smaller than the one we remember, while in the second case the value will be greater.
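The comparison at the heart of such a checker can be sketched as follows. This is not the code of consistency-test.rb, only an illustration of the two cases just described; the function name check_counter and the shape of its return value are assumptions made for the example.

```ruby
# Compares the value the client remembers for a counter with the value read
# back from the cluster, following the two cases described above.
# Returns a hash with the kind of inconsistency and how many writes differ,
# or nil when the two values agree.
def check_counter(remembered, read_back)
  if read_back < remembered
    { kind: :lost, count: remembered - read_back }
  elsif read_back > remembered
    { kind: :not_acknowledged, count: read_back - remembered }
  end
end

p check_counter(114, 0)  # writes lost by the cluster
p check_counter(10, 12)  # writes applied without an acknowledgment
p check_counter(7, 7)    # consistent
```

A real checker would run this comparison on a randomly chosen counter before every write and add the differences to running totals.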


Running the consistency-test application produces a line of output every second:


$ ruby consistency-test.rb
925 R (0 err) | 925 W (0 err) |
5030 R (0 err) | 5030 W (0 err) |
9261 R (0 err) | 9261 W (0 err) |
13517 R (0 err) | 13517 W (0 err) |
17780 R (0 err) | 17780 W (0 err) |
22025 R (0 err) | 22025 W (0 err) |
25818 R (0 err) | 25818 W (0 err) |

The line shows the number of Reads and Writes performed, and the number of errors (queries not accepted because of errors since the system was not available).

If some inconsistency is found, new lines are added to the output. This is what happens, for example, if I reset a counter manually while the program is running:


$ redis> set key_217 0
(in the other tab I see...)
94774 R (0 err) | 94774 W (0 err) |
98821 R (0 err) | 98821 W (0 err) |
102886 R (0 err) | 102886 W (0 err) | 114 lost |
107046 R (0 err) | 107046 W (0 err) | 114 lost |

When I set the counter to 0 the real value was 114, so the program reports 114 lost writes (INCR commands that are not remembered by the cluster).

This program is much more interesting as a test case, so we'll use it to test the Redis Cluster failover.



Note: during this test, you should keep a tab open with the consistency test application running.

In order to trigger the failover, the simplest thing we can do (which is also the semantically simplest failure that can occur in a distributed system) is to crash a single process, in our case a single master.

We can identify a master and crash it with the following command:


$ redis-cli -p 7000 cluster nodes | grep master
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 master - 0 1385482984082 0 connected 5960-10921
2938205e12de373867bf38f1ca29d31d0ddb3e46 master - 0 1385482983582 0 connected 11423-16383
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422

OK, so 7000, 7001, and 7002 are masters. Let's crash node 7002 with the DEBUG SEGFAULT command:

$ redis-cli -p 7002 debug segfault
Error: Server closed the connection

Now we can look at the output of the consistency test to see what it reported.

18849 R (0 err) | 18849 W (0 err) |
23151 R (0 err) | 23151 W (0 err) |
27302 R (0 err) | 27302 W (0 err) |
... many error warnings here ...
29659 R (578 err) | 29660 W (577 err) |
33749 R (578 err) | 33750 W (577 err) |
37918 R (578 err) | 37919 W (577 err) |
42077 R (578 err) | 42078 W (577 err) |

As you can see, during the failover the system was not able to accept 578 reads and 577 writes, however no inconsistency was created in the database. This may sound unexpected, as in the first part of this tutorial we stated that Redis Cluster can lose writes during the failover because it uses asynchronous replication. What we did not say is that this is not very likely to happen, because Redis sends the reply to the client and the commands to replicate to the slaves at about the same time, so there is a very small window in which data can be lost. However, the fact that it is hard to trigger does not mean that it is impossible, so this does not change the consistency guarantees provided by Redis Cluster.
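To make the window concrete, here is a toy model in Ruby. It is not Redis code: ToyMaster, its backlog, and simulate_crash_before_replication are all invented for this sketch, which only assumes what the paragraph above states, namely that the reply to the client is not delayed until the replica has the command.

```ruby
# Toy model: the master replies to the client first and ships the command
# to its replica slightly later.
class ToyMaster
  def initialize
    @data = Hash.new(0)
    @backlog = [] # commands acknowledged but not yet sent to the replica
  end

  def incr(key)
    @data[key] += 1
    @backlog << key
    @data[key] # this reply is the acknowledgment the client sees
  end

  def replicate_to(replica)
    @backlog.each { |key| replica[key] += 1 }
    @backlog.clear
  end
end

def simulate_crash_before_replication
  master = ToyMaster.new
  replica = Hash.new(0)
  remembered = 0
  3.times { remembered = master.incr("key_1") }
  master.replicate_to(replica)      # the replica is now in sync
  remembered = master.incr("key_1") # acknowledged to the client...
  # ...and the master crashes here, before the backlog reaches the replica.
  { remembered: remembered, on_replica: replica["key_1"] }
end

outcome = simulate_crash_before_replication
puts "#{outcome[:remembered] - outcome[:on_replica]} lost"
```

In a real cluster the backlog is flushed almost immediately, which is why the test above did not report any lost write.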


We can now check what the cluster setup is after the failover (note that in the meantime I restarted the crashed instance so that it rejoins the cluster as a slave):


$ redis-cli -p 7000 cluster nodes
3fc783611028b1707fd65345e763befb36454d73 slave 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 0 1385503418521 0 connected
a211e242fc6b22a9427fed61285e85892fa04e08 slave 97a3a64667477371c4479320d683e4c8db5858b1 0 1385503419023 0 connected
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422
3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e master - 0 1385503419023 3 connected 11423-16383
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 master - 0 1385503417005 0 connected 5960-10921
2938205e12de373867bf38f1ca29d31d0ddb3e46 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385503418016 3 connected

Now the masters are running on ports 7000, 7001 and 7005. What was previously a master, that is the Redis instance running on port 7002, is now a slave of 7005.


The output of the CLUSTER NODES command may look intimidating, but it is actually pretty simple: each line is composed of the following fields:

  • Node ID
  • ip:port
  • flags: master, slave, myself, fail, ...
  • if it is a slave, the Node ID of the master
  • Time of the last pending PING still waiting for a reply.
  • Time of the last PONG received.
  • Configuration epoch for this node (see the Cluster specification).
  • Status of the link to this node.
  • Slots served...
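The field list above maps naturally onto a small parser. What follows is a minimal sketch in Ruby, assuming every line carries all of the listed fields separated by single spaces. ClusterNode and parse_cluster_node are names made up for this example, and the sample line is the "myself" entry from the output shown earlier.

```ruby
ClusterNode = Struct.new(:id, :address, :flags, :master_id, :ping_sent,
                         :pong_received, :config_epoch, :link_state, :slots)

def parse_cluster_node(line)
  fields = line.split
  id, address, flags, master_id, ping_sent, pong_received, epoch, link_state = fields.first(8)
  ClusterNode.new(id, address, flags.split(","),
                  master_id == "-" ? nil : master_id, # "-" means "not a slave"
                  ping_sent.to_i, pong_received.to_i, epoch.to_i,
                  link_state, fields.drop(8))         # everything left is slot ranges
end

line = "97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422"
node = parse_cluster_node(line)
puts "#{node.id[0, 8]} flags=#{node.flags.join('+')} slots=#{node.slots.join(' ')}"
```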


Sometimes it is useful to force a failover without actually causing any problem on a master. For example, in order to upgrade the Redis process of one of the master nodes, it is a good idea to fail it over so that it turns into a slave with minimal impact on availability.


Manual failovers are supported by Redis Cluster using the CLUSTER FAILOVER command, which must be executed on one of the slaves of the master you want to fail over.

Manual failovers are special and are safer compared to failovers resulting from actual master failures, since they occur in a way that avoids data loss in the process, by switching clients from the original master to the new master only when the system is sure that the new master has processed all the replication stream from the old one.


This is what you see in the slave log when you perform a manual failover:


# Manual failover user request accepted.
# Received replication offset for paused master manual failover: 347540
# All master replication stream processed, manual failover can start.
# Start of election delayed for 0 milliseconds (rank #0, offset 347540).
# Starting a failover election for epoch 7545.
# Failover election won: I'm the new master.

Basically clients connected to the master we are failing over are stopped. At the same time the master sends its replication offset to the slave, which waits to reach the offset on its side. When the replication offset is reached, the failover starts, and the old master is informed about the configuration switch. When the clients are unblocked on the old master, they are redirected to the new master.



Adding a new node is basically the process of adding an empty node and then either moving some data into it, in case it is a new master, or telling it to set itself up as a replica of a known node, in case it is a slave.


We'll show both, starting with the addition of a new master instance.


In both cases the first step to perform is adding an empty node.


This is as simple as starting a new node on port 7006 (we already used 7000 to 7005 for our existing 6 nodes) with the same configuration used for the other nodes, except for the port number. To conform with the setup we used for the previous nodes, do the following:


  • Create a new tab in your terminal application.
  • Enter the cluster-test directory.
  • Create a directory named 7006.
  • Create a redis.conf file inside, similar to the one used for the other nodes but using 7006 as port number.
  • Finally start the server with ../redis-server ./redis.conf

At this point the server should be running.
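For reference, the redis.conf of the new node could be generated with a small Ruby helper like the one below. The directives are standard Redis Cluster settings, but their values are an assumption here: use whatever the other six nodes in your cluster-test directory actually use. node_config is a hypothetical helper name.

```ruby
# Build the text of a minimal cluster-enabled redis.conf for the given port.
# The values below are assumed; copy them from your existing nodes.
def node_config(port)
  <<~CONF
    port #{port}
    cluster-enabled yes
    cluster-config-file nodes.conf
    cluster-node-timeout 5000
    appendonly yes
  CONF
end

puts node_config(7006)
```

From inside the cluster-test directory, the result could be saved with File.write(File.join("7006", "redis.conf"), node_config(7006)) once the 7006 directory exists.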


Now we can use redis-trib as usual in order to add the node to the existing cluster.


./redis-trib.rb add-node

As you can see I used the add-node command specifying the address of the new node as the first argument, and the address of a random existing node in the cluster as the second argument.


In practical terms redis-trib here did very little to help us, it just sent a CLUSTER MEET message to the node, something that is also possible to accomplish manually. However redis-trib also checks the state of the cluster before operating, so it is a good idea to always perform cluster operations via redis-trib even when you know how the internals work.


Now we can connect to the new node to see if it really joined the cluster:


redis> cluster nodes
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 master - 0 1385543178575 0 connected 5960-10921
3fc783611028b1707fd65345e763befb36454d73 slave 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 0 1385543179583 0 connected
f093c80dde814da99c5cf72a7dd01590792b783b :0 myself,master - 0 0 0 connected
2938205e12de373867bf38f1ca29d31d0ddb3e46 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543178072 3 connected
a211e242fc6b22a9427fed61285e85892fa04e08 slave 97a3a64667477371c4479320d683e4c8db5858b1 0 1385543178575 0 connected
97a3a64667477371c4479320d683e4c8db5858b1 master - 0 1385543179080 0 connected 0-5959 10922-11422
3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e master - 0 1385543177568 3 connected 11423-16383

Note that since this node is already connected to the cluster it is already able to redirect client queries correctly and is, generally speaking, part of the cluster. However it has two peculiarities compared to the other masters:


  • It holds no data as it has no assigned hash slots.
  • Because it is a master without assigned slots, it does not participate in the election process when a slave wants to become a master.

Now it is possible to assign hash slots to this node using the resharding feature of redis-trib. There is no need to show this again since we already did it in a previous section: there is no difference, it is just a resharding with the empty node as the target.



Adding a new replica can be performed in two ways. The obvious one is to use redis-trib again, but with the --slave option, like this:


./redis-trib.rb add-node --slave

Note that the command line here is exactly like the one we used to add a new master, so we are not specifying to which master we want to add the replica. In this case redis-trib will add the new node as a replica of a random master among the masters with the fewest replicas.
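The selection rule just described can be written down in a few lines. This Ruby sketch only mirrors the behavior stated above and is not taken from the redis-trib source; pick_master_for_replica and the master names are hypothetical.

```ruby
# replica_counts maps a master (name or Node ID) to its number of replicas.
# Returns a random master among those that currently have the fewest replicas.
def pick_master_for_replica(replica_counts, rng: Random.new)
  fewest = replica_counts.values.min
  candidates = replica_counts.select { |_master, count| count == fewest }.keys
  candidates.sample(random: rng)
end

counts = { "master-7000" => 1, "master-7001" => 1, "master-7005" => 2 }
puts pick_master_for_replica(counts) # either master-7000 or master-7001
```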


However you can specify exactly which master you want to target with your new replica with the following command line:


./redis-trib.rb add-node --slave --master-id 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e

This way we assign the new replica to a specific master.


A more manual way to add a replica to a specific master is to add the new node as an empty master, and then turn it into a replica using the CLUSTER REPLICATE command. This also works if the node was added as a slave but you want to move it to be a replica of a different master.

For example, in order to add a replica for the node 127.0.0.1:7005, which is currently serving hash slots in the range 11423-16383 and has the Node ID 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e, all I need to do is connect to the new node (already added as an empty master) and send the command:

redis> cluster replicate 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e

That's it. Now we have a new replica for this set of hash slots, and all the other nodes in the cluster already know (after the few seconds needed to update their config). We can verify with the following command:


$ redis-cli -p 7000 cluster nodes | grep slave | grep 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e
f093c80dde814da99c5cf72a7dd01590792b783b slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543617702 3 connected
2938205e12de373867bf38f1ca29d31d0ddb3e46 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543617198 3 connected

The node 3c3a0c... now has two slaves, running on ports 7002 (the existing one) and 7006 (the new one).



To remove a slave node just use the del-node command of redis-trib:


./redis-trib del-node ``

The first argument is just a random node in the cluster, the second argument is the ID of the node you want to remove.


You can remove a master node in the same way as well, however in order to remove a master node it must be empty. If the master is not empty you need to reshard its data away to the other master nodes first.


An alternative way to remove a master node is to perform a manual failover to one of its slaves and remove the node after it has turned into a slave of the new master. Obviously this does not help when you want to reduce the actual number of masters in your cluster; in that case, a resharding is needed.



In Redis Cluster it is possible to reconfigure a slave to replicate from a different master at any time, just by sending it the CLUSTER REPLICATE command shown earlier with the Node ID of the new master.



However there is a special scenario where you want replicas to move from one master to another automatically, without the help of the system administrator. The automatic reconfiguration of replicas is called replicas migration and is able to improve the reliability of a Redis Cluster.

Note: you can read the details of replicas migration in the Redis Cluster Specification; here we'll only provide some information about the general idea and what you should do in order to benefit from it.

The reason why you may want to let your cluster replicas move from one master to another under certain conditions is that usually a Redis Cluster is only as resistant to failures as the number of replicas attached to a given master.


For example, a cluster where every master has a single replica can't continue operations if the master and its replica fail at the same time, simply because there is no other instance that has a copy of the hash slots the master was serving. However, while netsplits are likely to isolate a number of nodes at the same time, many other kinds of failures, like hardware or software failures local to a single node, are a very notable class of failures that are unlikely to happen at the same time. So it is possible that in your cluster where every master has a slave, the slave is killed at 4am, and the master is killed at 6am. This will still result in a cluster that can no longer operate.

To improve the reliability of the system we have the option to add additional replicas to every master, but this is expensive. Replica migration allows you to add more slaves to just a few masters. So you have 10 masters with 1 slave each, for a total of 20 instances. However you add, for example, 3 more instances as slaves of some of your masters, so certain masters will have more than a single slave.


With replicas migration, if a master is left without slaves, a replica from a master that has multiple slaves will migrate to the orphaned master. So after your slave goes down at 4am as in the example above, another slave will take its place, and when the master fails as well at 6am, there is still a slave that can be elected so that the cluster can continue to operate.


So what should you know about replicas migration, in short?


  • The cluster will try to migrate a replica from the master that has the greatest number of replicas at a given moment.
  • To benefit from replica migration you just have to add a few more replicas to a single master in your cluster, it does not matter which master.
  • There is a configuration parameter that controls the replica migration feature, called cluster-migration-barrier: you can read more about it in the example redis.conf file provided with Redis Cluster.
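The behavior summarized in this list can be illustrated with a toy model in Ruby. It is a simplification and not the algorithm from the Redis Cluster Specification: migrate_replica and the node names are invented, and the barrier argument is assumed to mean "the donor master must keep at least this many replicas", which is how cluster-migration-barrier is documented in the example redis.conf.

```ruby
# masters maps a master name to the list of its working replicas.
# Moves one replica from the master with the most replicas to an orphaned
# master, unless that would leave the donor with fewer than `barrier` replicas.
def migrate_replica(masters, barrier: 1)
  orphan_name, _replicas = masters.find { |_name, replicas| replicas.empty? }
  return masters if orphan_name.nil?

  _donor_name, donor_replicas = masters.max_by { |_name, replicas| replicas.size }
  return masters if donor_replicas.size - 1 < barrier

  masters[orphan_name] << donor_replicas.pop
  masters
end

# C lost its only replica at 4am; A has a spare one.
cluster = { "A" => ["A1", "A2"], "B" => ["B1"], "C" => [] }
migrate_replica(cluster).each { |name, replicas| puts "#{name}: #{replicas.join(', ')}" }
```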


Upgrading slave nodes is easy since you just need to stop the node and restart it with an updated version of Redis. If there are clients scaling reads using slave nodes, they should be able to reconnect to a different slave if a given one is not available.


Upgrading masters is a bit more complex, and the suggested procedure is:


  1. Use CLUSTER FAILOVER to trigger a manual failover of the master to one of its slaves (see the "Manual failover" section of this documentation).
  2. Wait for the master to turn into a slave.
  3. Finally upgrade the node as you do for slaves.
  4. If you want the master to be the node you just upgraded, trigger a new manual failover in order to turn the upgraded node back into a master.

Following this procedure you should upgrade one node after the other until all the nodes are upgraded.



Users willing to migrate to Redis Cluster may have just a single master, or may already be using a preexisting sharding setup, where keys are split among N nodes, using some in-house algorithm or a sharding algorithm implemented by their client library or Redis proxy.


In both cases it is possible to migrate to Redis Cluster easily, however the most important detail is whether multiple-key operations are used by the application, and how. There are three different cases:


  1. Multiple keys operations, or transactions, or Lua scripts involving multiple keys, are not used. Keys are accessed independently (even if accessed via transactions or Lua scripts grouping multiple commands, about the same key, together).
  2. Multiple keys operations, transactions, or Lua scripts involving multiple keys are used but only with keys having the same hash tag, which means that the keys used together all have a {...} sub-string that happens to be identical. For example the following multiple keys operation is defined in the context of the same hash tag: SUNION {user:1000}.foo {user:1000}.bar.
  3. Multiple keys operations, transactions, or Lua scripts involving multiple keys are used with key names not having an explicit, or the same, hash tag.

The third case is not handled by Redis Cluster: the application needs to be modified so that it does not use multi-key operations, or only uses them in the context of the same hash tag.
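To see why keys sharing a hash tag can be used together, it helps to compute their hash slots. The Ruby sketch below follows the rule given in the Redis Cluster Specification (CRC16 of the key, or of the hash tag when one is present, modulo 16384). It is written from scratch for this tutorial and the method names are made up; in a real deployment you would rely on your client library or on the CLUSTER KEYSLOT command instead.

```ruby
HASH_SLOTS = 16384

# CRC16, XMODEM variant: polynomial 0x1021, initial value 0.
def crc16(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF
    end
  end
  crc
end

# If the key contains a non-empty {...} section, only that part is hashed.
def hash_tag(key)
  open = key.index("{")
  return key if open.nil?

  close = key.index("}", open + 1)
  return key if close.nil? || close == open + 1

  key[(open + 1)...close]
end

def key_slot(key)
  crc16(hash_tag(key)) % HASH_SLOTS
end

["{user:1000}.foo", "{user:1000}.bar", "user:1000.foo"].each do |key|
  puts "#{key} -> slot #{key_slot(key)}"
end
```

The first two keys always land in the same slot because only user:1000 is hashed, while the third key is hashed in full and has no such guarantee.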


Cases 1 and 2 are covered, so we'll focus on those two cases, which are handled in the same way, so no distinction will be made in the documentation.


Assuming you have your preexisting data set split into N masters, where N=1 if you have no preexisting sharding, the following steps are needed in order to migrate your data set to Redis Cluster:


  1. Stop your clients. No automatic live-migration to Redis Cluster is currently possible. You may be able to orchestrate a live migration yourself in the context of your application / environment.
  2. Generate an append only file for all of your N masters using the BGREWRITEAOF command, and wait for the AOF file to be completely generated.
  3. Save your AOF files from aof-1 to aof-N somewhere. At this point you can stop your old instances if you wish (this is useful since in non-virtualized deployments you often need to reuse the same computers).
  4. Create a Redis Cluster composed of N masters and zero slaves. You'll add slaves later. Make sure all your nodes are using the append only file for persistence.
  5. Stop all the cluster nodes, substitute their append only file with your pre-existing append only files, aof-1 for the first node, aof-2 for the second node, up to aof-N.
  6. Restart your Redis Cluster nodes with the new AOF files. They'll complain that there are keys that should not be there according to their configuration.
  7. Use the redis-trib fix command in order to fix the cluster so that keys will be migrated according to the hash slots each node is authoritative for.
  8. Use redis-trib check at the end to make sure your cluster is ok.
  9. Restart your clients modified to use a Redis Cluster aware client library.

There is an alternative way to import data from external instances to a Redis Cluster, which is to use the redis-trib import command.

The command moves all the keys of a running instance (deleting the keys from the source instance) to the specified pre-existing Redis Cluster. However note that if you use a Redis 2.8 instance as the source instance the operation may be slow since 2.8 does not implement migrate connection caching, so you may want to restart your source instance with a Redis 3.x version before performing such an operation.

