Redis --- Redis Cluster

The following content is from: http://redis.io/topics/cluster-tutorial

Redis cluster tutorial

This document is a gentle introduction to Redis Cluster that does not rely on hard-to-understand distributed systems concepts. It provides instructions about how to set up a cluster, test it, and operate it, without going into the details that are covered in the Redis Cluster specification, just describing how the system behaves from the point of view of the user.

However, this tutorial also tries to provide information about the availability and consistency characteristics of Redis Cluster from the point of view of the final user, stated in a simple-to-understand way.

Note that this tutorial requires Redis version 3.0 or higher.

If you plan to run a serious Redis Cluster deployment, the more formal specification is suggested reading, even if not strictly required. However, it is a good idea to start from this document, play with Redis Cluster for some time, and only later read the specification.

Redis Cluster 101

Redis Cluster provides a way to run a Redis installation where data is automatically sharded across multiple Redis nodes.

Redis Cluster also provides some degree of availability during partitions, that is, in practical terms, the ability to continue operations when some nodes fail or are not able to communicate. However, the cluster stops operating in the event of larger failures (for example when the majority of masters are unavailable).

So in practical terms, what do you get with Redis Cluster?

  • The ability to automatically split your dataset among multiple nodes.
  • The ability to continue operations when a subset of the nodes are experiencing failures or are unable to communicate with the rest of the cluster.

Redis Cluster TCP ports

Every Redis Cluster node requires two open TCP ports: the normal Redis TCP port used to serve clients, for example 6379, plus the port obtained by adding 10000 to the data port, so 16379 in the example.

This second, high port is used for the Cluster bus, which is a node-to-node communication channel using a binary protocol. The Cluster bus is used by nodes for failure detection, configuration updates, failover authorization and so forth. Clients should never try to communicate with the cluster bus port, but always with the normal Redis command port. However, make sure you open both ports in your firewall, otherwise Redis Cluster nodes will not be able to communicate.

The offset between the command port and the cluster bus port is fixed and is always 10000.

Note that for a Redis Cluster to work properly you need, for each node:

  1. The normal client communication port (usually 6379), open to all the clients that need to reach the cluster, plus all the other cluster nodes (which use the client port for key migrations).
  2. The cluster bus port (the client port + 10000), reachable from all the other cluster nodes.

If you don't open both TCP ports, your cluster will not work as expected.

The cluster bus uses a different, binary protocol for node-to-node data exchange, which is better suited to exchanging information between nodes using little bandwidth and processing time.
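The port rule above is simple arithmetic, and a small sketch can make a firewall checklist easier to produce. The helper name `ports_for_node` is hypothetical and exists only for this illustration; the 10000 offset is the one stated in the text.

```ruby
# Offset between the client port and the cluster bus port, as stated above.
CLUSTER_BUS_OFFSET = 10_000

# Returns the two TCP ports a node needs, given its client (command) port.
# `ports_for_node` is a hypothetical helper name used only in this sketch.
def ports_for_node(client_port)
  { client: client_port, cluster_bus: client_port + CLUSTER_BUS_OFFSET }
end

# Print the ports to open for the six example nodes used later on.
(7000..7005).each do |port|
  ports = ports_for_node(port)
  puts "node #{port}: open #{ports[:client]} (clients and nodes) " \
       "and #{ports[:cluster_bus]} (nodes only)"
end
```

For the default port this gives 6379 and 16379, matching the example in the text.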

Redis Cluster data sharding

Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what we call a hash slot.

There are 16384 hash slots in Redis Cluster, and to compute the hash slot of a given key, we simply take the CRC16 of the key modulo 16384.
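The slot computation can be sketched in a few lines. The tutorial does not say which CRC16 variant is used; this sketch assumes the XMODEM variant (polynomial 0x1021, initial value 0) described in the Redis Cluster specification. The function names `crc16` and `key_hash_slot` are illustrative, and hash tags are ignored here.

```ruby
HASH_SLOTS = 16_384

# Bitwise CRC16, XMODEM variant (assumed: polynomial 0x1021, initial value 0).
def crc16(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= (byte << 8)
    8.times do
      crc = if (crc & 0x8000) != 0
              ((crc << 1) ^ 0x1021) & 0xFFFF
            else
              (crc << 1) & 0xFFFF
            end
    end
  end
  crc
end

# Hash slot of a key: CRC16 of the key modulo 16384 (hash tags not handled).
def key_hash_slot(key)
  crc16(key) % HASH_SLOTS
end

puts key_hash_slot("foo")    # => 12182
puts key_hash_slot("hello")  # => 866
```

The two results agree with the slots that redis-cli reports for the keys foo and hello in the interactive session shown later in this document.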

Every node in a Redis Cluster is responsible for a subset of the hash slots, so for example you may have a cluster with 3 nodes, where:

  • Node A contains hash slots from 0 to 5500.
  • Node B contains hash slots from 5501 to 11000.
  • Node C contains hash slots from 11001 to 16383.
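To make the mapping concrete, here is a minimal lookup over the three example ranges. Since there are 16384 slots, valid slot numbers run from 0 to 16383. The table and the `node_for_slot` helper are illustrative only, not part of any Redis client.

```ruby
# Slot ranges from the three-node example (illustrative values only).
SLOT_RANGES = {
  "A" => (0..5500),
  "B" => (5501..11000),
  "C" => (11001..16383)
}.freeze

# Returns the name of the node serving the given hash slot.
def node_for_slot(slot)
  name, _range = SLOT_RANGES.find { |_name, range| range.cover?(slot) }
  raise ArgumentError, "slot #{slot} is outside 0..16383" if name.nil?

  name
end

puts node_for_slot(866)    # => A
puts node_for_slot(12182)  # => C
```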

This makes it easy to add and remove nodes in the cluster. For example, if I want to add a new node D, I need to move some hash slots from nodes A, B, C to D. Similarly, if I want to remove node A from the cluster, I can just move the hash slots served by A to B and C. When node A is empty, I can remove it from the cluster completely.

Because moving hash slots from one node to another does not require stopping operations, adding and removing nodes, or changing the percentage of hash slots held by nodes, does not require any downtime.

Redis Cluster supports multiple-key operations as long as all the keys involved in a single command execution (or whole transaction, or Lua script execution) belong to the same hash slot. The user can force multiple keys to be part of the same hash slot by using a concept called hash tags.

Hash tags are documented in the Redis Cluster specification, but the gist is that if there is a substring between {} brackets in a key, only what is inside the brackets is hashed. For example, this{foo}key and another{foo}key are guaranteed to be in the same hash slot, and can be used together in a command with multiple keys as arguments.
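The hash tag rule can be sketched as a small function that returns the part of the key that gets hashed. The details used here (only the first `{` and the first `}` after it count, and an empty `{}` means the whole key is hashed) are assumptions taken from the Redis Cluster specification rather than from this tutorial, and `hashed_portion` is a hypothetical name.

```ruby
# Returns the substring of the key that is fed to the hash function.
# Assumed rule: use the text between the first "{" and the first "}" after
# it, unless that text is empty or a brace is missing.
def hashed_portion(key)
  open = key.index("{")
  return key if open.nil?

  close = key.index("}", open + 1)
  return key if close.nil? || close == open + 1

  key[(open + 1)...close]
end

puts hashed_portion("this{foo}key")     # => foo
puts hashed_portion("another{foo}key")  # => foo
puts hashed_portion("plainkey")         # => plainkey
```

Because both example keys reduce to the same string, they hash to the same slot whatever hash function is applied afterwards.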

Redis Cluster master-slave model

In order to remain available when a subset of master nodes are failing or are not able to communicate with the majority of nodes, Redis Cluster uses a master-slave model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional slave nodes).

In our example cluster with nodes A, B, C, if node B fails the cluster is not able to continue, since we no longer have a way to serve hash slots in the range 5501-11000.

However, if when the cluster is created (or at a later time) we add a slave node to every master, so that the final cluster is composed of A, B, C that are master nodes and A1, B1, C1 that are slave nodes, the system is able to continue if node B fails.

Node B1 replicates B, so if B fails, the cluster will promote node B1 as the new master and will continue to operate correctly.

However, note that if nodes B and B1 fail at the same time, Redis Cluster is not able to continue to operate.

Redis Cluster consistency guarantees

Redis Cluster is not able to guarantee strong consistency. In practical terms this means that under certain conditions it is possible that Redis Cluster will lose writes that were acknowledged by the system to the client.

The first reason why Redis Cluster can lose writes is that it uses asynchronous replication. This means that during writes the following happens:

  • Your client writes to the master B.
  • The master B replies OK to your client.
  • The master B propagates the write to its slaves B1, B2 and B3.

As you can see, B does not wait for an acknowledgement from B1, B2, B3 before replying to the client, since this would be a prohibitive latency penalty for Redis. So if your client writes something, and B acknowledges the write but crashes before being able to send the write to its slaves, one of the slaves (that did not receive the write) can be promoted to master, losing the write forever.
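The failure window described above can be reproduced with a toy in-memory model. This is not Redis code: `ToyMaster`, its backlog, and the plain hashes standing in for slaves are all hypothetical constructs that only mirror the three steps in the list (write, reply OK, propagate later).

```ruby
# Toy in-memory model of asynchronous replication; not real Redis code.
class ToyMaster
  attr_reader :data

  def initialize(replicas)
    @data = {}
    @replicas = replicas   # plain hashes standing in for slave nodes
    @backlog = []          # writes acknowledged but not yet propagated
  end

  # Steps 1 and 2: apply the write locally and reply OK right away.
  def write(key, value)
    @data[key] = value
    @backlog << [key, value]
    "OK"
  end

  # Step 3: send pending writes to every replica (happens some time later).
  def propagate
    @backlog.each { |key, value| @replicas.each { |r| r[key] = value } }
    @backlog.clear
  end
end

b1 = {}
b = ToyMaster.new([b1])
b.write("x", 1)
b.propagate
reply = b.write("y", 2)  # acknowledged to the client
# B crashes here, before propagate runs, and B1 is promoted to master.
new_master = b1

puts reply                  # => OK
puts new_master.key?("x")   # => true
puts new_master.key?("y")   # => false
```

The client received OK for the second write, yet the promoted slave never saw it, which is exactly the lost acknowledged write the text warns about.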

This is very similar to what happens with most databases that are configured to flush data to disk every second, so it is a scenario you are already able to reason about because of past experience with traditional database systems not involving distributed systems. Similarly, you can improve consistency by forcing the database to flush data to disk before replying to the client, but this usually results in prohibitively low performance. That would be the equivalent of synchronous replication in the case of Redis Cluster.

Basically, there is a trade-off to be made between performance and consistency.

Redis Cluster has support for synchronous writes when absolutely needed, implemented via the WAIT command. This makes losing writes a lot less likely. However, note that Redis Cluster does not implement strong consistency even when synchronous replication is used: it is always possible, under more complex failure scenarios, that a slave that was not able to receive the write is elected as master.

There is another notable scenario where Redis Cluster will lose writes: during a network partition in which a client is isolated with a minority of instances that includes at least one master.

Take as an example our six-node cluster composed of A, B, C, A1, B1, C1, with 3 masters and 3 slaves. There is also a client, which we will call Z1.

After a partition occurs, it is possible that on one side of the partition we have A, C, A1, B1, C1, and on the other side we have B and Z1.

Z1 is still able to write to B, which will accept its writes. If the partition heals in a very short time, the cluster will continue normally. However, if the partition lasts long enough for B1 to be promoted to master on the majority side of the partition, the writes that Z1 is sending to B will be lost.

Note that there is a maximum window to the amount of writes Z1 will be able to send to B: if enough time has elapsed for the majority side of the partition to elect a slave as master, every master node on the minority side stops accepting writes.

This amount of time is a very important configuration directive of Redis Cluster, and is called the node timeout.

After node timeout has elapsed, a master node is considered to be failing, and can be replaced by one of its replicas. Similarly, after node timeout has elapsed without a master node being able to sense the majority of the other master nodes, it enters an error state and stops accepting writes.

Redis Cluster configuration parameters

We are about to create an example cluster deployment. Before continuing, let's introduce the configuration parameters that Redis Cluster introduces in the redis.conf file. Some will be obvious, others will become clearer as you continue reading.

  • cluster-enabled <yes/no>: If yes, enables Redis Cluster support in a specific Redis instance. Otherwise the instance starts as a standalone instance as usual.
  • cluster-config-file <filename>: Note that despite the name of this option, this is not a user-editable configuration file, but the file where a Redis Cluster node automatically persists the cluster configuration (the state, basically) every time there is a change, in order to be able to re-read it at startup. The file lists things like the other nodes in the cluster, their state, persistent variables, and so forth. Often this file is rewritten and flushed to disk as a result of some message reception.
  • cluster-node-timeout <milliseconds>: The maximum amount of time a Redis Cluster node can be unavailable without being considered as failing. If a master node is not reachable for more than the specified amount of time, it will be failed over by its slaves. This parameter controls other important things in Redis Cluster. Notably, every node that can't reach the majority of master nodes for the specified amount of time will stop accepting queries.
  • cluster-slave-validity-factor <factor>: If set to zero, a slave will always try to fail over a master, regardless of the amount of time the link between the master and the slave remained disconnected. If the value is positive, a maximum disconnection time is calculated as the node timeout value multiplied by the factor provided with this option, and if the node is a slave, it will not try to start a failover if the master link was disconnected for more than the specified amount of time. For example, if the node timeout is set to 5 seconds and the validity factor is set to 10, a slave disconnected from the master for more than 50 seconds will not try to fail over its master. Note that any value different from zero may result in Redis Cluster being unavailable after a master failure if there is no slave able to fail it over. In that case the cluster will become available again only when the original master rejoins the cluster.
  • cluster-migration-barrier <count>: Minimum number of slaves a master will remain connected with, for another slave to migrate to a master which is no longer covered by any slave. See the appropriate section about replica migration in this tutorial for more information.
  • cluster-require-full-coverage <yes/no>: If this is set to yes, as it is by default, the cluster stops accepting writes if some percentage of the key space is not covered by any node. If the option is set to no, the cluster will still serve queries even if only requests about a subset of keys can be processed.
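The validity factor rule is a multiplication followed by a comparison, so a worked version of the 5 seconds times 10 example may help. This sketch follows only the description in the list above; the real implementation may include additional terms, and `slave_may_failover?` is a hypothetical name.

```ruby
# Decides whether a slave may try to fail over its master, following the
# description of cluster-slave-validity-factor (simplified sketch).
def slave_may_failover?(node_timeout_ms, validity_factor, disconnected_ms)
  # A factor of zero means the slave always tries, whatever the link age.
  return true if validity_factor.zero?

  max_disconnection_ms = node_timeout_ms * validity_factor
  disconnected_ms <= max_disconnection_ms
end

# Node timeout 5 seconds, factor 10: the limit is 50 seconds.
puts slave_may_failover?(5_000, 10, 30_000)  # => true
puts slave_may_failover?(5_000, 10, 60_000)  # => false
puts slave_may_failover?(5_000, 0, 60_000)   # => true
```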

Creating and using a Redis Cluster

Note: deploying a Redis Cluster manually is a very important way to learn certain operational aspects of it. However, if you want to get a cluster up and running ASAP, skip this section and the next one and go directly to Creating a Redis Cluster using the create-cluster script.

To create a cluster, the first thing we need is a few empty Redis instances running in cluster mode. This basically means that clusters are not created using normal Redis instances; a special mode needs to be configured so that the Redis instance enables the Cluster-specific features and commands.

The following is a minimal Redis Cluster configuration file:

port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes

As you can see, what enables cluster mode is simply the cluster-enabled directive. Every instance also contains the path of a file where the configuration for this node is stored, which by default is nodes.conf. This file is never touched by humans; it is simply generated at startup by the Redis Cluster instances, and updated every time it is needed.

Note that the minimal cluster that works as expected must contain at least three master nodes. For your first tests it is strongly suggested to start a six-node cluster with three masters and three slaves.

To do so, enter a new directory and create the following directories, each named after the port number of the instance we'll run inside it.

Something like:

mkdir cluster-test
cd cluster-test
mkdir 7000 7001 7002 7003 7004 7005

Create a redis.conf file inside each of the directories, from 7000 to 7005. As a template for your configuration file just use the small example above, but make sure to replace the port number 7000 with the right port number according to the directory name.
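Creating six nearly identical files by hand is error-prone, so the step above can be scripted. The following sketch writes the minimal configuration from this section into one directory per port. The helper names `minimal_cluster_conf` and `write_cluster_confs` are hypothetical, and the demo writes into a temporary directory so that it has no lasting side effects.

```ruby
require "fileutils"
require "tmpdir"

# Builds the minimal cluster configuration shown above for one port.
def minimal_cluster_conf(port)
  <<~CONF
    port #{port}
    cluster-enabled yes
    cluster-config-file nodes.conf
    cluster-node-timeout 5000
    appendonly yes
  CONF
end

# Writes <base_dir>/<port>/redis.conf for every port; returns the paths.
def write_cluster_confs(base_dir, ports)
  ports.map do |port|
    dir = File.join(base_dir, port.to_s)
    FileUtils.mkdir_p(dir)
    path = File.join(dir, "redis.conf")
    File.write(path, minimal_cluster_conf(port))
    path
  end
end

# Demo in a throwaway directory that is removed when the block ends.
Dir.mktmpdir do |tmp|
  paths = write_cluster_confs(tmp, 7000..7005)
  puts paths.length
  puts File.read(paths.first)
end
```

To produce the layout used in this tutorial, call `write_cluster_confs("cluster-test", 7000..7005)` from the directory that should contain cluster-test.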

Now copy your redis-server executable, compiled from the latest sources in the unstable branch at GitHub, into the cluster-test directory, and finally open 6 terminal tabs in your favorite terminal application.

Start every instance like this, one per tab:

cd 7000
../redis-server ./redis.conf

As you can see from the logs of every instance, since no nodes.conf file existed, every node assigns itself a new ID.

[82462] 26 Nov 11:56:55.329 * No cluster configuration found, I'm 97a3a64667477371c4479320d683e4c8db5858b1

This ID will be used forever by this specific instance in order for the instance to have a unique name in the context of the cluster. Every node remembers every other node using these IDs, and not by IP or port. IP addresses and ports may change, but the unique node identifier will never change for the whole life of the node. We call this identifier simply the Node ID.

Creating the cluster

Now that we have a number of instances running, we need to create our cluster by writing some meaningful configuration to the nodes.

This is very easy to accomplish, as we are helped by the Redis Cluster command line utility called redis-trib, a Ruby program that executes special commands on the instances in order to create new clusters, check or reshard an existing cluster, and so forth.

The redis-trib utility is in the src directory of the Redis source code distribution. To create your cluster simply type:

./redis-trib.rb create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 \
127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005

The command used here is create, since we want to create a new cluster. The option --replicas 1 means that we want a slave for every master created. The other arguments are the list of addresses of the instances I want to use to create the new cluster.

Obviously, the only setup that meets our requirements is a cluster with 3 masters and 3 slaves.
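The reasoning behind "3 masters and 3 slaves" is a short calculation: six addresses with one slave per master leaves three masters, which is also the minimum this tutorial asks for. The sketch below only illustrates that arithmetic; it is not redis-trib's code, `cluster_layout` is a hypothetical name, and treating leftover nodes as extra slaves is an assumption.

```ruby
# Minimum number of masters for a cluster that works as expected.
MIN_MASTERS = 3

# Splits a list of nodes into masters and slaves for a given --replicas value.
# Assumption: nodes left over after the division become extra slaves.
def cluster_layout(node_count, replicas_per_master)
  masters = node_count / (replicas_per_master + 1)
  if masters < MIN_MASTERS
    raise ArgumentError, "need at least #{MIN_MASTERS} masters, got #{masters}"
  end

  { masters: masters, slaves: node_count - masters }
end

p cluster_layout(6, 1)
```

With four nodes and one slave per master the division yields only two masters, so the sketch raises instead of returning a layout.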

Redis-trib will propose a configuration. Accept it by typing yes. The cluster will be configured and joined, which means the instances will be bootstrapped into talking with each other. Finally, if everything went well, you'll see a message like this:

[OK] All 16384 slots covered

This means that there is at least one master instance serving each of the 16384 available slots.

Creating a Redis Cluster using the create-cluster script

If you don't want to create a Redis Cluster by configuring and executing individual instances manually as explained above, there is a much simpler system (but you will not learn the same amount of operational details).

Just check the utils/create-cluster directory in the Redis distribution. Inside there is a script called create-cluster (same name as the directory that contains it); it is a simple bash script. In order to start a six-node cluster with 3 masters and 3 slaves, just type the following commands:

  1. create-cluster start
  2. create-cluster create

Reply yes in step 2 when the redis-trib utility asks you to accept the cluster layout.

You can now interact with the cluster; the first node will start at port 30001 by default. When you are done, stop the cluster with:

  1. create-cluster stop

Please read the README inside this directory for more information on how to run the script.

Playing with the cluster

At this stage one of the problems with Redis Cluster is the lack of client library implementations.

I'm aware of the following implementations:

  • redis-rb-cluster is a Ruby implementation written by me (@antirez) as a reference for other languages. It is a simple wrapper around the original redis-rb, implementing the minimal semantics to talk with the cluster efficiently.
  • redis-py-cluster is a port of redis-rb-cluster to Python. It supports the majority of redis-py functionality and is in active development.
  • The popular Predis has support for Redis Cluster; the support was recently updated and is in active development.
  • The most used Java client, Jedis, recently added support for Redis Cluster; see the Jedis Cluster section in the project README.
  • StackExchange.Redis offers support for C# (and should work fine with most .NET languages: VB, F#, etc).
  • thunk-redis offers support for Node.js and io.js; it is a thunk/promise-based Redis client with pipelining and cluster support.
  • The redis-cli utility in the unstable branch of the Redis repository at GitHub implements very basic cluster support when started with the -c switch.

An easy way to test Redis Cluster is either to try any of the above clients or simply the redis-cli command line utility. The following is an example of interaction using the latter:

$ redis-cli -c -p 7000
redis 127.0.0.1:7000> set foo bar
-> Redirected to slot [12182] located at 127.0.0.1:7002
OK
redis 127.0.0.1:7002> set hello world
-> Redirected to slot [866] located at 127.0.0.1:7000
OK
redis 127.0.0.1:7000> get foo
-> Redirected to slot [12182] located at 127.0.0.1:7002
"bar"
redis 127.0.0.1:7000> get hello
-> Redirected to slot [866] located at 127.0.0.1:7000
"world"

Note: if you created the cluster using the script, your nodes may listen on different ports, starting from 30001 by default.

The redis-cli cluster support is very basic, so it always relies on the fact that Redis Cluster nodes are able to redirect a client to the right node. A serious client is able to do better than that, and caches the map between hash slots and node addresses in order to use the right connection to the right node directly. The map is refreshed only when something changes in the cluster configuration, for example after a failover or after the system administrator changes the cluster layout by adding or removing nodes.
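The caching idea can be illustrated with a toy slot map that learns from redirections. The tutorial only shows redis-cli's human-readable "Redirected to slot" lines; the `MOVED <slot> <host>:<port>` error format parsed below is an assumption taken from the Redis Cluster specification, and `ToySlotCache` is a hypothetical class, not the way redis-rb-cluster or any other client is implemented.

```ruby
# Toy slot-to-node cache that is updated from redirection errors.
class ToySlotCache
  def initialize
    @slot_to_node = {}
  end

  # Node to contact for a slot; falls back to any known node when unknown.
  def node_for(slot, default_node)
    @slot_to_node.fetch(slot, default_node)
  end

  # Parses a redirection such as "MOVED 12182 127.0.0.1:7002" (assumed
  # format) and remembers the new owner. Returns [slot, node] or nil.
  def learn_from_redirect(message)
    match = /\AMOVED (\d+) (.+):(\d+)\z/.match(message)
    return nil if match.nil?

    slot = match[1].to_i
    node = "#{match[2]}:#{match[3]}"
    @slot_to_node[slot] = node
    [slot, node]
  end
end

cache = ToySlotCache.new
puts cache.node_for(12182, "127.0.0.1:7000")  # => 127.0.0.1:7000
cache.learn_from_redirect("MOVED 12182 127.0.0.1:7002")
puts cache.node_for(12182, "127.0.0.1:7000")  # => 127.0.0.1:7002
```

After the first redirection the client goes straight to the right node for that slot, so the extra round trip is paid only once per configuration change.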

Writing an example app with redis-rb-cluster

Before going forward and showing how to operate Redis Cluster, doing things like a failover or a resharding, we need to create an example application, or at least be able to understand the semantics of a simple Redis Cluster client interaction.

In this way we can run an example and at the same time try to make nodes fail, or start a resharding, to see how Redis Cluster behaves under real-world conditions. It is not very helpful to see what happens while nobody is writing to the cluster.

This section explains some basic usage of redis-rb-cluster by showing two examples. The first is the following, and is the example.rb file inside the redis-rb-cluster distribution:

     1  require './cluster'
     2
     3  startup_nodes = [
     4      {:host => "127.0.0.1", :port => 7000},
     5      {:host => "127.0.0.1", :port => 7001}
     6  ]
     7  rc = RedisCluster.new(startup_nodes,32,:timeout => 0.1)
     8
     9  last = false
    10
    11  while not last
    12      begin
    13          last = rc.get("__last__")
    14          last = 0 if !last
    15      rescue => e
    16          puts "error #{e.to_s}"
    17          sleep 1
    18      end
    19  end
    20
    21  ((last.to_i+1)..1000000000).each{|x|
    22      begin
    23          rc.set("foo#{x}",x)
    24          puts rc.get("foo#{x}")
    25          rc.set("__last__",x)
    26      rescue => e
    27          puts "error #{e.to_s}"
    28      end
    29      sleep 0.1
    30  }

The application does a very simple thing: it sets keys in the form foo<number> to number, one after the other. So if you run the program, the result is the following stream of commands:

  • SET foo0 0
  • SET foo1 1
  • SET foo2 2
  • And so forth...

The program looks more complex than it usually should, as it is designed to show errors on the screen instead of exiting with an exception, so every operation performed with the cluster is wrapped in begin rescue blocks.

Line 7 is the first interesting line in the program. It creates the Redis Cluster object, using as arguments a list of startup nodes, the maximum number of connections this object is allowed to open against different nodes, and finally the timeout after which a given operation is considered to have failed.

The startup nodes don't need to be all the nodes of the cluster. The important thing is that at least one node is reachable. Also note that redis-rb-cluster updates this list of startup nodes as soon as it is able to connect with the first node. You should expect such behavior from any other serious client.

Now that we have the Redis Cluster object instance stored in the rc variable, we are ready to use the object as if it were a normal Redis object instance.

This is exactly what happens in lines 11 to 19: when we restart the example we don't want to start again with foo0, so we store the counter inside Redis itself. The code above is designed to read this counter, or, if the counter does not exist, to assign it the value of zero.

However, note that it is a while loop, as we want to try again and again even if the cluster is down and is returning errors. Normal applications don't need to be so careful.

Lines 21 to 30 start the main loop, where the keys are set or an error is displayed.

Note the sleep call at the end of the loop. In your tests you can remove the sleep if you want to write to the cluster as fast as possible (relative to the fact that this is a busy loop without real parallelism, of course, so you'll usually get about 10k ops/second in the best of conditions).

Normally writes are slowed down so that the example application is easier for humans to follow.

Starting the application produces the following output:

ruby ./example.rb
1
2
3
4
5
6
7
8
9
^C (I stopped the program here)

This is not a very interesting program and we'll use a better one in a moment, but we can already see what happens during a resharding while the program is running.

Resharding the cluster

Now we are ready to try a cluster resharding. To do this, please keep the example.rb program running, so that you can see whether there is any impact on the running program. You may also want to comment out the sleep call in order to have a more serious write load during resharding.

Resharding basically means moving hash slots from one set of nodes to another set of nodes, and like cluster creation it is accomplished using the redis-trib utility.

To start a resharding just type:

./redis-trib.rb reshard 127.0.0.1:7000

You only need to specify a single node; redis-trib will find the other nodes automatically.

Currently redis-trib is only able to reshard with the administrator's support; you can't just say move 5% of slots from this node to the other one (but this is pretty trivial to implement). So it starts with questions. The first is how big a resharding you want to do:

How many slots do you want to move (from 1 to 16384)?

We can try to reshard 1000 hash slots, which should already contain a non-trivial number of keys if the example is still running without the sleep call.

Then redis-trib needs to know the target of the resharding, that is, the node that will receive the hash slots. I'll use the first master node, that is, 127.0.0.1:7000, but I need to specify the Node ID of the instance. This was already printed in a list by redis-trib, but I can always find the ID of a node with the following command if I need to:

$ redis-cli -p 7000 cluster nodes | grep myself
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5460

OK, so my target node is 97a3a64667477371c4479320d683e4c8db5858b1.

Now you'll be asked which nodes you want to take those hash slots from. I'll just type all in order to take a few hash slots from all the other master nodes.

After the final confirmation you'll see a message for every slot that redis-trib is going to move from one node to another, and a dot will be printed for every actual key moved from one side to the other.

While the resharding is in progress you should be able to see your example program running unaffected. You can stop and restart it multiple times during the resharding if you want.

At the end of the resharding, you can test the health of the cluster with the following command:

./redis-trib.rb check 127.0.0.1:7000

All the slots will be covered as usual, but this time the master at 127.0.0.1:7000 will have more hash slots, something around 6461.
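The figure of 6461 can be checked with a little arithmetic. The sketch below assumes that the requested slots are split among the source masters in proportion to the slots each one holds, with the largest source rounding up and the others rounding down. The method name reshard_plan and the rounding rule are my own assumptions for illustration, not a description of redis-trib's code. The starting slot counts (5461 for 7000 and 7001, 5462 for 7002) are inferred from the slot ranges printed in this walkthrough.

```ruby
# Split `numslots` among source nodes in proportion to the slots they hold.
# `sources` maps a node address to its current slot count.
def reshard_plan(sources, numslots)
  total = sources.values.sum
  # Largest source first; it absorbs the rounding remainder by rounding up.
  sorted = sources.sort_by { |_name, count| -count }
  pairs = sorted.each_with_index.map do |(name, count), index|
    share = numslots.to_f / total * count
    [name, index.zero? ? share.ceil : share.floor]
  end
  pairs.to_h
end

# Assumed slot counts of the two source masters before the resharding.
sources = { "127.0.0.1:7001" => 5461, "127.0.0.1:7002" => 5462 }
plan = reshard_plan(sources, 1000)
puts plan.inspect
# Slots held by 127.0.0.1:7000 afterwards: its own 5461 plus what it received.
puts 5461 + plan.values.sum
```

With these assumed inputs the target ends up with 5461 + 1000 = 6461 slots, which matches the number reported above.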

Scripting a resharding operation

Reshardings can be performed automatically, without the need to manually enter the parameters in an interactive way. This is possible using a command line like the following:

./redis-trib.rb reshard <host>:<port> --from <node-id> --to <node-id> --slots <number of slots> --yes

This allows you to build some automation if you are likely to reshard often. However, there is currently no way for redis-trib to automatically rebalance the cluster by checking the distribution of keys across the cluster nodes and intelligently moving slots as needed. This feature will be added in the future.

A more interesting example application

The example application we wrote earlier is not very good. It writes to the cluster in a simple way, without even checking whether what was written is the right thing.

From our point of view, the cluster receiving the writes could just always write the key foo to 42 on every operation, and we would not notice at all.

So in the redis-rb-cluster repository there is a more interesting application called consistency-test.rb. It uses a set of counters, by default 1000, and sends INCR commands in order to increment the counters.

However, instead of just writing, the application does two additional things:

  • When a counter is updated using INCR, the application remembers the write.
  • It also reads a random counter before every write, and checks whether the value is what we expected it to be, comparing it with the value it has in memory.

What this means is that this application is a simple consistency checker, and is able to tell you if the cluster lost some write, or if it accepted a write that we did not receive an acknowledgment for. In the first case we'll see a counter with a value that is smaller than the one we remember, while in the second case the value will be greater.
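The two checks can be sketched in a few lines of Ruby. This is not the code of consistency-test.rb: the class name ConsistencyChecker, its methods, and the key name key_1 are hypothetical, and a plain Hash stands in for the cluster so that the sketch runs without a Redis server.

```ruby
# In-memory model of the checks described above. `cluster` is a plain Hash
# standing in for Redis Cluster; no real connection is made.
class ConsistencyChecker
  attr_reader :lost_writes, :not_ack_writes

  def initialize(cluster)
    @cluster = cluster
    @cached = {}            # what we believe each counter should hold
    @lost_writes = 0
    @not_ack_writes = 0
  end

  # Stand-in for INCR: bump the counter and remember the acknowledged value.
  def incr(key)
    @cluster[key] = @cluster.fetch(key, 0) + 1
    @cached[key] = @cluster[key]
  end

  # Stand-in for the read before each write: compare stored and remembered values.
  def check(key)
    expected = @cached[key]
    return if expected.nil? # nothing remembered yet for this key
    actual = @cluster.fetch(key, 0)
    if actual < expected
      @lost_writes += expected - actual      # the cluster forgot some writes
    elsif actual > expected
      @not_ack_writes += actual - expected   # writes applied but never acknowledged
    end
    @cached[key] = actual   # resynchronize so the same gap is not counted twice
  end
end

cluster = {}
checker = ConsistencyChecker.new(cluster)
10.times { checker.incr("key_1") }
cluster["key_1"] = 0        # simulate somebody resetting the counter behind our back
checker.check("key_1")
puts "#{checker.lost_writes} lost"
```

A smaller value than the remembered one is counted as lost writes, a larger one as unacknowledged writes, which is the distinction drawn in the paragraph above.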

Running the consistency-test application produces a line of output every second:

$ ruby consistency-test.rb
925 R (0 err) | 925 W (0 err) |
5030 R (0 err) | 5030 W (0 err) |
9261 R (0 err) | 9261 W (0 err) |
13517 R (0 err) | 13517 W (0 err) |
17780 R (0 err) | 17780 W (0 err) |
22025 R (0 err) | 22025 W (0 err) |
25818 R (0 err) | 25818 W (0 err) |

Each line shows the number of Reads and Writes performed, and the number of errors (queries not accepted because of errors, since the system was not available).

If some inconsistency is found, new lines are added to the output. This is what happens, for example, if I reset a counter manually while the program is running:

$ redis 127.0.0.1:7000> set key_217 0
OK
 
(in the other tab I see...)
 
94774 R (0 err) | 94774 W (0 err) |
98821 R (0 err) | 98821 W (0 err) |
102886 R (0 err) | 102886 W (0 err) | 114 lost |
107046 R (0 err) | 107046 W (0 err) | 114 lost |

When I set the counter to 0 the real value was 114, so the program reports 114 lost writes (INCR commands that are not remembered by the cluster).

This program is much more interesting as a test case, so we'll use it to test the Redis Cluster failover.

Testing the failover

Note: during this test, you should keep a tab open with the consistency test application running.

In order to trigger the failover, the simplest thing we can do (which is also the semantically simplest failure that can occur in a distributed system) is to crash a single process, in our case a single master.

We can identify a master and crash it with the following command:

$ redis-cli -p 7000 cluster nodes | grep master
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385482984082 0 connected 5960-10921
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 master - 0 1385482983582 0 connected 11423-16383
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422

OK, so 7000, 7001, and 7002 are masters. Let's crash node 7002 with the DEBUG SEGFAULT command:

$ redis-cli -p 7002 debug segfault
Error: Server closed the connection

Now we can look at the output of the consistency test to see what it reported.

18849 R (0 err) | 18849 W (0 err) |
23151 R (0 err) | 23151 W (0 err) |
27302 R (0 err) | 27302 W (0 err) |
 
... many error warnings here ...
 
29659 R (578 err) | 29660 W (577 err) |
33749 R (578 err) | 33750 W (577 err) |
37918 R (578 err) | 37919 W (577 err) |
42077 R (578 err) | 42078 W (577 err) |

As you can see, during the failover the system was not able to accept 578 reads and 577 writes; however, no inconsistency was created in the database. This may sound unexpected, as in the first part of this tutorial we stated that Redis Cluster can lose writes during the failover because it uses asynchronous replication. What we did not say is that this is not very likely to happen, because Redis sends the reply to the client and the commands to replicate to the slaves at about the same time, so there is only a very small window in which data can be lost. However, the fact that it is hard to trigger does not mean that it is impossible, so this does not change the consistency guarantees provided by Redis Cluster.

We can now check what the cluster setup is after the failover (note that in the meantime I restarted the crashed instance so that it rejoins the cluster as a slave):

$ redis-cli -p 7000 cluster nodes
3fc783611028b1707fd65345e763befb36454d73 127.0.0.1:7004 slave 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 0 1385503418521 0 connected
a211e242fc6b22a9427fed61285e85892fa04e08 127.0.0.1:7003 slave 97a3a64667477371c4479320d683e4c8db5858b1 0 1385503419023 0 connected
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422
3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 127.0.0.1:7005 master - 0 1385503419023 3 connected 11423-16383
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385503417005 0 connected 5960-10921
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385503418016 3 connected

Now the masters are running on ports 7000, 7001 and 7005. What was previously a master, that is, the Redis instance running on port 7002, is now a slave of 7005.

The output of the CLUSTER NODES command may look intimidating, but it is actually pretty simple, and is composed of the following tokens:

  • Node ID
  • ip:port
  • flags: master, slave, myself, fail, ...
  • if it is a slave, the Node ID of the master
  • Time of the last pending PING still waiting for a reply.
  • Time of the last PONG received.
  • Configuration epoch for this node (see the Cluster specification).
  • Status of the link to this node.
  • Slots served...
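As an illustration of this layout, here is a small Ruby sketch that splits one line of CLUSTER NODES output into the tokens listed above. The struct ClusterNode, its field names, and the method parse_cluster_node are my own labels, and the sketch assumes the whitespace-separated format shown in this tutorial's output.

```ruby
# Field names are illustrative labels for the tokens listed above.
ClusterNode = Struct.new(:id, :address, :flags, :master_id,
                         :ping_sent, :pong_received, :config_epoch,
                         :link_state, :slots)

# Parse a single line of CLUSTER NODES output (format as printed in this tutorial).
def parse_cluster_node(line)
  id, address, flags, master_id, ping, pong, epoch, link, *slots = line.split
  ClusterNode.new(id, address, flags.split(","),
                  master_id == "-" ? nil : master_id, # "-" means "not a slave"
                  ping.to_i, pong.to_i, epoch.to_i, link, slots)
end

line = "97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422"
node = parse_cluster_node(line)
puts node.id
puts node.flags.inspect
puts node.slots.inspect
```

Everything after the link status is collected into the slots array, so a slave line, which serves no slots, simply yields an empty array.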

Manual failover

Sometimes it is useful to force a failover without actually causing any problem on a master. For example, in order to upgrade the Redis process of one of the master nodes, it is a good idea to fail it over in order to turn it into a slave with minimal impact on availability.

Manual failovers are supported by Redis Cluster using the CLUSTER FAILOVER command, which must be executed on one of the slaves of the master you want to fail over.

Manual failovers are special and are safer compared to failovers resulting from actual master failures, since they occur in a way that avoids data loss in the process: clients are switched from the original master to the new master only when the system is sure that the new master has processed all the replication stream from the old one.

This is what you see in the slave log when you perform a manual failover:

# Manual failover user request accepted.
# Received replication offset for paused master manual failover: 347540
# All master replication stream processed, manual failover can start.
# Start of election delayed for 0 milliseconds (rank #0, offset 347540).
# Starting a failover election for epoch 7545.
# Failover election won: I'm the new master.

Basically, clients connected to the master we are failing over are stopped. At the same time the master sends its replication offset to the slave, which waits to reach the offset on its side. When the replication offset is reached, the failover starts, and the old master is informed about the configuration switch. When the clients are unblocked on the old master, they are redirected to the new master.

Adding a new node

Adding a new node is basically the process of adding an empty node and then either moving some data into it, in case it is a new master, or telling it to set up as a replica of a known node, in case it is a slave.

We'll show both, starting with the addition of a new master instance.

In both cases the first step to perform is adding an empty node.

This is as simple as starting a new node on port 7006 (we already used 7000 to 7005 for our existing 6 nodes) with the same configuration used for the other nodes, except for the port number. To conform with the setup we used for the previous nodes, you should do the following:

  • Create a new tab in your terminal application.
  • Enter the cluster-test directory.
  • Create a directory named 7006.
  • Create a redis.conf file inside, similar to the one used for the other nodes but using 7006 as port number.
  • Finally start the server with ../redis-server ./redis.conf
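If you create nodes often, the redis.conf step can be generated instead of typed. The following Ruby sketch assumes that the other nodes use a minimal cluster configuration made of the five directives below. The directives are real Redis settings, but the values and the helper name node_config are assumptions, so copy the actual values from one of your existing nodes.

```ruby
# Build the text of a minimal redis.conf for one cluster node.
# The directive values are assumptions; mirror your existing nodes.
def node_config(port)
  <<~CONF
    port #{port}
    cluster-enabled yes
    cluster-config-file nodes.conf
    cluster-node-timeout 5000
    appendonly yes
  CONF
end

puts node_config(7006)
```

To actually create the file you could call Dir.mkdir("7006") and then File.write("7006/redis.conf", node_config(7006)) from inside the cluster-test directory.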

At this point the server should be running.

Now we can use redis-trib as usual in order to add the node to the existing cluster.

./redis-trib.rb add-node 127.0.0.1:7006 127.0.0.1:7000

As you can see, I used the add-node command, specifying the address of the new node as the first argument and the address of a random existing node in the cluster as the second argument.

In practical terms redis-trib did very little to help us here: it just sent a CLUSTER MEET message to the node, something that is also possible to accomplish manually. However, redis-trib also checks the state of the cluster before operating, so it is a good idea to always perform cluster operations via redis-trib, even when you know how the internals work.

Now we can connect to the new node to see if it really joined the cluster:

redis 127.0.0.1:7006> cluster nodes
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385543178575 0 connected 5960-10921
3fc783611028b1707fd65345e763befb36454d73 127.0.0.1:7004 slave 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 0 1385543179583 0 connected
f093c80dde814da99c5cf72a7dd01590792b783b :0 myself,master - 0 0 0 connected
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543178072 3 connected
a211e242fc6b22a9427fed61285e85892fa04e08 127.0.0.1:7003 slave 97a3a64667477371c4479320d683e4c8db5858b1 0 1385543178575 0 connected
97a3a64667477371c4479320d683e4c8db5858b1 127.0.0.1:7000 master - 0 1385543179080 0 connected 0-5959 10922-11422
3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 127.0.0.1:7005 master - 0 1385543177568 3 connected 11423-16383

Note that since this node is already connected to the cluster, it is already able to redirect client queries correctly and is, generally speaking, part of the cluster. However, it has two peculiarities compared to the other masters:

  • It holds no data as it has no assigned hash slots.
  • Because it is a master without assigned slots, it does not participate in the election process when a slave wants to become a master.

Now it is possible to assign hash slots to this node using the resharding feature of redis-trib. There is little point in showing this again, as we already did it in a previous section: there is no difference, it is just a resharding with the empty node as its target.

Adding a new node as a replica

Adding a new replica can be performed in two ways. The obvious one is to use redis-trib again, but with the --slave option, like this:

./redis-trib.rb add-node --slave 127.0.0.1:7006 127.0.0.1:7000

Note that the command line here is exactly like the one we used to add a new master, so we are not specifying which master we want to add the replica to. In this case, redis-trib will add the new node as a replica of a random master among the masters with fewer replicas.

However, you can specify exactly which master you want to target with your new replica, using the following command line:

./redis-trib.rb add-node --slave --master-id 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 127.0.0.1:7006 127.0.0.1:7000

This way we assign the new replica to a specific master.

A more manual way to add a replica to a specific master is to add the new node as an empty master, and then turn it into a replica using the CLUSTER REPLICATE command. This also works if the node was added as a slave but you want to move it to be a replica of a different master.

For example, in order to add a replica for the node 127.0.0.1:7005, which is currently serving hash slots in the range 11423-16383 and has the Node ID 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e, all I need to do is connect to the new node (already added as an empty master) and send the command:

redis 127.0.0.1:7006> cluster replicate 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e

That's it. Now we have a new replica for this set of hash slots, and all the other nodes in the cluster already know about it (after the few seconds needed to update their config). We can verify with the following command:

$ redis-cli -p 7000 cluster nodes | grep slave | grep 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e
f093c80dde814da99c5cf72a7dd01590792b783b 127.0.0.1:7006 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543617702 3 connected
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543617198 3 connected

The node 3c3a0c... now has two slaves, running on ports 7002 (the existing one) and 7006 (the new one).
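The same check can be done from a script rather than with grep. The Ruby sketch below works on the text returned by CLUSTER NODES, which you would obtain from redis-cli or a client library; here it is pasted in as a string so that the example runs on its own. The helper name replicas_of is hypothetical.

```ruby
# Return the addresses of the nodes flagged as slaves of `master_id`
# in the text returned by CLUSTER NODES.
def replicas_of(cluster_nodes_output, master_id)
  rows = cluster_nodes_output.each_line.map(&:split)
  slaves = rows.select do |fields|
    # fields[2] holds the comma-separated flags, fields[3] the master's Node ID.
    fields[2].to_s.split(",").include?("slave") && fields[3] == master_id
  end
  slaves.map { |fields| fields[1] }
end

output = <<~NODES
  f093c80dde814da99c5cf72a7dd01590792b783b 127.0.0.1:7006 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543617702 3 connected
  2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543617198 3 connected
  3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 127.0.0.1:7005 master - 0 1385543177568 3 connected 11423-16383
NODES

puts replicas_of(output, "3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e").inspect
```

Matching on the master ID field, rather than grepping the whole line, avoids also selecting the master's own line, which contains the same ID in its first column.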

Removing a node

To remove a slave node just use the del-node command of redis-trib:

./redis-trib.rb del-node 127.0.0.1:7000 <node-id>

The first argument is just a random node in the cluster; the second argument is the ID of the node you want to remove.

You can remove a master node in the same way as well; however, in order to remove a master node it must be empty. If the master is not empty, you need to reshard its data away to the other master nodes first.

An alternative way to remove a master node is to perform a manual failover of it over to one of its slaves, and remove the node after it has turned into a slave of the new master. Obviously this does not help when you want to reduce the actual number of masters in your cluster; in that case, a resharding is needed.

Replicas migration

In Redis Cluster it is possible to reconfigure a slave to replicate from a different master at any time, just using the following command:

CLUSTER REPLICATE <master-node-id>

However, there is a special scenario where you want replicas to move from one master to another automatically, without the help of the system administrator. The automatic reconfiguration of replicas is called replicas migration, and it is able to improve the reliability of a Redis Cluster.

Note: you can read the details of replicas migration in the Redis Cluster Specification; here we'll only provide some information about the general idea and what you should do in order to benefit from it.

The reason why you may want to let your cluster replicas move from one master to another under certain conditions is that usually a Redis Cluster is only as resistant to failures as the number of replicas attached to a given master.

For example, a cluster where every master has a single replica can't continue operating if the master and its replica fail at the same time, simply because no other instance has a copy of the hash slots the master was serving. However, while netsplits are likely to isolate a number of nodes at the same time, many other kinds of failures, like hardware or software failures local to a single node, are a very notable class of failures that are unlikely to happen at the same time. So it is possible that in your cluster, where every master has a slave, the slave is killed at 4am and the master is killed at 6am. This will still result in a cluster that can no longer operate.

To improve the reliability of the system we have the option of adding additional replicas to every master, but this is expensive. Replica migration allows you to add more slaves to just a few masters. Say you have 10 masters with 1 slave each, for a total of 20 instances. You then add, for example, 3 more instances as slaves of some of your masters, so certain masters will have more than a single slave.

With replicas migration, if a master is left without slaves, a replica from a master that has multiple slaves will migrate to the orphaned master. So after your slave goes down at 4am, as in the example above, another slave will take its place, and when the master fails as well at 6am, there is still a slave that can be elected so that the cluster can continue to operate.

So what should you know about replicas migration, in short?

  • The cluster will try to migrate a replica from the master that has the greatest number of replicas at a given moment.
  • To benefit from replica migration you just have to add a few more replicas to a single master in your cluster; it does not matter which master.
  • There is a configuration parameter that controls the replica migration feature, called cluster-migration-barrier: you can read more about it in the example redis.conf file provided with Redis Cluster.
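The points above can be modelled with a toy function. This Ruby sketch is not how Redis implements the feature (the real decision is made by the replicas themselves, as described in the Redis Cluster Specification). It only assumes the two rules stated here: the donor is the master with the most replicas, and, as my reading of cluster-migration-barrier, the donor must be left with at least that many replicas after giving one away. The name migrate_replica and the master labels are hypothetical.

```ruby
# `replicas` maps each master to its number of working replicas.
# Returns a new Hash after at most one migration; the input is not modified.
def migrate_replica(replicas, barrier: 1)
  orphan, = replicas.find { |_master, count| count.zero? }
  return replicas if orphan.nil?              # no master is without replicas
  donor, donor_count = replicas.max_by { |_master, count| count }
  # Assumed barrier rule: the donor must keep at least `barrier` replicas.
  return replicas if donor_count <= barrier
  replicas.merge(orphan => 1, donor => donor_count - 1)
end

# Master A lost its only replica at 4am; master B was given a spare one.
before = { "A" => 0, "B" => 2, "C" => 1 }
puts migrate_replica(before).inspect
```

With the assumed barrier of 1, master B gives one replica to A and keeps one for itself. If B had only a single replica, nothing would move.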

Upgrading nodes in a Redis Cluster

Upgrading slave nodes is easy, since you just need to stop the node and restart it with an updated version of Redis. If there are clients scaling reads using slave nodes, they should be able to reconnect to a different slave if a given one is not available.

Upgrading masters is a bit more complex, and the suggested procedure is:

  1. Use CLUSTER FAILOVER to trigger a manual failover of the master to one of its slaves (see the "Manual failover" section of this documentation).
  2. Wait for the master to turn into a slave.
  3. Finally upgrade the node as you do for slaves.
  4. If you want the master to be the node you just upgraded, trigger a new manual failover in order to turn the upgraded node back into a master.

Following this procedure, you should upgrade one node after the other until all the nodes are upgraded.

Migrating to Redis Cluster

Users willing to migrate to Redis Cluster may have just a single master, or may already be using a preexisting sharding setup, where keys are split among N nodes using some in-house algorithm or a sharding algorithm implemented by their client library or Redis proxy.

In both cases it is possible to migrate to Redis Cluster easily; however, the most important detail is whether multiple-key operations are used by the application, and how. There are three different cases:

  1. Multiple-key operations, transactions, or Lua scripts involving multiple keys are not used. Keys are accessed independently (even if accessed via transactions or Lua scripts that group multiple commands about the same key together).
  2. Multiple-key operations, transactions, or Lua scripts involving multiple keys are used, but only with keys having the same hash tag, which means that the keys used together all have a {...} sub-string that happens to be identical. For example, the following multiple-key operation is defined in the context of the same hash tag: SUNION {user:1000}.foo {user:1000}.bar.
  3. Multiple-key operations, transactions, or Lua scripts involving multiple keys are used with key names that do not have an explicit hash tag, or do not share the same one.

The third case is not handled by Redis Cluster: the application needs to be modified so that it either does not use multi-key operations or only uses them in the context of the same hash tag.
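To see why a shared hash tag makes case 2 work, it helps to compute the hash slot of a key by hand. The Redis Cluster Specification defines the slot as CRC16(key) mod 16384, where only the text between the first { and the next } is hashed when that text is not empty. The Ruby below is a minimal sketch of that rule; the method names crc16 and key_slot are my own.

```ruby
# CRC16, XMODEM variant (polynomial 0x1021, initial value 0), computed bit by bit.
def crc16(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF
    end
  end
  crc
end

# Hash slot of a key, honouring the {...} hash tag rule.
def key_slot(key)
  start = key.index("{")
  if start
    stop = key.index("}", start + 1)
    # Only a non-empty tag counts; "{}" falls back to hashing the whole key.
    key = key[(start + 1)...stop] if stop && stop > start + 1
  end
  crc16(key) % 16384
end

puts key_slot("{user:1000}.foo") == key_slot("{user:1000}.bar") # prints true
```

Both keys hash only the text user:1000, so they land in the same slot and SUNION can be served by a single node. Keys without a common tag would generally land in different slots, which is the third case.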

Cases 1 and 2 are covered, so we'll focus on those two cases. They are handled in the same way, so no distinction will be made in the documentation.

Assuming you have your preexisting data set split into N masters, where N=1 if you have no preexisting sharding, the following steps are needed in order to migrate your data set to Redis Cluster:

  1. Stop your clients. No automatic live migration to Redis Cluster is currently possible. You may be able to orchestrate a live migration in the context of your application / environment.
  2. Generate an append only file for all of your N masters using the BGREWRITEAOF command, and wait for the AOF file to be completely generated.
  3. Save your AOF files from aof-1 to aof-N somewhere. At this point you can stop your old instances if you wish (this is useful since in non-virtualized deployments you often need to reuse the same computers).
  4. Create a Redis Cluster composed of N masters and zero slaves. You'll add slaves later. Make sure all your nodes are using the append only file for persistence.
  5. Stop all the cluster nodes, and substitute their append only file with your preexisting append only files: aof-1 for the first node, aof-2 for the second node, up to aof-N.
  6. Restart your Redis Cluster nodes with the new AOF files. They'll complain that there are keys that should not be there according to their configuration.
  7. Use the redis-trib fix command in order to fix the cluster, so that keys will be migrated according to the hash slots each node is authoritative for.
  8. Use redis-trib check at the end to make sure your cluster is OK.
  9. Restart your clients, modified to use a Redis Cluster aware client library.

There is an alternative way to import data from external instances into a Redis Cluster, which is to use the redis-trib import command.

The command moves all the keys of a running instance (deleting the keys from the source instance) to the specified preexisting Redis Cluster. However, note that if you use a Redis 2.8 instance as the source instance, the operation may be slow, since 2.8 does not implement migrate connection caching, so you may want to restart your source instance with a Redis 3.x version before performing such an operation.
