以下翻译自redis官网.
基于redis的分布式锁
当多个进程需要以互斥的方式来操作共享数据的时候,分布式锁在这样的环境中非常有用。
有很多博客和库都描述了如何使用Redis实现一个DLM(分布式锁管理),但是每个库都用了不同方法,而且与稍微复杂的设计实现相比,很多使用简单的方法并不能保证高可用。
本篇文章尝试提供一种基于redis实现分布式锁的标准算法。我们提出的这种算法教着RedLock,我们相信这种实现比那种简单的更为安全。我们希望社区能够分析它,提供反馈,并且能够用它来作为基本实现,或者更复杂的设计。
安全性和可用性保证
我们把设计模型归结为三种属性,这些属性是使用分布式锁所必须的最少的要求。
安全属性:互斥。在任意的时刻,只能有一个客户端拿到锁。
活性A:无死锁。最终总是能获取到锁,即使获取到锁的客户端崩溃。
活性B:容错。只要大部分redis节点活着,客户端就能够获取和释放锁。
为什么基于容错的实现是不够的?
要理解我们想要提高什么,我们先来分析一下当前大部分基于redis的分布式锁处于一种什么样的状态。
用redis来锁住一个资源的最简单的方法是在一个实例上创建一个key。这个key通常创建的时候就会指定存活时间,所以最终锁是会释放的。当客户端需要释放锁的时候,也可以del这个key。表面上这样很不错,但是有一个问题:在我们的架构中,这会是个单点故障。如果master宕了之后会发生什么?好吧,让我们增加一个slave!并使用它如果master不可用。很不幸这样不可行。这样做我们不能达到互斥的安全属性,因为redis是异步复制的。
很显然使用这种方式有一个很明显的竞争状态:
客户端A在master上获取到锁。
master在写这个key到slave上之前崩溃。
salve被选举为master。
客户端B这时候就可以获取到这把锁,这样A,B同时拥有锁,这样就违反了第一条属性。
有时候在这样的现象可以容忍的时候,你可以使用这种master,slave异步复制的方案。否则,我们建议你采用下面文档里的方法来实现分布式锁。
The Redlock algorithm
In the distributed version of the algorithm we assume we have N Redis masters. Those nodes are totally independent, so we don’t use replication or any other implicit coordination system. We already described how to acquire and release the lock safely in a single instance. We take for granted that the algorithm will use this method to acquire and release the lock in a single instance. In our examples we set N=5, which is a reasonable value, so we need to run 5 Redis masters on different computers or virtual machines in order to ensure that they’ll fail in a mostly independent way.
In order to acquire the lock, the client performs the following operations:
It gets the current time in milliseconds.
It tries to acquire the lock in all the N instances sequentially, using the same key name and random value in all the instances. During step 2, when setting the lock in each instance, the client uses a timeout which is small compared to the total lock auto-release time in order to acquire it. For example if the auto-release time is 10 seconds, the timeout could be in the ~ 5-50 milliseconds range. This prevents the client from remaining blocked for a long time trying to talk with a Redis node which is down: if an instance is not available, we should try to talk with the next instance ASAP.
The client computes how much time elapsed in order to acquire the lock, by subtracting from the current time the timestamp obtained in step 1. If and only if the client was able to acquire the lock in the majority of the instances (at least 3), and the total time elapsed to acquire the lock is less than lock validity time, the lock is considered to be acquired.
If the lock was acquired, its validity time is considered to be the initial validity time minus the time elapsed, as computed in step 3.
If the client failed to acquire the lock for some reason (either it was not able to lock N/2+1 instances or the validity time is negative), it will try to unlock all the instances (even the instances it believed it was not able to lock).
Redlock (红锁算法)
在这个算法的分布式版本中我们假定有N个redis masters。这些节点之间互相独立。所以我们不需要复制,或者间接使用其他协调系统。我们已经描述过怎么安全的获取和释放锁在单实例中。我们确信在单实例中,可以使用这种方法来获取和释放锁。在我们的例子中,我们设置N=5,一个比较合理的值,我们需要把这5个redis master部署在不同的电脑或者虚拟机上,为了保证他们之间挂掉没有联系。
为了获取锁,客户端需要执行下列的操作:
获取当前的毫秒值。
尝试顺序的从所有实例上获取锁。使用一样的key和随机value。