Redis 的分布式锁

Distributed locks with Redis

  Distributed locks are a very useful primitive in many environments where different processes must operate with shared resources in a mutually exclusive way.

  分布式锁在很多场景下都十分有用,在这些场景中,不同的进程必须以互斥的方式操作共享资源。

  There are a number of libraries and blog posts describing how to implement a DLM (Distributed Lock Manager) with Redis, but every library uses a different approach, and many use a simple approach with lower guarantees compared to what can be achieved with slightly more complex designs.

  许多库和博客文章描述了如何使用 Redis 实现 DLM(分布式锁管理器),但每个库采用的方法都不相同,而且许多库使用的简单方法,与稍微复杂一些的设计相比,只能提供较弱的保证。

  This page is an attempt to provide a more canonical algorithm to implement distributed locks with Redis. We propose an algorithm, called Redlock, which implements a DLM which we believe to be safer than the vanilla single instance approach. We hope that the community will analyze it, provide feedback, and use it as a starting point for the implementations or more complex or alternative designs.

  这篇文章尝试提供一种更规范的算法,用 Redis 实现分布式锁。我们提出了一种叫做 Redlock 的算法,它实现了一个我们认为比普通单实例方案更安全的 DLM(分布式锁管理器)。我们希望社区能够分析它、提供反馈,并把它作为各种实现的起点,或者作为更复杂方案及替代设计的基础。

Implementations

实现

Before describing the algorithm, here are a few links to implementations already available that can be used for reference.
在描述算法之前,这里有一些已经可用的实现可以用作参考。

  • Redlock-rb (Ruby implementation). There is also a fork of Redlock-rb that adds a gem for easy distribution and perhaps more.
  • Redlock-py (Python implementation).
  • Aioredlock (Asyncio Python implementation).
  • Redlock-php (PHP implementation).
  • PHPRedisMutex (further PHP implementation)
  • cheprasov/php-redis-lock (PHP library for locks)
  • Redsync.go (Go implementation).
  • Redisson (Java implementation).
  • Redis::DistLock (Perl implementation).
  • Redlock-cpp (C++ implementation).
  • Redlock-cs (C#/.NET implementation).
  • RedLock.net (C#/.NET implementation). Includes async and lock extension support.
  • ScarletLock (C# .NET implementation with configurable datastore)
  • node-redlock (NodeJS implementation). Includes support for lock extension.

Safety and Liveness guarantees

安全和活性保证

  We are going to model our design with just three properties that, from our point of view, are the minimum guarantees needed to use distributed locks in an effective way.

  1. Safety property: Mutual exclusion. At any given moment, only one client can hold a lock.
  2. Liveness property A: Deadlock free. Eventually it is always possible to acquire a lock, even if the client that locked a resource crashes or gets partitioned.
  3. Liveness property B: Fault tolerance. As long as the majority of Redis nodes are up, clients are able to acquire and release locks.

  我们将只用三个属性来约束我们的设计,在我们看来,这三个属性是有效使用分布式锁所需的最低保证。

  1. 安全属性:互斥。在任何时刻,只能有一个客户端持有锁。
  2. 活性属性A:无死锁。即使锁定某个资源的客户端崩溃或被网络分区隔离,最终也总是能够再次获得锁。
  3. 活性属性B:容错。只要多数 Redis 节点正常运行,客户端就能够获取和释放锁。

Why failover-based implementations are not enough

为什么基于故障转移(failover)的实现是不够的

  To understand what we want to improve, let’s analyze the current state of affairs with most Redis-based distributed lock libraries.

  为了理解我们想要改进什么,让我们先分析一下目前大多数基于 Redis 的分布式锁库的现状。

  The simplest way to use Redis to lock a resource is to create a key in an instance. The key is usually created with a limited time to live, using the Redis expires feature, so that eventually it will get released (property 2 in our list). When the client needs to release the resource, it deletes the key.

  用 Redis 锁定资源的最简单方法,是在一个实例中创建一个 key。这个 key 通常利用 Redis 的过期(expire)特性设置有限的存活时间,这样它最终总会被释放(对应上面列表中的属性2)。当客户端需要释放资源时,删除这个 key 即可。

  Superficially this works well, but there is a problem: this is a single point of failure in our architecture. What happens if the Redis master goes down? Well, let’s add a slave! And use it if the master is unavailable. This is unfortunately not viable. By doing so we can’t implement our safety property of mutual exclusion, because Redis replication is asynchronous.

  表面上看这样工作得很好,但存在一个问题:这是我们架构中的单点故障。如果 Redis 主节点宕机了怎么办?好吧,那就加一个从节点!主节点不可用时就用它。不幸的是,这并不可行。这样做我们无法实现互斥这一安全属性,因为 Redis 的复制是异步的。

There is an obvious race condition with this model:

  1. Client A acquires the lock in the master.
  2. The master crashes before the write to the key is transmitted to the slave.
  3. The slave gets promoted to master.
  4. Client B acquires the lock to the same resource A already holds a lock for. SAFETY VIOLATION!

  该模型存在明显的竞争条件:

  1. 客户端A在主节点上获得了锁。
  2. 在对这个 key 的写操作被传送到从节点之前,主节点崩溃了。
  3. 从节点被提升为主节点。
  4. 客户端B获得了A仍然持有的同一资源上的锁。违反了安全属性!

  Sometimes it is perfectly fine that under special circumstances, like during a failure, multiple clients can hold the lock at the same time. If this is the case, you can use your replication based solution. Otherwise we suggest to implement the solution described in this document.

  有时,在特殊情况下(例如发生故障期间),多个客户端同时持有锁是完全可以接受的。如果是这种情况,你可以使用基于复制的解决方案;否则,我们建议实现本文描述的方案。

Correct implementation with a single instance

使用单个实例正确实现

  Before trying to overcome the limitation of the single instance setup described above, let’s check how to do it correctly in this simple case, since this is actually a viable solution in applications where a race condition from time to time is acceptable, and because locking into a single instance is the foundation we’ll use for the distributed algorithm described here.

  在尝试克服上述单实例方案的局限之前,我们先来看看在这个简单场景下如何正确地实现它。因为对于偶尔出现竞争条件也可以接受的应用来说,这本身就是一个可行的方案;同时,在单个实例上加锁也是我们接下来要描述的分布式算法的基础。

  To acquire the lock, the way to go is the following:

  获得锁的方法如下:

set resource_name my_random_value NX PX 30000

  The command will set the key only if it does not already exist (NX option), with an expire of 30000 milliseconds (PX option). The key is set to a value "my_random_value". This value must be unique across all clients and all lock requests.

  这个命令只会在 key 不存在时才设置它(NX 选项),并带有 30000 毫秒的过期时间(PX 选项)。key 的值被设为 “my_random_value”。这个值必须在所有客户端和所有加锁请求中保持唯一。
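
  下面是在单个实例上按上述方式获取锁的一个最小示意(基于 redis-py 客户端库的写法,仅作说明;主机地址、key 名、函数名等均为举例假设,并非权威实现):

import os
import redis

client = redis.Redis(host="localhost", port=6379)

def acquire_lock(resource_name, ttl_ms=30000):
    """成功时返回随机值 token(之后用于安全释放),失败返回 None。"""
    token = os.urandom(20).hex()                               # 对应文中 /dev/urandom 的 20 字节建议
    if client.set(resource_name, token, nx=True, px=ttl_ms):   # 即 SET resource_name token NX PX ttl_ms
        return token
    return None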

  Basically the random value is used in order to release the lock in a safe way, with a script that tells Redis: remove the key only if it exists and the value stored at the key is exactly the one I expect to be. This is accomplished by the following Lua script:

  基本上,使用随机值是为了以一种安全的方式释放锁:用一段脚本告诉 Redis,只有当 key 存在且其中保存的值正是我所期望的那个值时,才删除这个 key。这是通过下面的 Lua 脚本完成的:

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

  This is important in order to avoid removing a lock that was created by another client. For example a client may acquire the lock, get blocked in some operation for longer than the lock validity time (the time at which the key will expire), and later remove the lock, that was already acquired by some other client. Using just DEL is not safe as a client may remove the lock of another client. With the above script instead every lock is “signed” with a random string, so the lock will be removed only if it is still the one that was set by the client trying to remove it.

  这对于避免删除其他客户端创建的锁非常重要。例如,一个客户端可能获得了锁,随后在某个操作上阻塞的时间超过了锁的有效时间(key 的过期时间),之后它再去删除的那把锁,可能已经被其他客户端获取了。仅仅使用 DEL 是不安全的,因为一个客户端可能会删掉另一个客户端的锁。而使用上面的脚本,每把锁都用一个随机字符串“签名”,只有当 key 中保存的仍然是试图删除它的客户端所设置的那个值时,锁才会被删除。
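
  作为参考,下面给出用 redis-py 调用上述 Lua 脚本来安全释放锁的一个示意写法(假设 client 与 token 来自前面获取锁的示例,仅作说明):

RELEASE_SCRIPT = """
if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end
"""

def release_lock(resource_name, token):
    # 只有当 key 中保存的仍是自己设置的随机值时,才会真正删除
    return client.eval(RELEASE_SCRIPT, 1, resource_name, token)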

  What should this random string be? I assume it’s 20 bytes from /dev/urandom, but you can find cheaper ways to make it unique enough for your tasks. For example a safe pick is to seed RC4 with /dev/urandom, and generate a pseudo random stream from that. A simpler solution is to use a combination of unix time with microseconds resolution, concatenating it with a client ID, it is not as safe, but probably up to the task in most environments.

  这个随机字符串应该是什么呢?我假设它是从 /dev/urandom 读取的 20 个字节,但你也可以找到成本更低的方式,只要它对你的任务来说足够唯一即可。例如,一个安全的选择是用 /dev/urandom 作为 RC4 的种子,由此生成一个伪随机流。一个更简单的方案是把微秒精度的 Unix 时间戳与客户端 ID 拼接起来;这不够安全,但在大多数环境下大概也够用了。
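
  对应这两种做法,下面是两个 Python 示意写法(仅作举例;client_id 是为演示引入的假设标识):

import os, time

client_id = "client-1"                                         # 假设的客户端标识
token_strong = os.urandom(20).hex()                            # 从操作系统随机源取 20 个字节
token_simple = "%d:%s" % (time.time_ns() // 1000, client_id)   # 微秒级 Unix 时间 + 客户端 ID,较弱但通常够用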

  The time we use as the key time to live, is called the “lock validity time”. It is both the auto release time, and the time the client has in order to perform the operation required before another client may be able to acquire the lock again, without technically violating the mutual exclusion guarantee, which is only limited to a given window of time from the moment the lock is acquired.

  我们用作 key 存活时间的这个时间,称为“锁的有效时间”。它既是锁的自动释放时间,也是客户端在其他客户端可能再次获得锁之前、完成所需操作可用的时间。严格来说这并不违反互斥保证,只是把互斥保证限制在了从获得锁那一刻起的一个给定时间窗口之内。

  So now we have a good way to acquire and release the lock. The system, reasoning about a non-distributed system composed of a single, always available, instance, is safe. Let’s extend the concept to a distributed system where we don’t have such guarantees.

  所以现在我们有了一个获取和释放锁的好方法。如果把系统看作由单个、永远可用的实例组成的非分布式系统,这个方案是安全的。接下来让我们把这个概念扩展到没有这种保证的分布式系统中。

The Redlock algorithm

Redlock算法

  In the distributed version of the algorithm we assume we have N Redis masters. Those nodes are totally independent, so we don’t use replication or any other implicit coordination system. We already described how to acquire and release the lock safely in a single instance. We take for granted that the algorithm will use this method to acquire and release the lock in a single instance. In our examples we set N=5, which is a reasonable value, so we need to run 5 Redis masters on different computers or virtual machines in order to ensure that they’ll fail in a mostly independent way.

  在这个算法的分布式版本中,我们假设有 N 个 Redis 主节点。这些节点完全相互独立,因此我们不使用复制或任何其他隐式的协调系统。我们已经描述了如何在单个实例中安全地获取和释放锁,并默认算法会用这种方法在单个实例上获取和释放锁。在我们的例子中,取 N=5,这是一个比较合理的取值,因此我们需要在不同的计算机或虚拟机上运行 5 个 Redis 主节点,以确保它们大体上以相互独立的方式发生故障。

In order to acquire the lock, the client performs the following operations:

  1. It gets the current time in milliseconds.
  2. It tries to acquire the lock in all the N instances sequentially, using the same key name and random value in all the instances. During step 2, when setting the lock in each instance, the client uses a timeout which is small compared to the total lock auto-release time in order to acquire it. For example if the auto-release time is 10 seconds, the timeout could be in the ~ 5-50 milliseconds range. This prevents the client from remaining blocked for a long time trying to talk with a Redis node which is down: if an instance is not available, we should try to talk with the next instance ASAP.
  3. The client computes how much time elapsed in order to acquire the lock, by subtracting from the current time the timestamp obtained in step 1. If and only if the client was able to acquire the lock in the majority of the instances (at least 3), and the total time elapsed to acquire the lock is less than lock validity time, the lock is considered to be acquired.
  4. If the lock was acquired, its validity time is considered to be the initial validity time minus the time elapsed, as computed in step 3.
  5. If the client failed to acquire the lock for some reason (either it was not able to lock N/2+1 instances or the validity time is negative), it will try to unlock all the instances (even the instances it believed it was not able to lock).

  为了获得锁,客户端需要执行以下操作(列表之后附有一个示意性的代码草图):

  1. 以毫秒为单位获取当前时间。
  2. 使用相同的 key 名和随机值,依次尝试在所有 N 个实例中获取锁。在这一步中,客户端在每个实例上设置锁时,会使用一个相对于锁的总自动释放时间小得多的超时时间。例如,如果自动释放时间是 10 秒,这个超时时间可以在 5~50 毫秒的范围内。这可以防止客户端在尝试与一个已经宕机的 Redis 节点通信时长时间阻塞:如果某个实例不可用,就应该尽快去尝试下一个实例。
  3. 客户端用当前时间减去步骤 1 中获得的时间戳,计算获取锁所花费的时间。当且仅当客户端在多数实例(至少 3 个)中成功获取了锁,且获取锁的总耗时小于锁的有效时间时,才认为锁已被获取。
  4. 如果获取到了锁,则其有效时间等于初始有效时间减去步骤 3 中计算出的耗时。
  5. 如果客户端由于某种原因未能获得锁(要么没能锁住 N/2+1 个实例,要么有效时间为负),它会尝试解锁所有实例(包括它认为没有成功加锁的那些实例)。
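
  下面是上述步骤的一个示意性 Python 草图(基于 redis-py,仅作说明,并非权威实现):其中假设 servers 是 N 个相互独立的 redis-py 连接组成的列表,并且每个连接创建时已配置较小的 socket_timeout;UNLOCK_SCRIPT、drift_ms 等名称和数值都是为举例而引入的假设。

import os
import time
import redis

UNLOCK_SCRIPT = 'if redis.call("get",KEYS[1]) == ARGV[1] then return redis.call("del",KEYS[1]) else return 0 end'

def redlock_acquire(servers, resource_name, ttl_ms=10000):
    """按上面的步骤 1~5,在 N 个独立实例上尝试获取锁。
    成功时返回 (token, 剩余有效时间毫秒),失败返回 None。"""
    token = os.urandom(20).hex()
    quorum = len(servers) // 2 + 1            # 多数派:N/2+1
    drift_ms = int(ttl_ms * 0.01) + 2         # 简单的时钟漂移补偿,数值仅作示例

    start = time.monotonic()                  # 步骤 1:记录开始时间
    acquired = 0
    for server in servers:                    # 步骤 2:依次在每个实例上 SET NX PX
        try:                                  # (假设连接上已设置较小的 socket_timeout)
            if server.set(resource_name, token, nx=True, px=ttl_ms):
                acquired += 1
        except redis.RedisError:
            pass                              # 实例不可用时,尽快尝试下一个

    elapsed_ms = (time.monotonic() - start) * 1000
    validity_ms = ttl_ms - elapsed_ms - drift_ms      # 步骤 3、4:计算剩余有效时间
    if acquired >= quorum and validity_ms > 0:
        return token, validity_ms

    for server in servers:                    # 步骤 5:失败则尝试解锁所有实例
        try:
            server.eval(UNLOCK_SCRIPT, 1, resource_name, token)
        except redis.RedisError:
            pass
    return None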

Is the algorithm asynchronous?

这个算法是异步的吗?

  The algorithm relies on the assumption that while there is no synchronized clock across the processes, still the local time in every process flows approximately at the same rate, with an error which is small compared to the auto-release time of the lock. This assumption closely resembles a real-world computer: every computer has a local clock and we can usually rely on different computers to have a clock drift which is small.

  该算法依赖这样一个假设:虽然进程之间没有同步时钟,但每个进程中的本地时间仍以近似相同的速率流逝,其误差与锁的自动释放时间相比很小。这个假设与现实世界中的计算机非常相似:每台计算机都有本地时钟,而我们通常可以认为不同计算机之间的时钟漂移很小。

  At this point we need to better specify our mutual exclusion rule: it is guaranteed only as long as the client holding the lock will terminate its work within the lock validity time (as obtained in step 3), minus some time (just a few milliseconds in order to compensate for clock drift between processes).

  此时我们需要更精确地表述互斥规则:只有当持有锁的客户端在锁的有效时间(即步骤 3 中得到的时间)减去一小段时间(用来补偿进程间的时钟漂移,通常只有几毫秒)之内完成它的工作时,互斥才能得到保证。

  For more information about similar systems requiring a bound clock drift, this paper is an interesting reference: Leases: an efficient fault-tolerant mechanism for distributed file cache consistency.

  关于需要有界时钟漂移的类似系统,这篇论文是一个有趣的参考:Leases: an efficient fault-tolerant mechanism for distributed file cache consistency(一种用于分布式文件缓存一致性的高效容错机制)。

Retry on failure

失败重试

  When a client is unable to acquire the lock, it should try again after a random delay in order to try to desynchronize multiple clients trying to acquire the lock for the same resource at the same time (this may result in a split brain condition where nobody wins). Also the faster a client tries to acquire the lock in the majority of Redis instances, the smaller the window for a split brain condition (and the need for a retry), so ideally the client should try to send the SET commands to the N instances at the same time using multiplexing.

  当一个客户端无法获取锁时,它应该在一个随机延迟之后再次尝试,以避免多个客户端同时再次竞争同一资源上的锁(这可能导致谁也拿不到锁的脑裂情况)。另外,客户端在多数 Redis 实例上获取锁的速度越快,出现脑裂的时间窗口(以及需要重试的可能)就越小;因此,理想情况下,客户端应该使用多路复用同时向 N 个实例发送 SET 命令。
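
  在客户端一侧,带随机延迟的重试大致可以写成下面这样(示意代码;redlock_acquire 复用前文获取锁的草图,重试次数与延迟区间均为假设的取值):

import random
import time

def acquire_with_retry(servers, resource_name, retries=3):
    for _ in range(retries):
        result = redlock_acquire(servers, resource_name)    # 假设已定义,见前文的获取锁草图
        if result is not None:
            return result
        time.sleep(random.uniform(0.05, 0.2))               # 随机延迟,降低多个客户端再次同时竞争的概率
    return None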

  It is worth stressing how important it is for clients that fail to acquire the majority of locks, to release the (partially) acquired locks ASAP, so that there is no need to wait for key expiry in order for the lock to be acquired again (however if a network partition happens and the client is no longer able to communicate with the Redis instances, there is an availability penalty to pay as it waits for key expiration).

  值得强调的是,对于未能获得多数锁的客户端来说,尽快释放已经(部分)获得的锁非常重要,这样就不必等 key 过期才能再次获得锁(不过,如果发生了网络分区,客户端无法再与 Redis 实例通信,那就只能等待 key 过期,付出可用性上的代价)。

Releasing the lock

释放锁

  Releasing the lock is simple and involves just releasing the lock in all instances, whether or not the client believes it was able to successfully lock a given instance.

  释放锁很简单,只需要在所有实例上释放锁即可,无论客户端是否认为自己成功锁定了某个给定的实例。

Safety arguments

安全性论证

  Is the algorithm safe? We can try to understand what happens in different scenarios.

  这个算法安全吗?我们可以试着理解在不同的情况下会发生什么。

  To start let’s assume that a client is able to acquire the lock in the majority of instances. All the instances will contain a key with the same time to live. However, the key was set at different times, so the keys will also expire at different times. But if the first key was set at worst at time T1 (the time we sample before contacting the first server) and the last key was set at worst at time T2 (the time we obtained the reply from the last server), we are sure that the first key to expire in the set will exist for at least MIN_VALIDITY=TTL-(T2-T1)-CLOCK_DRIFT. All the other keys will expire later, so we are sure that the keys will be simultaneously set for at least this time.

  我们先假设一个客户端可以获得大多数实例的锁。所有的实例包含相同有效时间的key。但是,key是在不同的时间设置的,因此key也会在不同的时间过期。但是,如果第一个key被设置的时间点在最坏的情况下为T1(我们在连接第一个服务器之前取样的时间),最后一个key被设置的时间点最坏的情况为T2(这个时间点为我们获取最后一个key的返回结果的时间点)。我们确定第一个过期的key将存活最少MIN_VALIDITY=TTL-(T2-T1)-CLOCK_DRIFT。其余的key都会在之后过期,所以我们可以确定所有key的有效期最少为MIN_VALIDITY。
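
  举一个数字例子(数值仅为假设):若 TTL 为 10000 毫秒,T2-T1 为 200 毫秒,时钟漂移按 10 毫秒补偿,则 MIN_VALIDITY = 10000 - 200 - 10 = 9790 毫秒,也就是说所有实例上的 key 至少会同时存在约 9.79 秒。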

  During the time that the majority of keys are set, another client will not be able to acquire the lock, since N/2+1 SET NX operations can’t succeed if N/2+1 keys already exist. So if a lock was acquired, it is not possible to re-acquire it at the same time (violating the mutual exclusion property).

  在一个客户端设置了大部分key期间,另一个客户端将无法获得锁,因为如果已经存在N/2+1个key,那么N/2+1 SET NX操作将无法成功。因此,如果一个锁被获取了,其他客户端就不可能同时再获取它(违反互斥属性)。

  However we want to also make sure that multiple clients trying to acquire the lock at the same time can’t simultaneously succeed.

  但是,我们还希望确保多个同时尝试获取锁的客户端不会同时成功。

  If a client locked the majority of instances using a time near, or greater, than the lock maximum validity time (the TTL we use for SET basically), it will consider the lock invalid and will unlock the instances, so we only need to consider the case where a client was able to lock the majority of instances in a time which is less than the validity time. In this case for the argument already expressed above, for MIN_VALIDITY no client should be able to re-acquire the lock. So multiple clients will be able to lock N/2+1 instances at the same time (with “time” being the end of Step 2) only when the time to lock the majority was greater than the TTL time, making the lock invalid.

  如果一个客户端锁定多数实例所花费的时间接近甚至超过了锁的最大有效时间(也就是我们 SET 时使用的 TTL),它会认为这把锁无效并解锁所有实例,因此我们只需要考虑客户端在小于有效时间内锁定了多数实例的情况。在这种情况下,根据上面已经给出的论证,在 MIN_VALIDITY 时间内,任何其他客户端都无法重新获得这把锁。所以,只有当锁定多数实例所用的时间大于 TTL 时(此时锁已经无效),多个客户端才可能“同时”(以步骤 2 结束的时刻为准)锁定 N/2+1 个实例。

  Are you able to provide a formal proof of safety, point to existing algorithms that are similar, or find a bug? That would be greatly appreciated.

  如果您能提供形式化的安全性证明、指出已有的类似算法,或者发现其中的错误,我们将不胜感激。

Liveness arguments

活性论证

The system liveness is based on three main features:

  1. The auto release of the lock (since keys expire): eventually keys are available again to be locked.
  2. The fact that clients, usually, will cooperate removing the locks when the lock was not acquired, or when the lock was acquired and the work terminated, making it likely that we don’t have to wait for keys to expire to re-acquire the lock.
  3. The fact that when a client needs to retry a lock, it waits a time which is comparably greater than the time needed to acquire the majority of locks, in order to probabilistically make split brain conditions during resource contention unlikely.

系统的活性基于三个主要特征:

  1. 锁的自动释放(因为key过期):锁最终可以重用。
  2. 事实是,客户端通常会在获得锁失败的情况下删除锁,或者在获得锁并工作完成的情况下删除锁,这使得我们不必等待key过期才能重新获得锁。
  3. 事实是,当客户端需要重试一个锁时,它等待的时间要比获取大多数锁所需的时间长得多,以便在资源争用期间不太可能出现脑裂的情况。

  However, we pay an availability penalty equal to TTL time on network partitions, so if there are continuous partitions, we can pay this penalty indefinitely. This happens every time a client acquires a lock and gets partitioned away before being able to remove the lock.

  但是,发生网络分区时,我们要付出等于 TTL 的可用性代价;如果分区持续不断,这个代价就可能无限期地付下去。每当一个客户端获得了锁,却在能够删除锁之前被分区隔离时,就会出现这种情况。

  Basically if there are infinite continuous network partitions, the system may become not available for an infinite amount of time.

  基本上,如果网络分区无限持续下去,系统就可能在无限长的时间内不可用。

Performance, crash-recovery and fsync

性能、崩溃恢复与 fsync

  Many users using Redis as a lock server need high performance in terms of both latency to acquire and release a lock, and number of acquire / release operations that it is possible to perform per second. In order to meet this requirement, the strategy to talk with the N Redis servers to reduce latency is definitely multiplexing (or poor man’s multiplexing, which is, putting the socket in non-blocking mode, send all the commands, and read all the commands later, assuming that the RTT between the client and each instance is similar).

  很多把 Redis 用作锁服务器的用户,无论是在获取和释放锁的延迟上,还是在每秒可执行的获取/释放操作数量上,都需要很高的性能。为了满足这个要求,与 N 个 Redis 服务器通信以降低延迟的策略显然是多路复用(或者简化版的多路复用:把 socket 设为非阻塞模式,先把所有命令发出去,稍后再统一读取结果,前提是客户端与各实例之间的 RTT 相近)。
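
  作为一个简化的示意(用线程池并发代替真正的多路复用/非阻塞 socket 实现,基于 redis-py,函数名与参数均为举例假设),下面演示如何同时向所有实例发送 SET 命令:

from concurrent.futures import ThreadPoolExecutor
import redis

def parallel_set(servers, resource_name, token, ttl_ms):
    """并发地在所有实例上执行 SET NX PX,返回成功加锁的实例数。"""
    def try_set(server):
        try:
            return bool(server.set(resource_name, token, nx=True, px=ttl_ms))
        except redis.RedisError:
            return False
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        return sum(pool.map(try_set, servers))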

  However there is another consideration to do about persistence if we want to target a crash-recovery system model.

  然而,如果我们想要实现崩溃恢复模型,还需要考虑redis持久化问题。

  Basically to see the problem here, let’s assume we configure Redis without persistence at all. A client acquires the lock in 3 of 5 instances. One of the instances where the client was able to acquire the lock is restarted, at this point there are again 3 instances that we can lock for the same resource, and another client can lock it again, violating the safety property of exclusivity of lock.

  为了看到这里的问题,让我们假设我们配置Redis时根本没有配置持久化。一个客户端获得了5个实例中3个实例的锁,3个实例中的一个实例被重启,重启完成后,此时,我们还可以为相同的资源锁定3个实例,另一个客户端可以再次锁定它,这违反了锁的独占性的安全属性。

  If we enable AOF persistence, things will improve quite a bit. For example we can upgrade a server by sending SHUTDOWN and restarting it. Because Redis expires are semantically implemented so that virtually the time still elapses when the server is off, all our requirements are fine. However everything is fine as long as it is a clean shutdown. What about a power outage? If Redis is configured, as by default, to fsync on disk every second, it is possible that after a restart our key is missing. In theory, if we want to guarantee the lock safety in the face of any kind of instance restart, we need to enable fsync=always in the persistence setting. This in turn will totally ruin performances to the same level of CP systems that are traditionally used to implement distributed locks in a safe way.

  如果启用 AOF 持久化,情况会好不少。例如,我们可以通过发送 SHUTDOWN 命令再重启的方式来升级服务器。由于 Redis 的过期时间在语义上是这样实现的:即使服务器处于关闭状态,时间实际上仍在流逝,所以我们的所有要求都能满足。但前提是干净的关闭(clean shutdown)。那么断电呢?如果 Redis 按默认配置每秒向磁盘 fsync 一次,那么重启之后我们的 key 就有可能丢失。理论上,如果想在任何类型的实例重启面前都保证锁的安全性,就需要在持久化设置中启用 fsync=always。而这反过来又会把性能拖到与传统上用于安全实现分布式锁的 CP 系统相同的水平。
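
  就我的理解,上文所说的 fsync=always 对应于在 redis.conf 中同时设置 appendonly yes 与 appendfsync always(具体以 Redis 官方文档为准)。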

  However things are better than what they look like at a first glance. Basically the algorithm safety is retained as long as when an instance restarts after a crash, it no longer participates to any currently active lock, so that the set of currently active locks when the instance restarts, were all obtained by locking instances other than the one which is rejoining the system.

  然而,事情比乍看起来要好一些。基本上,只要实例在崩溃重启后不再参与任何当前仍然有效的锁,算法的安全性就能保持:也就是说,在该实例重启时,当前有效的那些锁都是通过锁定除这台正在重新加入系统的实例之外的其他实例获得的。

  To guarantee this we just need to make an instance, after a crash, unavailable for at least a bit more than the max TTL we use, which is, the time needed for all the keys about the locks that existed when the instance crashed, to become invalid and be automatically released.

  要保证这一点,我们只需要让崩溃后的实例在至少略多于我们使用的最大 TTL 的时间内保持不可用即可,这段时间就是实例崩溃时已存在的、与锁相关的所有 key 失效并被自动释放所需的时间。

  Using delayed restarts it is basically possible to achieve safety even without any kind of Redis persistence available, however note that this may translate into an availability penalty. For example if a majority of instances crash, the system will become globally unavailable for TTL (here globally means that no resource at all will be lockable during this time).

  使用延迟重启,即使没有任何 Redis 持久化,基本上也可以实现安全性。不过要注意,这可能转化为可用性上的损失。例如,如果多数实例同时崩溃,系统将在 TTL 时间内全局不可用(这里的“全局”意味着在此期间没有任何资源可以被加锁)。

Making the algorithm more reliable: Extending the lock

让这个锁更具可靠性:扩展锁

  If the work performed by clients is composed of small steps, it is possible to use smaller lock validity times by default, and extend the algorithm implementing a lock extension mechanism. Basically the client, if in the middle of the computation while the lock validity is approaching a low value, may extend the lock by sending a Lua script to all the instances that extends the TTL of the key if the key exists and its value is still the random value the client assigned when the lock was acquired.

  如果客户端要执行的工作由若干小步骤组成,那么可以默认使用较短的锁有效时间,并为算法扩展一个锁续期(lock extension)机制。基本做法是:当锁的有效时间即将耗尽而计算还在进行时,客户端可以向所有实例发送一段 Lua 脚本来延长锁:如果 key 存在,且其值仍然是客户端获取锁时设置的那个随机值,就延长该 key 的 TTL。
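
  下面是按这段描述写出的一个续期脚本示意(基于 redis-py;EXTEND_SCRIPT 这个名字以及默认的续期时长都是为举例而设的假设,并非官方实现):

EXTEND_SCRIPT = """
if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("pexpire",KEYS[1],ARGV[2])
else
    return 0
end
"""

def extend_lock(server, resource_name, token, ttl_ms=10000):
    # 只有当 key 仍保存着自己的随机值时,才延长其 TTL
    return server.eval(EXTEND_SCRIPT, 1, resource_name, token, ttl_ms)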

  The client should only consider the lock re-acquired if it was able to extend the lock into the majority of instances, and within the validity time (basically the algorithm to use is very similar to the one used when acquiring the lock).

  只有当客户端能够在有效时间之内、在多数实例上成功延长锁时,才应认为锁被重新获得(这里使用的算法与获取锁时的算法非常相似)。

  However this does not technically change the algorithm, so the maximum number of lock reacquisition attempts should be limited, otherwise one of the liveness properties is violated.

  但是,这在技术上并没有改变算法本身,因此应该限制重新获取锁的最大尝试次数,否则就会违反其中一条活性属性。

原文地址:https://redis.io/topics/distlock
