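Distributed locks are a very common synchronization primitive in large-scale, high-concurrency systems; they coordinate code that runs in separate, independent processes. Redisson is an advanced Redis client for distributed coordination, and its well-designed API lets Java developers implement a high-performance distributed lock with very little effort.
Below, using the familiar flash-sale (seckill) scenario as an example and following the Redisson (3.8.1) source code, we analyze how the distributed lock is implemented.
In a flash sale, multiple application instances run at once, so deducting stock must be an atomic operation; otherwise overselling appears as soon as concurrency rises. A distributed lock therefore has to be taken before the deduction so that concurrent deductions are executed one after another.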
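To make the "check, then deduct" problem concrete, here is a minimal single-JVM sketch. It is only an analogy: a `ReentrantLock` stands in for the distributed lock, and the class `StockDemo` and its methods are hypothetical names made up for illustration, not part of Redisson.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Single-JVM analogy of the flash-sale flow; a ReentrantLock stands in for the distributed lock. */
class StockDemo {
    private int stock;                                   // plain int on purpose: safety comes from the lock
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicInteger sold = new AtomicInteger();

    StockDemo(int initialStock) {
        this.stock = initialStock;
    }

    /** Check-then-decrement under the lock, so the two steps behave as one atomic unit. */
    boolean buy() {
        lock.lock();
        try {
            if (stock > 0) {
                stock--;
                sold.incrementAndGet();
                return true;
            }
            return false;
        } finally {
            lock.unlock();
        }
    }

    int sold() {
        return sold.get();
    }

    /** Fires `buyers` concurrent purchase attempts and returns how many of them succeeded. */
    static int simulate(int initialStock, int buyers) {
        StockDemo demo = new StockDemo(initialStock);
        ExecutorService pool = Executors.newFixedThreadPool(16);
        CountDownLatch done = new CountDownLatch(buyers);
        for (int i = 0; i < buyers; i++) {
            pool.execute(() -> {
                try {
                    demo.buy();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return demo.sold();
    }
}
```

With the lock in place, the number of successful purchases never exceeds the initial stock, however many buyers compete. Across several application instances an in-process lock cannot give that guarantee, which is why the lock has to live in a shared store such as Redis.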
Let's first look at the details this distributed lock involves, starting with the Redisson client configuration:
@Bean
public Redisson redisson() {
    // Single-node Redis, database 0
    Config config = new Config();
    config.useSingleServer().setAddress("redis://localhost:6379").setDatabase(0);
    return (Redisson) Redisson.create(config);
}
Deducting stock:
String lockKey = "lock:product01";
// Get the lock object
RLock lock = redisson.getLock(lockKey);
// Acquire the distributed lock
lock.lock();
try {
    // Read the current stock
    int stock = Integer.parseInt(redisTemplate.opsForValue().get("stock"));
    if (stock > 0) {
        // Deduct one unit of stock
        redisTemplate.opsForValue().set("stock", stock - 1 + "");
    } else {
        System.out.println("Out of stock");
    }
} finally {
    // Release the lock
    lock.unlock();
}
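As you can see, the code above is short: two simple API calls (lock and unlock) give us a fairly complete distributed lock. Under the hood, Redisson wraps a good deal of complexity to make the high-level API this capable. Let's go into the source: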
public void lock() {
    try {
        lock(-1, null, false);
    } catch (InterruptedException e) {
        throw new IllegalStateException();
    }
}

private void lock(long leaseTime, TimeUnit unit, boolean interruptibly) throws InterruptedException {
    // Get the current thread id
    long threadId = Thread.currentThread().getId();
    // Try to acquire the lock: returns null on success, otherwise the lock's remaining TTL
    Long ttl = tryAcquire(-1, leaseTime, unit, threadId);
    // lock acquired
    if (ttl == null) {
        // Acquired on the first attempt, return immediately
        return;
    }

    // Reaching here means acquisition failed. Use Redis pub/sub to subscribe to a channel
    // so that this thread can be woken up once the lock is released
    CompletableFuture<RedissonLockEntry> future = subscribe(threadId);
    pubSub.timeout(future);
    RedissonLockEntry entry;
    if (interruptibly) {
        entry = commandExecutor.getInterrupted(future);
    } else {
        entry = commandExecutor.get(future);
    }

    try {
        // This loop keeps retrying until the lock is acquired
        while (true) {
            // Try to acquire again
            ttl = tryAcquire(-1, leaseTime, unit, threadId);
            // lock acquired
            if (ttl == null) {
                break;
            }

            // waiting for message
            if (ttl >= 0) {
                // A TTL was returned
                try {
                    /*
                     * entry.getLatch() returns a Semaphore; we try to take a permit from it,
                     * with the lock's remaining TTL as the timeout.
                     * This Semaphore was created earlier, when the failed attempt subscribed to the channel.
                     * When a thread releases the lock, Redisson follows the subscription and wakes up
                     * one of the threads blocked on this Semaphore (by releasing one permit).
                     */
                    entry.getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    if (interruptibly) {
                        throw e;
                    }
                    entry.getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                }
            } else {
                if (interruptibly) {
                    entry.getLatch().acquire();
                } else {
                    entry.getLatch().acquireUninterruptibly();
                }
            }
        }
    } finally {
        unsubscribe(entry, threadId);
    }
    // get(lockAsync(leaseTime, unit));
}
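The "try, park on a semaphore, retry" loop in this method can be sketched in isolation. The following is a simplified in-JVM model under stated assumptions, not Redisson's code: `ParkingLock` is a hypothetical class, an `AtomicReference` replaces the Redis key, a fixed `ttlMillis` replaces the TTL returned by Redis, and `latch.release()` plays the part of the unlock message.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** In-JVM sketch of "try, then park on a semaphore until an unlock signal or the TTL elapses". */
class ParkingLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private final Semaphore latch = new Semaphore(0); // plays the role of entry.getLatch()
    private final long ttlMillis;                     // stand-in for the TTL returned by tryAcquire

    ParkingLock(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns null on success, otherwise a wait hint in ms (same null-or-TTL contract). */
    private Long tryAcquire() {
        return owner.compareAndSet(null, Thread.currentThread()) ? null : ttlMillis;
    }

    void lock() {
        Long ttl = tryAcquire();
        while (ttl != null) {
            try {
                // Wake up on release() or after ttl ms, whichever comes first, then retry
                latch.tryAcquire(ttl, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            ttl = tryAcquire();
        }
    }

    void unlock() {
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException("not locked by current thread");
        }
        latch.release(); // plays the role of the unlock message delivered over pub/sub
    }

    boolean isLocked() {
        return owner.get() != null;
    }

    /** Hypothetical demo: the caller holds the lock, a second thread waits and then acquires it. */
    static boolean handOffWorks() {
        ParkingLock lock = new ParkingLock(50);
        boolean[] acquired = {false};
        lock.lock();
        Thread waiter = new Thread(() -> {
            lock.lock();
            acquired[0] = true;
            lock.unlock();
        });
        waiter.start();
        lock.unlock();
        try {
            waiter.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired[0] && !lock.isLocked();
    }
}
```

The timeout on `tryAcquire` matters: even if a wake-up signal never arrives, the waiter retries once the TTL has passed, so a lost message delays acquisition but does not block it forever.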
This one method is the main flow. Leaving many of the details aside for now, let's look at the key methods:
private Long tryAcquire(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
    return get(tryAcquireAsync(waitTime, leaseTime, unit, threadId));
}

private <T> RFuture<Long> tryAcquireAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
    RFuture<Long> ttlRemainingFuture;
    if (leaseTime > 0) {
        ttlRemainingFuture = tryLockInnerAsync(waitTime, leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
    } else {
        // Follow the branch where leaseTime is -1: if the lock is acquired,
        // Redisson gives it a default lease of 30 seconds
        // and renews it back to 30 seconds once the lock has been held for 10 seconds
        // This is an asynchronous method that returns a Future
        ttlRemainingFuture = tryLockInnerAsync(waitTime, internalLockLeaseTime,
                TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);
    }
    CompletionStage<Long> f = ttlRemainingFuture.thenApply(ttlRemaining -> {
        // lock acquired
        // A null ttlRemaining means the lock was acquired
        if (ttlRemaining == null) {
            if (leaseTime > 0) {
                internalLockLeaseTime = unit.toMillis(leaseTime);
            } else {
                // Schedule the lock renewal
                scheduleExpirationRenewal(threadId);
            }
        }
        return ttlRemaining;
    });
    return new CompletableFutureWrapper<>(f);
}
Next, the locking logic itself, tryLockInnerAsync, which is essentially a Lua script:
<T> RFuture<T> tryLockInnerAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
    return evalWriteAsync(getRawName(), LongCodec.INSTANCE, command,
            "if (redis.call('exists', KEYS[1]) == 0) then " + // does the lock key exist?
                    "redis.call('hset', KEYS[1], ARGV[2], 1); " + // create a hash for the lock: field = lock holder, value = 1
                    "redis.call('pexpire', KEYS[1], ARGV[1]); " + // set the expiration time, 30s by default
                    "return nil; " +
                    "end; " +
                    "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " + // the lock exists and is held by this same holder
                    "redis.call('hincrby', KEYS[1], ARGV[2], 1); " + // reentrancy: value + 1
                    "redis.call('pexpire', KEYS[1], ARGV[1]); " + // reset the expiration time
                    "return nil; " +
                    "end; " +
                    "return redis.call('pttl', KEYS[1]);", // held by someone else: return the remaining TTL
            Collections.singletonList(getRawName()), unit.toMillis(leaseTime), getLockName(threadId));
}
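In this Lua script, KEYS[1] is the lock name, and ARGV carries the expiration time (ARGV[1]) and the lock holder (ARGV[2]). Note that Redisson stores the lock as a hash rather than a plain string, mainly to support reentrancy: the hash value counts how many times the holder has acquired the lock.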
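The three branches of the acquire script can be mirrored by a small in-memory model. This is a sketch for reasoning about the logic, not how Redisson executes it: `LockScriptModel` is a hypothetical class, `synchronized` stands in for the atomicity Redis gives a Lua script, and the clock is passed in as `now` so the behavior is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of the acquire script: one hash per lock, field = holder, value = re-entry count. */
class LockScriptModel {
    private final Map<String, Integer> hash = new HashMap<>(); // models the Redis hash at KEYS[1]
    private long expiresAt = -1;                               // absolute expiry in ms

    /** Models key expiry: once `now` reaches expiresAt the hash is gone. */
    private boolean exists(long now) {
        if (!hash.isEmpty() && now >= expiresAt) {
            hash.clear();
        }
        return !hash.isEmpty();
    }

    /** Returns null when the lock is acquired, otherwise the remaining TTL in ms. */
    synchronized Long tryLock(String holder, long leaseMillis, long now) {
        if (!exists(now)) {                      // exists == 0
            hash.put(holder, 1);                 // hset
            expiresAt = now + leaseMillis;       // pexpire
            return null;
        }
        if (hash.containsKey(holder)) {          // hexists == 1
            hash.merge(holder, 1, Integer::sum); // hincrby +1
            expiresAt = now + leaseMillis;       // pexpire
            return null;
        }
        return expiresAt - now;                  // pttl
    }

    synchronized int holdCount(String holder) {
        return hash.getOrDefault(holder, 0);
    }
}
```

For example, if holder `client-a:1` locks at time 0 with a 30000 ms lease and re-enters at time 1000, its count becomes 2 and the expiry moves to 31000; a different holder asking at time 5000 gets back 26000 instead of the lock.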
Now the lock renewal logic:
private void renewExpiration() {
    ExpirationEntry ee = EXPIRATION_RENEWAL_MAP.get(getEntryName());
    if (ee == null) {
        return;
    }

    // Create a delayed task that runs after internalLockLeaseTime / 3, i.e. 10 seconds by default
    // The timer underneath is Netty's HashedWheelTimer
    Timeout task = commandExecutor.getConnectionManager().newTimeout(new TimerTask() {
        @Override
        public void run(Timeout timeout) throws Exception {
            ExpirationEntry ent = EXPIRATION_RENEWAL_MAP.get(getEntryName());
            if (ent == null) {
                return;
            }
            Long threadId = ent.getFirstThreadId();
            if (threadId == null) {
                return;
            }

            // Renew the lock; this is also a Lua script underneath and resets the expiration time to 30s
            CompletionStage<Boolean> future = renewExpirationAsync(threadId);
            future.whenComplete((res, e) -> {
                if (e != null) {
                    log.error("Can't update lock " + getRawName() + " expiration", e);
                    EXPIRATION_RENEWAL_MAP.remove(getEntryName());
                    return;
                }

                if (res) {
                    // reschedule itself
                    // After a successful renewal, call renewExpiration() again to schedule the next one
                    renewExpiration();
                } else {
                    cancelExpirationRenewal(null);
                }
            });
        }
    }, internalLockLeaseTime / 3, TimeUnit.MILLISECONDS);

    ee.setTimeout(task);
}
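Note that the scheduled task used for renewal runs on the hashed wheel timer provided by Netty (HashedWheelTimer). Many middleware projects, Dubbo among them, use this kind of high-performance timer, in which a single thread manages many tasks.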
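To show why one thread can serve many timeouts, here is a toy hashed wheel with a manually advanced tick. It is a teaching sketch and not Netty's `HashedWheelTimer`: the class `ToyWheelTimer`, its methods, and the tick-based units are all assumptions made for illustration. The helper `renewalTicks` mimics the self-rescheduling shape of `renewExpiration()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy hashed wheel: one ticking loop serves many timeouts. Not Netty's implementation. */
class ToyWheelTimer {
    private static final class Entry {
        final Runnable task;
        long rounds; // visits of this slot to skip before the task is due

        Entry(Runnable task, long rounds) {
            this.task = task;
            this.rounds = rounds;
        }
    }

    private final List<List<Entry>> buckets = new ArrayList<>();
    private long tick = 0; // number of ticks processed so far

    ToyWheelTimer(int wheelSize) {
        for (int i = 0; i < wheelSize; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    /** Schedules task to run delayTicks (>= 1) ticks from now. */
    void newTimeout(Runnable task, long delayTicks) {
        if (delayTicks < 1) {
            throw new IllegalArgumentException("delayTicks must be >= 1");
        }
        int slot = (int) ((tick + delayTicks) % buckets.size());
        long rounds = (delayTicks - 1) / buckets.size();
        buckets.get(slot).add(new Entry(task, rounds));
    }

    /** Advances the wheel by one tick and runs every task that is due in the current slot. */
    void advance() {
        tick++;
        List<Entry> bucket = buckets.get((int) (tick % buckets.size()));
        List<Runnable> due = new ArrayList<>();
        for (Iterator<Entry> it = bucket.iterator(); it.hasNext();) {
            Entry e = it.next();
            if (e.rounds == 0) {
                it.remove();
                due.add(e.task);
            } else {
                e.rounds--;
            }
        }
        due.forEach(Runnable::run); // run after the scan so a task may reschedule itself
    }

    /** Self-rescheduling task, like renewExpiration(): returns the ticks at which it fired. */
    static List<Long> renewalTicks(int wheelSize, long everyTicks, int totalTicks) {
        ToyWheelTimer timer = new ToyWheelTimer(wheelSize);
        List<Long> fired = new ArrayList<>();
        Runnable[] renew = new Runnable[1];
        renew[0] = () -> {
            fired.add(timer.tick);
            timer.newTimeout(renew[0], everyTicks);
        };
        timer.newTimeout(renew[0], everyTicks);
        for (int i = 0; i < totalTicks; i++) {
            timer.advance();
        }
        return fired;
    }
}
```

Each tick touches only one bucket, so the cost per tick depends on the tasks in that slot rather than on the total number of pending timeouts. If one tick is read as one second, a renewal interval of 10 ticks corresponds to the default `internalLockLeaseTime / 3`.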
OK, next the unlock logic:
public RFuture<Void> unlockAsync(long threadId) {
    // Release the lock
    RFuture<Boolean> future = unlockInnerAsync(threadId);

    CompletionStage<Void> f = future.handle((opStatus, e) -> {
        // After unlocking, cancel the renewal task
        cancelExpirationRenewal(threadId);

        if (e != null) {
            throw new CompletionException(e);
        }
        if (opStatus == null) {
            IllegalMonitorStateException cause = new IllegalMonitorStateException("attempt to unlock lock, not locked by current thread by node id: "
                    + id + " thread-id: " + threadId);
            throw new CompletionException(cause);
        }

        return null;
    });

    return new CompletableFutureWrapper<>(f);
}

protected RFuture<Boolean> unlockInnerAsync(long threadId) {
    return evalWriteAsync(getRawName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
            "if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then " + // the lock is not held by this holder: return nil
                    "return nil;" +
                    "end; " +
                    "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); " + // value - 1
                    "if (counter > 0) then " + // value > 0 means the lock was re-entered and is still held
                    "redis.call('pexpire', KEYS[1], ARGV[2]); " + // reset the expiration time
                    "return 0; " +
                    "else " +
                    "redis.call('del', KEYS[1]); " + // release the lock: delete the hash
                    "redis.call('publish', KEYS[2], ARGV[1]); " + // publish the unlock message to the channel named 'redisson_lock__channel' + lock name
                    "return 1; " +
                    "end; " +
                    "return nil;",
            Arrays.asList(getRawName(), getChannelName()), LockPubSub.UNLOCK_MESSAGE, internalLockLeaseTime, getLockName(threadId));
}
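The unlock script has three possible outcomes (nil, 0, 1), and a small in-memory model makes them easier to tell apart. As before this is a hedged sketch, not Redisson code: `UnlockScriptModel` and its `hold` helper are hypothetical, the value of `UNLOCK_MESSAGE` is an assumption for illustration, and the `pexpire` call is left out because the model tracks no expiry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of the unlock script's three outcomes: nil, 0 (still held), 1 (released). */
class UnlockScriptModel {
    static final long UNLOCK_MESSAGE = 0L;                     // assumed value, for illustration only
    private final Map<String, Integer> hash = new HashMap<>(); // field = holder, value = re-entry count
    final List<Long> published = new ArrayList<>();            // stands in for the pub/sub channel

    /** Hypothetical setup helper: pretend `holder` has already acquired the lock `count` times. */
    synchronized void hold(String holder, int count) {
        hash.put(holder, count);
    }

    /** Returns null if holder does not own the lock, false if it is still held, true if released. */
    synchronized Boolean unlock(String holder) {
        if (!hash.containsKey(holder)) {                    // hexists == 0
            return null;
        }
        int counter = hash.merge(holder, -1, Integer::sum); // hincrby -1
        if (counter > 0) {
            return false;                                   // a re-entrant hold remains
        }
        hash.clear();                                       // del
        published.add(UNLOCK_MESSAGE);                      // publish
        return true;
    }
}
```

A null result corresponds to the branch in `unlockAsync` that raises `IllegalMonitorStateException`, and the unlock message is published only when the last re-entrant hold is released.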
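Releasing the lock is again a Lua script. After the lock is released, an unlock message is published to the channel, and Redis pub/sub notifies every client subscribed to that channel, so waiting clients can compete for the lock immediately. Now back to the subscription logic: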
protected CompletableFuture<RedissonLockEntry> subscribe(long threadId) {
    return pubSub.subscribe(getEntryName(), getChannelName());
}

public CompletableFuture<E> subscribe(String entryName, String channelName) {
    AsyncSemaphore semaphore = service.getSemaphore(new ChannelName(channelName));
    CompletableFuture<E> newPromise = new CompletableFuture<>();

    semaphore.acquire().thenAccept(c -> {
        if (newPromise.isDone()) {
            semaphore.release();
            return;
        }

        E entry = entries.get(entryName);
        if (entry != null) {
            entry.acquire();
            semaphore.release();
            entry.getPromise().whenComplete((r, e) -> {
                if (e != null) {
                    newPromise.completeExceptionally(e);
                    return;
                }
                newPromise.complete(r);
            });
            return;
        }

        E value = createEntry(newPromise);
        value.acquire();

        E oldValue = entries.putIfAbsent(entryName, value);
        if (oldValue != null) {
            oldValue.acquire();
            semaphore.release();
            oldValue.getPromise().whenComplete((r, e) -> {
                if (e != null) {
                    newPromise.completeExceptionally(e);
                    return;
                }
                newPromise.complete(r);
            });
            return;
        }

        // Create the listener
        RedisPubSubListener<Object> listener = createListener(channelName, value);
        CompletableFuture<PubSubConnectionEntry> s = service.subscribeNoTimeout(LongCodec.INSTANCE, channelName, semaphore, listener);
        newPromise.whenComplete((r, e) -> {
            if (e != null) {
                s.completeExceptionally(e);
            }
        });
        s.whenComplete((r, e) -> {
            if (e != null) {
                entries.remove(entryName);
                value.getPromise().completeExceptionally(e);
                return;
            }
            value.getPromise().complete(value);
        });
    });

    return newPromise;
}
A listener is created for the channel. When Redis publishes the unlock signal to the channel, the listener's onMessage method runs:
public void onMessage(CharSequence channel, Object message) {
    if (!channelName.equals(channel.toString())) {
        return;
    }

    PublishSubscribe.this.onMessage(value, (Long) message);
}

protected void onMessage(RedissonLockEntry value, Long message) {
    if (message.equals(UNLOCK_MESSAGE)) {
        Runnable runnableToExecute = value.getListeners().poll();
        if (runnableToExecute != null) {
            runnableToExecute.run();
        }

        // For an unlock message, release one permit on the corresponding Semaphore
        value.getLatch().release();
    } else if (message.equals(READ_UNLOCK_MESSAGE)) {
        while (true) {
            Runnable runnableToExecute = value.getListeners().poll();
            if (runnableToExecute == null) {
                break;
            }
            runnableToExecute.run();
        }

        value.getLatch().release(value.getLatch().getQueueLength());
    }
}
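This connects back to the earlier logic where a failed acquisition blocks on the Semaphore. There is a lot going on in this distributed lock, and a close reading gets quite complex; I have only covered the outline here, and many details remain to be dug into.
Redisson gives us a fairly complete distributed lock. At its core are a few Lua scripts, which let a group of Redis commands execute atomically inside Redis; the lock is stored as a hash, mainly to support reentrancy; and a key highlight is the renewal logic, which solves the problem that a suitable lock expiration time is hard to choose in high-concurrency scenarios. Writing a distributed lock this complete on your own is hard, but the ideas behind it are well worth borrowing.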