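The system holds users' virtual funds. Once a balance reaches a certain threshold, the user can submit a withdrawal request. Today operations reported that some users saw a blank page when withdrawing in the APP.
Initial analysis suspected an APP-side handling error or an incompatibility with certain phones. Deeper analysis showed that the page went blank because the APP could not handle an unknown error message. The root cause was on the server: when a user submitted a withdrawal request, the system threw an exception while deducting the funds and could not lock the amount in the user's wallet.
The logs confirmed that the wallet service uses a Redis-based distributed lock when deducting funds, operated through spring-redis, and that the exception was thrown at the tryLock step.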
Faulty code:
Lock lock = registry.obtain("lock-" + transferId);
if (lock.tryLock()) {
    // BUG: no unlock() on this path, so the lock is never released
    ....
} else {
    throw new Exception();
}
Correct version:
Lock lock = registry.obtain("lock-" + transferId);
if (lock.tryLock()) {
    try {
        ....
    } finally {
        lock.unlock();
    }
} else {
    throw new Exception();
}
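The pattern above can be wrapped in a small helper so that no call site can forget the unlock. The following is only a minimal sketch: `LockTemplate` and `runLocked` are hypothetical names, and a plain `ReentrantLock` stands in for the lock returned by `registry.obtain(...)`.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

final class LockTemplate {
    private LockTemplate() {}

    // Runs the action only if the lock can be taken, and always releases it afterwards.
    static <T> T runLocked(Lock lock, Supplier<T> action) {
        if (!lock.tryLock()) {
            throw new IllegalStateException("lock is held by another thread");
        }
        try {
            return action.get();
        } finally {
            lock.unlock(); // runs on normal return and on exception
        }
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock(); // stand-in for registry.obtain(...)
        try {
            runLocked(lock, () -> { throw new RuntimeException("deduction failed"); });
        } catch (RuntimeException expected) {
            // the business error still propagates, but the lock has been released
        }
        System.out.println("still locked: " + lock.isLocked()); // prints "still locked: false"
    }
}
```

Even when the business logic throws, the `finally` block releases the lock, so the next request for the same key is not blocked.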
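Why did locking fail for only some users while others succeeded? And why did some users succeed on a later attempt after failing several times?
In spring-redis, the component that talks to Redis for locking is RedisLockRegistry (the class ships in Spring Integration's Redis module):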
public Lock obtain(Object lockKey) {
    Assert.isInstanceOf(String.class, lockKey);
    //try to find the lock within hard references
    RedisLock lock = findLock(this.hardThreadLocks.get(), lockKey);
    /*
     * If the lock is locked, check that it matches what's in the store.
     * If it doesn't, the lock must have expired.
     */
    if (lock != null && lock.thread != null) {
        ...
    }
    if (lock == null) {
        //try to find the lock within weak references
        lock = findLock(this.weakThreadLocks.get(), lockKey);
        if (lock == null) {
            lock = new RedisLock((String) lockKey);
            ...
        }
    }
    return lock;
}
What it finally returns is a RedisLock object.
Since the lock we obtained is actually a RedisLock, let's look at what RedisLock.tryLock does:
public boolean tryLock() {
    Lock localLock = RedisLockRegistry.this.localRegistry.obtain(this.lockKey);
    try {
        if (!localLock.tryLock()) {
            return false;
        }
        boolean obtainedLock = this.obtainLock();
        if (!obtainedLock) {
            localLock.unlock();
        }
        return obtainedLock;
    }
    catch (Exception e) {
        localLock.unlock();
        rethrowAsLockException(e);
    }
    return false;
}
Line-by-line analysis:
What is localLock here?
This Lock comes from localRegistry.obtain, so what is localRegistry? From the source:
public RedisLockRegistry(RedisConnectionFactory connectionFactory, String registryKey, long expireAfter) {
    this(connectionFactory, registryKey, expireAfter, new DefaultLockRegistry());
}
localRegistry is a DefaultLockRegistry, so what we need to analyze is what DefaultLockRegistry.obtain does:
public final class DefaultLockRegistry implements LockRegistry {
    private final Lock[] lockTable;
    private final int mask;
    public DefaultLockRegistry() {
        this(0xFF);
    }
    //This shows that with the default constructor, lockTable is fully populated with ReentrantLock objects.
    //The mask is 0xFF (255), so the table holds mask + 1 = 256 locks.
    public DefaultLockRegistry(int mask) {
        String bits = Integer.toBinaryString(mask);
        Assert.isTrue(bits.length() < 32 && (mask == 0 || bits.lastIndexOf('0') < bits.indexOf('1')),
                "Mask must be a power of 2 - 1");
        this.mask = mask;
        int arraySize = this.mask + 1;
        this.lockTable = new ReentrantLock[arraySize];
        for (int i = 0; i < arraySize; i++) {
            this.lockTable[i] = new ReentrantLock();
        }
    }
    //This shows that every lock key is mapped to an index in 0-255, and the ReentrantLock at that index is returned.
    public Lock obtain(Object lockKey) {
        Assert.notNull(lockKey, "'lockKey' must not be null");
        Integer lockIndex = lockKey.hashCode() & this.mask;
        return this.lockTable[lockIndex];
    }
}
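Because keys are reduced to 256 slots, two unrelated transfers can end up on the same local lock. The sketch below reproduces only the index computation from the quoted obtain method. `StripeCollisionDemo`, `slotFor`, and `firstCollision` are hypothetical names, and the "lock-<id>" keys with sequential numeric ids are an assumption about the key format.

```java
import java.util.HashMap;
import java.util.Map;

final class StripeCollisionDemo {
    static final int MASK = 0xFF; // the default mask quoted above

    // Same index computation as the quoted obtain(): hash code masked to 0..255.
    static int slotFor(String lockKey) {
        return lockKey.hashCode() & MASK;
    }

    // Scans "lock-0", "lock-1", ... and returns the first two keys that share a slot,
    // or null if no collision is found up to maxId.
    static String[] firstCollision(int maxId) {
        Map<Integer, String> firstKeyInSlot = new HashMap<>();
        for (int id = 0; id <= maxId; id++) {
            String key = "lock-" + id;
            String earlier = firstKeyInSlot.putIfAbsent(slotFor(key), key);
            if (earlier != null) {
                return new String[] {earlier, key};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // 257 distinct keys cannot fit into 256 slots without two of them sharing one.
        String[] pair = firstCollision(256);
        System.out.println(pair[0] + " and " + pair[1] + " share slot " + slotFor(pair[0]));
    }
}
```

Any two keys reported by this program would receive the very same ReentrantLock instance from the local registry, so a lock leaked for one transfer can block a different transfer.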
So localLock ends up being a ReentrantLock:
public class ReentrantLock implements Lock, java.io.Serializable {
    //a non-fair lock by default
    private final Sync sync;
    abstract static class Sync extends AbstractQueuedSynchronizer {
        ...
        final boolean nonfairTryAcquire(int acquires) {
            //the current thread
            final Thread current = Thread.currentThread();
            //read the state; it is 0 when the lock is free
            int c = getState();
            if (c == 0) {
                //CAS the state from 0 to 1
                if (compareAndSetState(0, acquires)) {
                    //mark the current thread as the exclusive owner
                    setExclusiveOwnerThread(current);
                    return true;
                }
            }
            //The crux of this incident:
            //if the lock was never released, a second attempt to acquire it lands here,
            //where the current thread is compared with the exclusive owner thread.
            //If the current thread happens to be the original owner, the user's operation can proceed.
            //Otherwise false is returned: locking fails and the distributed lock is never attempted.
            else if (current == getExclusiveOwnerThread()) {
                int nextc = c + acquires;
                if (nextc < 0) // overflow
                    throw new Error("Maximum lock count exceeded");
                setState(nextc);
                return true;
            }
            return false;
        }
    }
    ...
    public boolean tryLock() {
        return sync.nonfairTryAcquire(1);
    }
    ...
}
* The core cause is described in the comments in the source above.
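The owner-thread check can be reproduced with nothing but the JDK. In this minimal sketch, `LeakedLocalLockDemo` is a hypothetical name and a single-thread executor plays the role of one request-handling thread that took the lock and never released it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

final class LeakedLocalLockDemo {
    // Returns { worker's first attempt, another thread's attempt, worker's second attempt }.
    static boolean[] attempts() {
        ReentrantLock lock = new ReentrantLock();
        // A single-thread pool guarantees both worker attempts run on the same thread.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Callable<Boolean> tryLockWithoutUnlock = lock::tryLock; // mirrors the faulty code
        try {
            boolean first = worker.submit(tryLockWithoutUnlock).get();
            boolean otherThread = lock.tryLock(); // the calling thread is not the owner
            boolean second = worker.submit(tryLockWithoutUnlock).get();
            return new boolean[] {first, otherThread, second};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] r = attempts();
        System.out.println("worker, first attempt:  " + r[0]); // true
        System.out.println("other thread:           " + r[1]); // false
        System.out.println("worker, second attempt: " + r[2]); // true (reentrant)
    }
}
```

In a server with a thread pool, whether a retry lands on the owning thread is a matter of chance, which matches the observation that some users succeeded only after several failed attempts.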
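We originally suspected that the Redis distributed lock itself was at fault. This analysis shows that the root cause was that our own code never released the local lock. A RedisLockRegistry has only 256 local locks by default, shared by all keys through hashing. A request succeeds only if its key maps to a free slot or it happens to run on the thread that already owns that slot, which is why only some users failed and why a retry sometimes succeeded. As the system keeps running, more and more slots stay held, eventually causing large numbers of lock failures until the system is restarted.
So the locking mechanism that spring-redis provides is actually built on top of a local lock: the local lock is taken first, and only then is Redis consulted.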
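The layering can be illustrated with a simplified model. This is a sketch under loud assumptions, not the library's implementation: `TwoLevelLockSketch` is a hypothetical class, a `ConcurrentMap` stands in for the Redis server, and expiry is not modelled at all. It only shows that a leaked local lock rejects other threads before the remote store is ever asked.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class TwoLevelLockSketch {
    private final ReentrantLock local = new ReentrantLock();
    private final ConcurrentMap<String, String> remote; // hypothetical stand-in for Redis
    private final String key;
    private final String ownerId;
    final AtomicInteger remoteCalls = new AtomicInteger(); // how often the remote store was asked

    TwoLevelLockSketch(ConcurrentMap<String, String> remote, String key, String ownerId) {
        this.remote = remote;
        this.key = key;
        this.ownerId = ownerId;
    }

    boolean tryLock() {
        if (!local.tryLock()) {
            return false; // local gate closed: the remote store is never consulted
        }
        remoteCalls.incrementAndGet();
        String holder = remote.putIfAbsent(key, ownerId);
        boolean obtained = holder == null || holder.equals(ownerId);
        if (!obtained) {
            local.unlock(); // another node holds the key, so give the local lock back
        }
        return obtained;
    }

    void unlock() {
        if (local.getHoldCount() == 1) {
            remote.remove(key, ownerId); // last local hold: release the remote entry too
        }
        local.unlock();
    }

    // Lets a short-lived thread take the lock without releasing it, then retries from the caller.
    static boolean acquiredAfterLeak(TwoLevelLockSketch lock) {
        Thread leaker = new Thread(() -> lock.tryLock());
        leaker.start();
        try {
            leaker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return lock.tryLock();
    }

    public static void main(String[] args) {
        TwoLevelLockSketch lock =
                new TwoLevelLockSketch(new ConcurrentHashMap<>(), "lock-42", "node-a");
        System.out.println("acquired after leak: " + acquiredAfterLeak(lock)); // false
        System.out.println("remote calls: " + lock.remoteCalls.get());         // 1
    }
}
```

The second attempt fails with only one remote call recorded, so in this model the failure is decided entirely by the local lock.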
TIPS:
Using the spring-redis distributed lock on its own cannot guarantee the idempotency of an insert.