Every time we deploy, Dubbo interface calls occasionally time out and the following error is logged:
2019-07-23 12:52:50,932 DEBUG [ZkEventThread.java:69] : Delivering event #3600 ZkEvent[Children of /dubbo/com.zoe.base.healthservice.api.mservice.indexmanager.RecIndexEddApi/providers changed sent to com.alibaba.dubbo.remoting.zookeeper.zkclient.ZkclientZookeeperClient$2@1ced40b]
2019-07-23 12:52:50,934 DEBUG [ClientCnxn.java:818] : Reading reply sessionid:0x100010739720373, packet:: clientPath:null serverPath:null finished:false header:: 10372,3 replyHeader:: 10372,42375930,0 request:: '/dubbo/com.zoe.base.healthservice.api.mservice.indexmanager.RecIndexEddApi/providers,T response:: s{22154898,22154898,1551930057480,1551930057480,0,1674,0,0,0,2,42375930}
2019-07-23 12:52:50,936 WARN [AbstractRegistry.java:221] : [DUBBO] Failed to save registry store file, cause: Can not lock the registry cache file C:\Users\Du Wenqing\.dubbo\dubbo-registry-172.16.34.101.cache, ignore and retry later, maybe multi java process use the file, please config: dubbo.registry.file=xxx.properties, dubbo version: 2.5.3, current host: 172.16.36.42
java.io.IOException: Can not lock the registry cache file C:\Users\Du Wenqing\.dubbo\dubbo-registry-172.16.34.101.cache, ignore and retry later, maybe multi java process use the file, please config: dubbo.registry.file=xxx.properties
    at com.alibaba.dubbo.registry.support.AbstractRegistry.doSaveProperties(AbstractRegistry.java:193)
    at com.alibaba.dubbo.registry.support.AbstractRegistry$SaveProperties.run(AbstractRegistry.java:150)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
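Tracing the source shows that the exception is thrown in AbstractRegistry.doSaveProperties, where the registry client saves its local cache file (decompiled):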
public void doSaveProperties(long version) {
    // Skip the save if a newer version of the cache has already been scheduled.
    if (version >= this.lastCacheChanged.get()) {
        if (this.file != null) {
            // Load the existing cache file first so that its entries are merged, not lost.
            Properties newProperties = new Properties();
            FileInputStream in = null;
            try {
                if (this.file.exists()) {
                    in = new FileInputStream(this.file);
                    newProperties.load(in);
                }
            } catch (Throwable var69) {
                this.logger.warn("Failed to load registry store file, cause: " + var69.getMessage(), var69);
            } finally {
                if (in != null) {
                    try {
                        in.close();
                    } catch (IOException var64) {
                        this.logger.warn(var64.getMessage(), var64);
                    }
                }
            }
            try {
                newProperties.putAll(this.properties);
                // The lock is taken on a separate "<cache file>.lock" file, not on the cache file itself.
                File lockfile = new File(this.file.getAbsolutePath() + ".lock");
                if (!lockfile.exists()) {
                    lockfile.createNewFile();
                }
                RandomAccessFile raf = new RandomAccessFile(lockfile, "rw");
                try {
                    FileChannel channel = raf.getChannel();
                    try {
                        // Non-blocking attempt: null means another process holds the lock.
                        FileLock lock = channel.tryLock();
                        if (lock == null) {
                            throw new IOException("Can not lock the registry cache file " + this.file.getAbsolutePath() + ", ignore and retry later, maybe multi java process use the file, please config: dubbo.registry.file=xxx.properties");
                        }
                        try {
                            if (!this.file.exists()) {
                                this.file.createNewFile();
                            }
                            FileOutputStream outputFile = new FileOutputStream(this.file);
                            try {
                                newProperties.store(outputFile, "Dubbo Registry Cache");
                            } finally {
                                outputFile.close();
                            }
                        } finally {
                            lock.release();
                        }
                    } finally {
                        channel.close();
                    }
                } finally {
                    raf.close();
                }
            } catch (Throwable var71) {
                if (version < this.lastCacheChanged.get()) {
                    return;
                }
                // On failure (including a failed lock attempt), schedule a retry and log the warning seen above.
                this.registryCacheExecutor.execute(new AbstractRegistry.SaveProperties(this.lastCacheChanged.incrementAndGet()));
                this.logger.warn("Failed to save registry store file, cause: " + var71.getMessage(), var71);
            }
        }
    }
}
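The lock-then-write pattern in doSaveProperties can be reduced to a small sketch. This is an illustration only, not Dubbo's code: the class name CacheFileLockSketch, the method saveWithLock and the sample property are made up for the example, and a failed lock attempt is reported with a boolean instead of an exception.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.util.Properties;

// Illustrative only: a reduced version of the lock-then-write pattern, not Dubbo's code.
final class CacheFileLockSketch {

    /**
     * Tries to write props to cacheFile while holding a lock on "<cacheFile>.lock".
     * Returns false, without writing anything, when another process holds the lock.
     */
    static boolean saveWithLock(File cacheFile, Properties props) throws IOException {
        File lockFile = new File(cacheFile.getAbsolutePath() + ".lock");
        // "rw" mode creates the lock file if it does not exist yet.
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileChannel channel = raf.getChannel()) {
            FileLock lock = channel.tryLock(); // non-blocking attempt
            if (lock == null) {
                return false; // lock is held by another process
            }
            try (FileOutputStream out = new FileOutputStream(cacheFile)) {
                props.store(out, "Registry cache (sketch)");
            } finally {
                lock.release();
            }
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        File cache = File.createTempFile("registry-sketch", ".cache");
        Properties props = new Properties();
        // Hypothetical cache entry, for illustration only.
        props.setProperty("com.example.DemoService", "dubbo://127.0.0.1:20880");
        System.out.println("saved: " + saveWithLock(cache, props));
    }
}
```

If two processes ran this against the same cache path at the same moment, one of them would be expected to get false, which corresponds to the IOException path in the Dubbo code.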
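The lock is needed because the cache file is shared. By default it lives under the user's home directory and is named after the registry address (here C:\Users\Du Wenqing\.dubbo\dubbo-registry-172.16.34.101.cache), so every Java process that runs as the same user on the same machine and uses the same registry reads and writes the same file. Note that the FileChannel in the code above is a java.nio channel opened on the .lock file; it is unrelated to the Netty channels Dubbo uses for RPC transport. If several processes wrote the cache file at the same time they could overwrite each other's content, so each write is guarded by a file lock.
The lock is taken with FileChannel.tryLock, which does not wait or spin: it returns null immediately when another process already holds the lock. The no-argument tryLock() delegates to the three-argument overload, implemented in FileChannelImpl (decompiled):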
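// var1 = position, var3 = size, var5 = shared
public FileLock tryLock(long var1, long var3, boolean var5) throws IOException {
    this.ensureOpen();
    if (var5 && !this.readable) {
        throw new NonReadableChannelException();
    } else if (!var5 && !this.writable) {
        throw new NonWritableChannelException();
    } else {
        // Describe the requested lock and register it in the lock table.
        FileLockImpl var6 = new FileLockImpl(this, var1, var3, var5);
        FileLockTable var7 = this.fileLockTable();
        var7.add(var6);
        // Track the current thread while it is inside the native call.
        int var9 = this.threads.add();
        FileLockImpl var11;
        try {
            int var8;
            try {
                this.ensureOpen();
                // Second argument false = non-blocking native lock attempt.
                var8 = this.nd.lock(this.fd, false, var1, var3, var5);
            } catch (IOException var15) {
                var7.remove(var6);
                throw var15;
            }
            FileLockImpl var10;
            // -1: the lock is not available, so undo the registration and return null.
            if (var8 == -1) {
                var7.remove(var6);
                var10 = null;
                return var10;
            }
            // Any result other than 1: the lock was acquired as requested.
            if (var8 != 1) {
                var10 = var6;
                return var10;
            }
            // 1: a shared lock was requested but an exclusive lock was obtained instead.
            assert var5;
            var10 = new FileLockImpl(this, var1, var3, false);
            var7.replace(var6, var10);
            var11 = var10;
        } finally {
            this.threads.remove(var9);
        }
        return var11;
    }
}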
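The method works in the following steps:
- It wraps the requested lock (position, size, shared flag) in a FileLockImpl and registers it in the FileLockTable, which records the locks held on the file.
- It adds the current thread to the channel's thread set (this.threads.add()) for the duration of the native call.
- It calls the native nd.lock on the file descriptor in non-blocking mode to try to acquire the lock.
- If the native call reports that the lock is not available (return value -1), it removes the FileLockImpl from the table and returns null. In every case the thread is removed from the thread set on exit.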
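To see that tryLock returns immediately instead of waiting, here is a minimal sketch (not Dubbo code; the class TryLockSketch and the result strings are invented for the example). It holds a lock through one channel and then calls tryLock through a second channel on the same file. One caveat: inside a single JVM the lock table rejects the second attempt with OverlappingFileLockException, whereas a lock held by a different process is what produces the null return that doSaveProperties turns into an IOException.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: shows that a second lock attempt fails fast instead of blocking.
final class TryLockSketch {

    /** Locks lockFile through one channel, then reports what a second attempt yields. */
    static String secondAttempt(Path lockFile) throws IOException {
        try (FileChannel first = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock held = first.tryLock();
            if (held == null) {
                return "held-by-another-process";
            }
            try {
                FileLock again = second.tryLock(); // returns or throws immediately
                if (again == null) {
                    return "null";
                }
                again.release();
                return "acquired";
            } catch (OverlappingFileLockException e) {
                // Same JVM already holds an overlapping lock on this file.
                return "overlapping";
            } finally {
                held.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempFile("trylock-sketch", ".lock");
        System.out.println("second attempt: " + secondAttempt(lockFile));
    }
}
```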
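So when a consumer or provider process needs to save the registry cache (for example after the provider list changes, as in the ZooKeeper event in the log above), it first tries to obtain the lock object. If tryLock returns null because another process holds the lock, the cache file is not written: doSaveProperties throws the IOException shown in the log, logs the warning, and submits another SaveProperties task to retry later. This is the behavior observed while debugging.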
Since the error is caused by contention for the file lock, letting each service module keep its own cache file avoids the problem.
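As a rough illustration of "one cache file per module", the hypothetical helper below derives a distinct cache path from a module name, loosely following the default naming visible in the log (.dubbo/dubbo-registry-….cache under the user's home directory). The class CachePathSketch, the module names and the naming scheme are assumptions for the example; the resulting path is the kind of value that the dubbo.registry.file setting named in the warning would point to.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: derives one cache file path per service module.
final class CachePathSketch {

    /** Builds "<userHome>/.dubbo/dubbo-registry-<moduleName>.cache". */
    static Path cacheFileFor(String userHome, String moduleName) {
        return Paths.get(userHome, ".dubbo", "dubbo-registry-" + moduleName + ".cache");
    }

    public static void main(String[] args) {
        String home = System.getProperty("user.home");
        // Hypothetical module names; each one gets its own cache file.
        System.out.println(cacheFileFor(home, "order-service"));
        System.out.println(cacheFileFor(home, "user-service"));
    }
}
```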
Concretely, add the following to the provider's XML configuration file: