I. Introduction
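One day, a colleague told me they needed an exclusive file lock, with the following requirements:
(1) Write operations are exclusive.
(2) It works both between multiple threads of the same process and between multiple processes.
(3) Fault tolerance: if the process holding the lock crashes, later processes must still be able to acquire the lock normally.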
II. Solution
1. The initial idea
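In Java, mutual exclusion between threads of the same process is fairly simple. For example, a variable guarded by thread synchronization that marks whether the lock is held is enough. Mutual exclusion between different processes is more cumbersome. Among the existing APIs, the natural choice is java.nio.channels.FileLock, used as follows: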
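import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FileLockUtils {

    private static final Logger log = LoggerFactory.getLogger(FileLockUtils.class);

    /**
     * @param file       the target file; it must already exist
     * @param strToWrite the content to write (written with RandomAccessFile.writeUTF)
     * @param append     true to append to the file, false to clear it before writing
     * @param lockTime   in milliseconds; only used to simulate holding the exclusive lock, -1 means ignore this parameter
     * @return true if the lock was obtained and the content was written, false otherwise
     */
    public static boolean lockAndWrite(File file, String strToWrite, boolean append, int lockTime) {
        if (!file.exists()) {
            return false;
        }
        RandomAccessFile fis = null;
        FileChannel fileChannel = null;
        FileLock fl = null;
        long tsBegin = System.currentTimeMillis();
        try {
            fis = new RandomAccessFile(file, "rw");
            fileChannel = fis.getChannel();
            // Non-blocking: returns null if another process holds the lock,
            // throws OverlappingFileLockException if another thread of this JVM holds it
            fl = fileChannel.tryLock();
            if (fl == null || !fl.isValid()) {
                return false;
            }
            log.info("thread = {} lock success", Thread.currentThread().getName());
            if (append) {
                // append: move to the end of the file, then write
                fis.seek(fis.length());
            } else {
                // overwrite: clear the content, then write
                fis.setLength(0);
            }
            fis.writeUTF(strToWrite);
            long totalCost = System.currentTimeMillis() - tsBegin;
            if (totalCost < lockTime) {
                // keep holding the lock for the rest of lockTime
                Thread.sleep(lockTime - totalCost);
            }
        } catch (Exception e) {
            log.error("RandomAccessFile error", e);
            return false;
        } finally {
            if (fl != null) {
                try {
                    fl.release();
                } catch (IOException e) {
                    log.warn("release lock failed", e);
                }
            }
            if (fileChannel != null) {
                try {
                    fileChannel.close();
                } catch (IOException e) {
                    log.warn("close channel failed", e);
                }
            }
            if (fis != null) {
                try {
                    fis.close();
                } catch (IOException e) {
                    log.warn("close file failed", e);
                }
            }
        }
        return true;
    }
}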
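(1) In one process, two threads compete for the lock at the same time. Call this test program A. Expected result: one of the threads fails to acquire the lock.
(2) Run two processes, that is, two instances of test program A. Expected result: exactly one thread in one of the processes acquires the lock, and all other threads fail.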
public static void main(String[] args) {
    new Thread("write-thread-1-lock") {
        @Override
        public void run() {
            FileLockUtils.lockAndWrite(new File("/data/hello.txt"), "write-thread-1-lock" + System.currentTimeMillis(), false, 30 * 1000);
        }
    }.start();
    new Thread("write-thread-2-lock") {
        @Override
        public void run() {
            FileLockUtils.lockAndWrite(new File("/data/hello.txt"), "write-thread-2-lock" + System.currentTimeMillis(), false, 30 * 1000);
        }
    }.start();
}
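Within a single process, the test code above behaves as expected. With two processes running at the same time, however, the results differ by platform. On Mac (Java 8) the second process can also acquire the lock, while on Win7 (Java 7) it cannot. Why? Is tryLock not exclusive after all?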
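In fact, tryLock is exclusive; the culprit is channel.close. In test program A, the thread that loses the race gets an OverlappingFileLockException from tryLock, which the catch block handles. Its finally block then closes its own channel on the same file. On some platforms, that close also releases the lock held by the winning thread. The official documentation says: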
On some systems, closing a channel releases all locks held by the Java virtual machine on the
underlying file regardless of whether the locks were acquired via that channel or via
another channel open on the same file. It is strongly recommended that, within a program, a unique
channel be used to acquire all locks on any given file.
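The sketch below shows what the losing thread in test program A runs into. The class name OverlapDemo and the temporary file are illustrative only. Within one JVM, once one channel holds a lock on a file, tryLock on a second channel for the same file throws OverlappingFileLockException instead of returning null. In lockAndWrite that exception lands in the catch block. The finally block then closes the second channel, which is exactly the close the documentation above warns about.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OverlapDemo {

    /** Returns true if a second channel in the same JVM is refused with OverlappingFileLockException. */
    public static boolean secondChannelIsRejected() throws IOException {
        Path file = Files.createTempFile("overlap-demo", ".lock");
        try (FileChannel a = FileChannel.open(file, StandardOpenOption.WRITE);
             FileChannel b = FileChannel.open(file, StandardOpenOption.WRITE)) {
            FileLock lockA = a.tryLock(); // the first lock in this JVM succeeds
            if (lockA == null) {
                return false;
            }
            try {
                b.tryLock();              // same file, same JVM, different channel
                return false;             // not expected to get here
            } catch (OverlappingFileLockException e) {
                // The JVM tracks its own locks per file and refuses the overlap.
                // In lockAndWrite this is caught, then the finally block closes
                // the second channel, which on some platforms also drops lockA.
                return true;
            } finally {
                lockA.release();
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(secondChannelIsRejected());
    }
}
```

The exception itself is harmless. What causes trouble is the cleanup that follows it, because closing the second channel is what can release the first thread's lock at the OS level.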
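After a long and winding search, I finally found a Stack Overflow post that pointed to Lucene's NativeFSLock. NativeFSLock has the same need: it must provide exclusive writes across multiple processes. I worked from the NativeFSLock source in Lucene 4.10.4, in particular its obtain method. The design of NativeFSLock is as follows: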
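(1) Each lock has a corresponding local file.
(2) A static, thread-safe Set (LOCK_HELD) in the JVM records the canonical paths of the lock files currently held.
(3) Only when LOCK_HELD does not yet contain the lock file's path (that is, the add succeeds) does the code open a channel on the file and call tryLock. If that fails, the path is removed from LOCK_HELD and the channel is closed.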
public synchronized boolean obtain() throws IOException {
    if (lock != null) {
        // Our instance is already locked:
        return false;
    }
    // Ensure that lockDir exists and is a directory.
    if (!lockDir.exists()) {
        if (!lockDir.mkdirs())
            throw new IOException("Cannot create directory: " + lockDir.getAbsolutePath());
    } else if (!lockDir.isDirectory()) {
        // TODO: NoSuchDirectoryException instead?
        throw new IOException("Found regular file where directory expected: " + lockDir.getAbsolutePath());
    }
    final String canonicalPath = path.getCanonicalPath();
    // Make sure nobody else in-process has this lock held
    // already, and, mark it held if not:
    // This is a pretty crazy workaround for some documented
    // but yet awkward JVM behavior:
    //
    // On some systems, closing a channel releases all locks held by the
    // Java virtual machine on the underlying file
    // regardless of whether the locks were acquired via that channel or via
    // another channel open on the same file.
    // It is strongly recommended that, within a program, a unique channel
    // be used to acquire all locks on any given
    // file.
    //
    // This essentially means if we close "A" channel for a given file all
    // locks might be released... the odd part
    // is that we can't re-obtain the lock in the same JVM but from a
    // different process if that happens. Nevertheless
    // this is super trappy. See LUCENE-5738
    boolean obtained = false;
    if (LOCK_HELD.add(canonicalPath)) {
        try {
            channel = FileChannel.open(path.toPath(), StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            try {
                lock = channel.tryLock();
                obtained = lock != null;
            } catch (IOException | OverlappingFileLockException e) {
                // At least on OS X, we will sometimes get an
                // intermittent "Permission Denied" IOException,
                // which seems to simply mean "you failed to get
                // the lock". But other IOExceptions could be
                // "permanent" (eg, locking is not supported via
                // the filesystem). So, we record the failure
                // reason here; the timeout obtain (usually the
                // one calling us) will use this as "root cause"
                // if it fails to get the lock.
                failureReason = e;
            }
        } finally {
            if (obtained == false) { // not successful - clear up and move out
                clearLockHeld(path);
                final FileChannel toClose = channel;
                channel = null;
                closeWhileHandlingException(toClose);
            }
        }
    }
    return obtained;
}
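The sketch below is a minimal standalone version of the three design points above. The class ProcessFileLock, its methods and the lock-file path are hypothetical and are not Lucene's API.

- A static thread-safe Set keeps a second thread of the same JVM from opening its own channel on the lock file, so the close-releases-everything pitfall cannot trigger.
- The OS-level FileLock keeps other processes out.
- The operating system releases a process's file locks when the process exits, so a crashed holder should not block later processes, which is what requirement (3) asks for. The leftover lock file is harmless, since its existence is not what signals the lock.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical exclusive lock for threads and processes, modeled on the NativeFSLock design. */
public final class ProcessFileLock implements AutoCloseable {

    // (2) JVM-wide, thread-safe set of canonical lock-file paths held by this process
    private static final Set<String> LOCK_HELD =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    private final String key;
    private final FileChannel channel;
    private final FileLock lock;

    private ProcessFileLock(String key, FileChannel channel, FileLock lock) {
        this.key = key;
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns a held lock, or null if another thread or process already holds it. */
    public static ProcessFileLock tryObtain(Path lockFile) throws IOException {
        Path parent = lockFile.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        String key = lockFile.toFile().getCanonicalPath();
        // (3) only the thread that wins this in-JVM race ever opens a channel on the file
        if (!LOCK_HELD.add(key)) {
            return null;
        }
        FileChannel channel = null;
        FileLock lock = null;
        try {
            // (1) one local file per lock, created on first use
            channel = FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            try {
                lock = channel.tryLock(); // null: another process holds the lock
            } catch (OverlappingFileLockException e) {
                // locked elsewhere in this JVM without going through LOCK_HELD
            }
        } finally {
            if (lock == null) {
                try {
                    if (channel != null) {
                        channel.close();       // close our channel first...
                    }
                } finally {
                    LOCK_HELD.remove(key);     // ...then let other threads try again
                }
            }
        }
        return lock == null ? null : new ProcessFileLock(key, channel, lock);
    }

    @Override
    public void close() throws IOException {
        if (!channel.isOpen()) {
            return; // already closed
        }
        try {
            try {
                lock.release();
            } finally {
                channel.close();
            }
        } finally {
            LOCK_HELD.remove(key);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(System.getProperty("java.io.tmpdir"), "demo", "hello.lock");
        try (ProcessFileLock first = tryObtain(path)) {
            System.out.println("first obtained: " + (first != null));
            System.out.println("second obtained: " + (tryObtain(path) != null));
        }
    }
}
```

On failure, the sketch closes the channel before removing the path from the set. If the order were reversed, another thread could lock the file in between and then lose that lock when this channel is closed. The quoted Lucene code calls clearLockHeld before closing, so this ordering is a deliberate deviation from it.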