When we covered concurrent programming, we mentioned that the usual ways to implement a lock inside a single application are synchronized and Lock. In a distributed deployment, however, an in-process lock cannot protect a resource across a cluster. So how do we implement locking in a cluster? The common distributed-lock implementations are described below.
The general idea behind every distributed lock is to lean on the consistency guarantees provided by some piece of middleware: ZooKeeper builds on its ZAB protocol, while Redis-based locks are usually built on SETNX/expiration schemes such as Redlock. This article focuses on ZooKeeper, so here we only cover the ZooKeeper way of implementing a distributed lock; the Redis implementation will be covered in a later post.
<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-framework</artifactId>
    <version>4.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-recipes</artifactId>
    <version>4.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.zookeeper</groupId>
    <artifactId>zookeeper</artifactId>
    <version>3.4.13</version>
</dependency>
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

import java.io.IOException;

public class LockMain {
    public static final String ZOOKEEPER_CONFIG = "127.0.0.1:2181";

    public static void main(String[] args) throws IOException {
        CuratorFramework curatorFramework = CuratorFrameworkFactory.builder()
                .connectString(ZOOKEEPER_CONFIG)
                .sessionTimeoutMs(3000)
                .retryPolicy(new ExponentialBackoffRetry(1000, 3))
                .build();
        curatorFramework.start();
        final InterProcessMutex lock = new InterProcessMutex(curatorFramework, "/lock");
        for (int i = 0; i < 10; i++) {
            new Thread(() -> {
                try {
                    lock.acquire();
                    System.out.println(Thread.currentThread().getName() + " --> acquired the lock");
                } catch (Exception e) {
                    e.printStackTrace();
                }
                try {
                    // simulate business logic
                    Thread.sleep(3000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                } finally {
                    try {
                        lock.release();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }, "Thread-" + i).start();
        }
        System.in.read();
    }
}
A quick walk-through of the example: each thread stands in for a service instance that wants the distributed lock. When a thread gets CPU time it issues an acquisition request, which models the lock requests arriving from different servers. Each request creates an **ephemeral sequential node** under /lock in the order it reaches the ZooKeeper server, and every node watches its immediate predecessor (the watch is set up by Curator, not by the Thread we created) to compete for the lock. The internal implementation is analyzed in section 4.
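If you want to see those ephemeral sequential nodes for yourself, you can list the children of /lock while the demo threads are still waiting. The helper below is a hypothetical sketch, not part of the example above: the class name is made up, and it simply reuses the same Curator connection settings.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LockQueueInspector {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.builder()
                .connectString("127.0.0.1:2181")
                .sessionTimeoutMs(3000)
                .retryPolicy(new ExponentialBackoffRetry(1000, 3))
                .build();
        client.start();
        // children are named like _c_<uuid>-lock-0000000003; the numeric suffix
        // reflects the order in which the acquisition requests reached ZooKeeper
        List<String> children = new ArrayList<>(client.getChildren().forPath("/lock"));
        // sort by the sequence suffix, mirroring what LockInternals does internally
        children.sort(Comparator.comparing((String name) -> name.substring(name.lastIndexOf('-') + 1)));
        children.forEach(child -> System.out.println("/lock/" + child));
        client.close();
    }
}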
In a clustered environment, a leader election can be triggered when a service node joins the cluster or when the current leader becomes unavailable; Kafka's use of ZooKeeper for leader election is a well-known example. The code below shows a concrete implementation.
For the Maven dependencies, refer to section 2.1.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;

import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

public class LeaderVote extends LeaderSelectorListenerAdapter implements Closeable {
    /**
     * Name of the current process
     */
    private String name;
    /**
     * Leader-election API
     */
    private LeaderSelector leaderSelector;
    /**
     * Never counted down: await() blocks so takeLeadership() does not return
     * and the node holds on to leadership
     */
    private CountDownLatch countDownLatch = new CountDownLatch(1);

    public LeaderVote(String name) {
        this.name = name;
    }

    public void setLeaderSelector(LeaderSelector leaderSelector) {
        this.leaderSelector = leaderSelector;
    }

    public void start() {
        leaderSelector.start();
    }

    @Override
    public void close() throws IOException {
        leaderSelector.close();
    }

    @Override
    public void takeLeadership(CuratorFramework client) throws Exception {
        System.out.println(this.name + " is now the leader");
        countDownLatch.await();
    }
}
Create the test launcher class Main.
Note that each Main process simulates one server node; to simulate several clients, make multiple copies of the Main class below (each with its own LEADER_NAME) and start them separately.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.retry.ExponentialBackoffRetry;

import java.io.IOException;

public class LeaderVoteMain {
    public static final String ZOOKEEPER_CONFIG = "127.0.0.1:2181";
    public static final String LEADER_NAME = "LeaderA";
    // public static final String LEADER_NAME = "LeaderB";
    // public static final String LEADER_NAME = "LeaderC";

    public static void main(String[] args) throws IOException {
        CuratorFramework curatorFramework = CuratorFrameworkFactory.builder()
                .connectString(ZOOKEEPER_CONFIG)
                .sessionTimeoutMs(3000)
                .retryPolicy(new ExponentialBackoffRetry(3000, 3))
                .build();
        curatorFramework.start();
        LeaderVote leaderVote = new LeaderVote(LEADER_NAME);
        LeaderSelector leaderSelector = new LeaderSelector(curatorFramework, "/leader", leaderVote);
        leaderVote.setLeaderSelector(leaderSelector);
        leaderVote.start();
        System.in.read();
    }
}
Note: this section analyzes how the Curator-based distributed lock from section 2 is actually implemented, i.e. how Curator cooperates with ZooKeeper to provide the locking functionality.
Let's look at this code again:
final InterProcessMutex lock = new InterProcessMutex(curatorFramework, "/lock");

// InterProcessMutex
// 1. Uses StandardLockInternalsDriver as the LockInternalsDriver implementation
public InterProcessMutex(CuratorFramework client, String path) {
    this(client, path, new StandardLockInternalsDriver());
}

// 2. Passes the lock node name LOCK_NAME and maxLeases (the maximum number of
//    simultaneous lock holders, fixed at 1 for a mutex)
public InterProcessMutex(CuratorFramework client, String path, LockInternalsDriver driver) {
    this(client, path, LOCK_NAME, 1, driver);
}

// 3. Validates the znode path and creates LockInternals, the core lock object
InterProcessMutex(CuratorFramework client, String path, String lockName, int maxLeases, LockInternalsDriver driver) {
    basePath = PathUtils.validatePath(path);
    internals = new LockInternals(client, driver, path, lockName, maxLeases);
}
Let's start with the acquisition logic. The entry point plays the same role as ReentrantLock's lock() does for an in-process lock, so we follow acquire() to see how Curator obtains a lock in a distributed environment.
// 1. From the LockMain example
lock.acquire();

// 2. InterProcessMutex
public void acquire() throws Exception {
    if ( !internalLock(-1, null) ) { // try to acquire the lock; if it returns false, throw
        throw new IOException("Lost connection while trying to acquire lock: " + basePath);
    }
}
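Incidentally, acquire() passes (-1, null) to internalLock(), which means "wait forever". InterProcessMutex also exposes a timed overload, acquire(long time, TimeUnit unit), which returns false instead of blocking indefinitely. A minimal usage sketch, reusing the lock from LockMain (imports and exception handling omitted for brevity):

// timed variant: give up after 500 ms instead of waiting forever
if (lock.acquire(500, TimeUnit.MILLISECONDS)) {
    try {
        // critical section
    } finally {
        lock.release();
    }
} else {
    System.out.println("could not obtain /lock within 500 ms");
}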
acquire() delegates to internalLock(); if internalLock() returns false, an IOException is thrown.
private boolean internalLock(long time, TimeUnit unit) throws Exception {
    // 1. Get the current thread
    Thread currentThread = Thread.currentThread();
    /**
     * 2. Look up LockData in the ConcurrentMap
     *    key:   Thread
     *    value: LockData, an inner class of InterProcessMutex that records the
     *           thread, the lockPath and the re-entrance count lockCount
     */
    LockData lockData = threadData.get(currentThread);
    // 3. If the current thread already holds the lock, just bump the re-entrance count
    if ( lockData != null ) {
        // re-entering
        lockData.lockCount.incrementAndGet();
        return true;
    }
    // 4. Core acquisition logic, delegated to LockInternals (created in the constructor)
    String lockPath = internals.attemptLock(time, unit, getLockNodeBytes());
    // 5. On success, create a LockData entry and add it to threadData
    if ( lockPath != null ) {
        LockData newLockData = new LockData(currentThread, lockPath);
        threadData.put(currentThread, newLockData);
        return true;
    }
    // 6. Acquisition failed, return false
    return false;
}
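Step 3 above is what makes the mutex re-entrant: the same thread can call acquire() repeatedly without touching ZooKeeper again, as long as it calls release() the same number of times. A small sketch, again using the lock from LockMain (error handling omitted):

lock.acquire();
lock.acquire();      // same thread: lockCount goes 1 -> 2, no new znode is created
try {
    // critical section
} finally {
    lock.release();  // lockCount 2 -> 1, the lock is still held
    lock.release();  // lockCount 1 -> 0, the znode is deleted and the lock is released
}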
internalLock() in turn hands the real work to LockInternals#attemptLock, which acquires the lock as follows:
// LockInternals
String attemptLock(long time, TimeUnit unit, byte[] lockNodeBytes) throws Exception {
    // ... some code omitted
    while ( !isDone ) {
        isDone = true;
        try {
            // 1. Create an ephemeral sequential node on ZooKeeper through Curator
            //    client: a watcher-removing client created via CuratorFramework#newWatcherRemoveCuratorFramework
            //    path:   built by ZKPaths.makePath(path, lockName)
            ourPath = driver.createsTheLock(client, path, localLockNodeBytes);
            // 2. Try to obtain the lock: the node watches its predecessor and only gets the lock
            //    when it is the first node, or when its predecessor releases the lock
            hasTheLock = internalLockLoop(startMillis, millisToWait, ourPath);
        }
        catch ( KeeperException.NoNodeException e ) {
            // some code omitted
        }
    }
    // 3. We hold the lock: return ourPath
    if ( hasTheLock ) {
        return ourPath;
    }
    // 4. We did not get the lock: return null
    return null;
}
// StandardLockInternalsDriver
public String createsTheLock(CuratorFramework client, String path, byte[] lockNodeBytes) throws Exception {
    String ourPath;
    // 1. lockNodeBytes (the payload to store in the lock node) is not null
    if ( lockNodeBytes != null ) {
        ourPath = client.create().creatingParentContainersIfNeeded().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath(path, lockNodeBytes);
    }
    else { // 2. no payload
        ourPath = client.create().creatingParentContainersIfNeeded().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath(path);
    }
    return ourPath;
}
The key point is that StandardLockInternalsDriver uses Curator to create an ephemeral sequential node on ZooKeeper. Each node then watches its immediate predecessor, and a node only obtains the lock when it is the first node or when its predecessor has released the lock. So how exactly does Curator decide who gets the lock? See the code below.
private boolean internalLockLoop(long startMillis, Long millisToWait, String ourPath) throws Exception {
    boolean haveTheLock = false;
    boolean doDelete = false;
    try {
        // 1. If a RevocationSpec was configured via revocable, add a watch on our own node ourPath
        if ( revocable.get() != null ) {
            client.getData().usingWatcher(revocableWatcher).forPath(ourPath);
        }
        // 2. Loop while the client state is STARTED and we do not yet hold the lock
        while ( (client.getState() == CuratorFrameworkState.STARTED) && !haveTheLock ) {
            // 3. Sort all children under the lock path; since they are ephemeral sequential
            //    nodes, the sorted order is deterministic
            List<String> children = getSortedChildren();
            // 4. ourPath is the full path of our node; basePath is the lock's base path,
            //    derived via ZKPaths.makePath(path, lockName)
            /**
             * basePath:         /lock
             * ourPath:          /lock/_c_0b51ec79-33d9-45d5-b10d-397605ad3404-lock-0000000006
             * sequenceNodeName: _c_0b51ec79-33d9-45d5-b10d-397605ad3404-lock-0000000006
             */
            String sequenceNodeName = ourPath.substring(basePath.length() + 1); // +1 to include the slash
            // 5. Ask the driver (getsTheLock) whether we obtain the lock
            PredicateResults predicateResults = driver.getsTheLock(client, children, sequenceNodeName, maxLeases);
            // 6. We got the lock: set haveTheLock = true
            if ( predicateResults.getsTheLock() ) {
                haveTheLock = true;
            }
            // 7. We did not get the lock: watch the predecessor node and wait (possibly with a timeout)
            else {
                // 7.1 The predecessor of the current node
                String previousSequencePath = basePath + "/" + predicateResults.getPathToWatch();
                synchronized(this) {
                    try {
                        // 7.2 Add a watch on the predecessor node
                        client.getData().usingWatcher(watcher).forPath(previousSequencePath);
                        // 7.3 Check the wait timeout; if it has expired, give up
                        if ( millisToWait != null ) {
                            millisToWait -= (System.currentTimeMillis() - startMillis);
                            startMillis = System.currentTimeMillis();
                            if ( millisToWait <= 0 ) {
                                doDelete = true; // timed out - delete our node
                                break;
                            }
                            wait(millisToWait);
                        } else {
                            wait();
                        }
                    }
                    catch ( KeeperException.NoNodeException e ) {
                        // it has been deleted (i.e. lock released). Try to acquire again
                    }
                }
            }
        }
    }
    catch ( Exception e ) {
        ThreadUtils.checkInterrupted(e);
        doDelete = true;
        throw e;
    }
    finally {
        if ( doDelete ) {
            // 8. If doDelete == true (for example the wait timed out), delete our node
            deleteOurPath(ourPath);
        }
    }
    return haveTheLock;
}
From the code above, the acquisition logic boils down to the following steps:
1. If a RevocationSpec was configured (revocable.get() != null), add a watch to the current node.
2. While the client state is STARTED and the lock is not yet held, keep trying to acquire it.
3. getSortedChildren() sorts all children under the base path; the sort key is the node name with everything up to the lockName stripped off (a substring operation). The implementation is shown below.
public static List<String> getSortedChildren(CuratorFramework client, String basePath, final String lockName, final LockInternalsSorter sorter) throws Exception
{
    List<String> children = client.getChildren().forPath(basePath);
    List<String> sortedList = Lists.newArrayList(children);
    Collections.sort
    (
        sortedList,
        new Comparator<String>()
        {
            @Override
            public int compare(String lhs, String rhs)
            {
                return sorter.fixForSorting(lhs, lockName).compareTo(sorter.fixForSorting(rhs, lockName));
            }
        }
    );
    return sortedList;
}
/**
 * Example:
 *   nodeName: _c_0b51ec79-33d9-45d5-b10d-397605ad3404-lock-0000000006
 *   lockName: lock-
 *   sorter.fixForSorting(lhs, lockName) evaluates to: 0000000006
 */
4. Build sequenceNodeName for our own node and use it to try to obtain the lock.
5. If the attempt fails, find the predecessor of the current node and add a watch on it.
6. Then wait, either indefinitely or until the remaining timeout expires.
7. If doDelete == true (for example an exception was thrown or the wait timed out), delete the current node.
Curator decides whether the lock was obtained in StandardLockInternalsDriver#getsTheLock. The method looks up the index ourIndex of sequenceNodeName in the sorted children: if ourIndex < maxLeases the lock is obtained and no watch is needed; otherwise the caller will subsequently add a watch on the predecessor node.
Note: maxLeases was mentioned earlier when we walked through the InterProcessMutex constructors; for a mutex it is fixed at 1.
public PredicateResults getsTheLock(CuratorFramework client, List<String> children, String sequenceNodeName, int maxLeases) throws Exception {
    int ourIndex = children.indexOf(sequenceNodeName);
    validateOurIndex(sequenceNodeName, ourIndex);
    boolean getsTheLock = ourIndex < maxLeases;
    String pathToWatch = getsTheLock ? null : children.get(ourIndex - maxLeases);
    return new PredicateResults(pathToWatch, getsTheLock);
}
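To make the index arithmetic concrete, here is a small standalone walk-through (plain Java, not Curator code) using three hypothetical waiters and maxLeases = 1:

import java.util.Arrays;
import java.util.List;

public class GetsTheLockWalkthrough {
    public static void main(String[] args) {
        // already sorted by sequence number, as getSortedChildren() would return them
        List<String> children = Arrays.asList(
                "_c_aaa-lock-0000000001",
                "_c_bbb-lock-0000000002",
                "_c_ccc-lock-0000000003");
        int maxLeases = 1;
        for (String node : children) {
            int ourIndex = children.indexOf(node);
            boolean getsTheLock = ourIndex < maxLeases;
            String pathToWatch = getsTheLock ? null : children.get(ourIndex - maxLeases);
            System.out.println(node + " -> getsTheLock=" + getsTheLock + ", watches=" + pathToWatch);
        }
        // output: only ...0000000001 gets the lock; ...0000000002 watches ...0000000001,
        // and ...0000000003 watches ...0000000002
    }
}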
Having covered acquisition, let's look at how the lock is released.

// InterProcessMutex
public void release() throws Exception {
    // 1. Get the current thread
    Thread currentThread = Thread.currentThread();
    // 2. Look up its LockData in the ConcurrentMap
    LockData lockData = threadData.get(currentThread);
    // 3. No LockData means this thread never acquired the lock: throw IllegalMonitorStateException
    if ( lockData == null ) {
        throw new IllegalMonitorStateException("You do not own the lock: " + basePath);
    }
    // 4. Decrement the re-entrance count lockCount
    int newLockCount = lockData.lockCount.decrementAndGet();
    // 4.1 lockCount > 0 means the thread acquired the lock more than once and still holds it
    if ( newLockCount > 0 ) {
        return;
    }
    if ( newLockCount < 0 ) {
        throw new IllegalMonitorStateException("Lock count has gone negative for lock: " + basePath);
    }
    // 5. newLockCount == 0: the thread can release the lock via releaseLock
    try {
        internals.releaseLock(lockData.lockPath);
    }
    finally {
        // 6. Remove the current thread from the ConcurrentMap
        threadData.remove(currentThread);
    }
}
release() delegates the actual cleanup to LockInternals#releaseLock, shown below:
// LockInternals
final void releaseLock(String lockPath) throws Exception {
    // 1. Remove the watchers registered by this client
    client.removeWatchers();
    // 2. Reset revocable to null
    revocable.set(null);
    // 3. Delete the lock node on the ZooKeeper server
    deleteOurPath(lockPath);
}

private void deleteOurPath(String ourPath) throws Exception {
    try {
        // delete the remote node through CuratorFramework
        client.delete().guaranteed().forPath(ourPath);
    }
    catch ( KeeperException.NoNodeException e ) {
        // node already gone - nothing to do
    }
}
So the core release logic is simple: decrement the re-entrance count, and once it reaches zero remove the client's watchers, clear the revocable reference, and delete the lock node on ZooKeeper (delete().guaranteed() keeps retrying in the background until the node really is deleted). Deleting the node fires the watch held by the next waiter in line, which can then acquire the lock.
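As the guard in step 3 of release() shows, calling release() from a thread that never acquired the lock fails fast. A quick illustrative sketch, reusing curatorFramework from LockMain:

InterProcessMutex notOwned = new InterProcessMutex(curatorFramework, "/lock");
// this thread never called acquire(), so Curator refuses to release:
// java.lang.IllegalMonitorStateException: You do not own the lock: /lock
notOwned.release();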
Now that we have analyzed how Curator implements the distributed lock, this section analyzes how Curator implements leader election. Start from the program entry point shown in section 3.2:
// LeaderVote
public void start() {
    leaderSelector.start();
}

// LeaderSelector
public void start() {
    // 1. Validate the required preconditions on startup
    Preconditions.checkState(state.compareAndSet(State.LATENT, State.STARTED), "Cannot be started more than once");
    Preconditions.checkState(!executorService.isShutdown(), "Already started");
    Preconditions.checkState(!hasLeadership, "Already has leadership");
    // 2. Add the listener to the client's connection-state listener collection
    client.getConnectionStateListenable().addListener(listener);
    // 3. Enqueue this node for the election
    requeue();
}
start() is the entry point of the leader election: it checks the preconditions (flipping the state from LATENT to STARTED), registers the connection-state listener on the client, and finally enqueues the node via requeue():
// LeaderSelector
public boolean requeue() {
    Preconditions.checkState(state.get() == State.STARTED, "close() has already been called");
    // the actual enqueue logic
    return internalRequeue();
}

// LeaderSelector
private synchronized boolean internalRequeue() {
    // only if we have not been queued before and the node's state is STARTED
    if ( !isQueued && (state.get() == State.STARTED) ) {
        // 1. Mark the node as queued
        isQueued = true;
        // 2. Submit the core Callable to Curator's own executorService wrapper and get back a Future
        Future<Void> task = executorService.submit(new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                try {
                    // 3. Core processing method
                    doWorkLoop();
                }
                finally {
                    clearIsQueued();
                    if ( autoRequeue.get() ) {
                        // call ourselves again to re-enqueue
                        internalRequeue();
                    }
                }
                return null;
            }
        });
        // 4. Store the Future in ourTask
        ourTask.set(task);
        return true;
    }
    return false;
}
A few things worth noting about the enqueue operation:
- Enqueueing really just submits a Callable and keeps the returned Future.
- executorService is a CloseableExecutorService, Curator's own wrapper class that holds a plain ExecutorService internally:

public class CloseableExecutorService implements Closeable {
    private final ExecutorService executorService;
}

- The enqueue only happens when two conditions hold: the node has not been queued before (isQueued == false) and its state is STARTED.
- When does the node become STARTED? Initially the state defaults to LATENT:

private final AtomicReference<State> state = new AtomicReference<State>(State.LATENT);

  When start() is called, compareAndSet() flips LATENT to STARTED; a second call to start() finds the state already changed, the CAS fails and Preconditions.checkState throws (see the sketch after these notes).
- Whether doWorkLoop() succeeds or fails, the finally block resets isQueued to false, and if autoRequeue is enabled internalRequeue() is called again so the node rejoins the election.
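The LATENT-to-STARTED guard is just an AtomicReference CAS. A minimal standalone sketch, using a local enum rather than Curator's internal State:

import java.util.concurrent.atomic.AtomicReference;

public class StateGuardSketch {
    enum State { LATENT, STARTED, CLOSED } // local stand-in for LeaderSelector's State

    public static void main(String[] args) {
        AtomicReference<State> state = new AtomicReference<>(State.LATENT);
        // first start(): LATENT -> STARTED succeeds
        System.out.println(state.compareAndSet(State.LATENT, State.STARTED)); // true
        // second start(): the state is already STARTED, so the CAS fails and
        // Preconditions.checkState(...) would throw IllegalStateException
        System.out.println(state.compareAndSet(State.LATENT, State.STARTED)); // false
    }
}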
doWorkLoop() simply delegates to doWork():
// LeaderSelector
private void doWorkLoop() throws Exception {
    // internally delegates to doWork()
    doWork();
}

void doWork() throws Exception {
    hasLeadership = false;
    try {
        // 1. Leader election is implemented by acquiring an InterProcessMutex
        mutex.acquire();
        // 2. We hold the lock, so mark that we have leadership
        hasLeadership = true;
        try {
            // 3. Invoke the listener's takeLeadership() to run the leader's business logic
            listener.takeLeadership(client);
        }
        catch ( InterruptedException e ) {
            // some code omitted
        }
    }
    catch ( InterruptedException e ) {
        Thread.currentThread().interrupt();
        throw e;
    }
    finally {
        // 4. Once leadership ends, reset hasLeadership = false and release the mutex
        if ( hasLeadership ) {
            hasLeadership = false;
            try {
                mutex.release();
            }
            catch ( Exception e ) {
                ThreadUtils.checkInterrupted(e);
                log.error("The leader threw an exception", e);
                // ignore errors - this is just a safety
            }
        }
    }
}
As the code shows, leader election is built on InterProcessMutex, whose internals were covered in section 4, so they are not repeated here. doWork() runs the election as follows:
hasLeadership is initialized to false, then mutex.acquire() is called to obtain the lock.
Once the lock is obtained, hasLeadership is set to true and the listener's takeLeadership() method is invoked to run the business logic:
// takeLeadership() of our custom listener (LeaderVote)
public void takeLeadership(CuratorFramework client) throws Exception {
    System.out.println(this.name + " is now the leader");
    countDownLatch.await();
}
When takeLeadership() returns, the finally block in doWork() calls mutex.release() to release the lock, handing leadership over to the next candidate.
The CountDownLatch.await() call in the custom LeaderVote class only exists to keep takeLeadership() from returning, so the process holds leadership long enough for us to observe the behavior.
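One last note: with the setup above, a node that gives up leadership (i.e. whose takeLeadership() returns) does not rejoin the election, because autoRequeue is off by default, as the autoRequeue.get() check in internalRequeue() suggests. If you want a node to re-queue automatically, LeaderSelector offers autoRequeue(); a sketch of the change to LeaderVoteMain, using the same variables as above:

LeaderSelector leaderSelector = new LeaderSelector(curatorFramework, "/leader", leaderVote);
leaderSelector.autoRequeue();   // re-enter the election whenever leadership is relinquished
leaderVote.setLeaderSelector(leaderSelector);
leaderVote.start();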