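Spring Boot's graceful shutdown is built on a JVM ShutdownHook callback (a point that countless articles online have already covered).
While the hook runs, Spring uses a CountDownLatch to block the hook thread so that the process does not exit for a bounded period, which gives the remaining tasks time to finish.
Original post: https://juejin.cn/post/7197292579057221693 (published on Juejin). This copy was moved to CSDN, so the formatting may not be ideal.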
SmartLifecycle
DefaultLifecycleProcessor
WebServerGracefulShutdownLifecycle
WebServerStartStopLifecycle
WebServerManager
TomcatWebServer implements WebServer
java.util.concurrent.CountDownLatch
java.lang.Runtime
java.lang.ApplicationShutdownHooks
java.lang.Shutdown
When is the hook registered
When is the hook triggered
What happens after the hook is triggered
org.springframework.boot.SpringApplication#refreshContext()
org.springframework.context.support.AbstractApplicationContext#registerShutdownHook()
@Override
public void registerShutdownHook() {
    if (this.shutdownHook == null) {
        // No shutdown hook registered yet.
        this.shutdownHook = new Thread(SHUTDOWN_HOOK_THREAD_NAME) {
            @Override
            public void run() {
                synchronized (startupShutdownMonitor) {
                    doClose();
                }
            }
        };
        Runtime.getRuntime().addShutdownHook(this.shutdownHook);
    }
}
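As the code above shows, when Spring refreshes the context it registers a shutdownHook with Runtime. According to the Javadoc of the Runtime API, the JVM runs this thread once it responds to a shutdown signal (some signals, such as kill -9, are never handled by the JVM, so the hook does not run for them).
The hook body shows that when the JVM calls back, it executes doClose(). In other words, doClose() is the core entry point for closing the container.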
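To make the registration step concrete, here is a minimal sketch in plain Java, independent of Spring. The class name ShutdownHookDemo, the helper newHook, and the thread name are made up for illustration; only Runtime#addShutdownHook is the real JDK API that the text refers to.

```java
class ShutdownHookDemo {

    /** Builds a hook thread around a cleanup action, similar in spirit to how the context wraps doClose(). */
    static Thread newHook(Runnable cleanup) {
        return new Thread(cleanup, "demo-shutdown-hook"); // hypothetical thread name
    }

    public static void main(String[] args) {
        Thread hook = newHook(() -> System.out.println("hook: releasing resources"));
        // The JVM runs registered hooks on normal exit or on signals such as SIGTERM (kill -15).
        Runtime.getRuntime().addShutdownHook(hook);
        System.out.println("main finished, the JVM will now run the registered hooks");
    }
}
```

Running main prints the "main finished" line first and the hook line afterwards, because the hook only starts once the JVM begins shutting down.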
org.springframework.context.support.AbstractApplicationContext#doClose()
public static void main(String[] args) {
    ConfigurableApplicationContext context = SpringApplication.run(MvcApplication.class, args);
    // Simulate a shutdown call
    context.close();
}
@Override
public void close() {
    synchronized (this.startupShutdownMonitor) {
        // The real close logic is invoked here
        doClose();
        if (this.shutdownHook != null) {
            try {
                Runtime.getRuntime().removeShutdownHook(this.shutdownHook);
            }
            catch (IllegalStateException ex) {
                // ignore - VM is already shutting down
            }
        }
    }
}
protected void doClose() {
    // ... code outside the scope of this article is omitted; see the source if you are interested
    // Stop all Lifecycle beans, to avoid delays during individual destruction.
    if (this.lifecycleProcessor != null) {
        try {
            // Stop the beans that implement Lifecycle
            this.lifecycleProcessor.onClose();
        }
        catch (Throwable ex) {
            logger.warn("Exception thrown from LifecycleProcessor on context close", ex);
        }
    }
    // ...
}
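The code above can be skimmed; it is only the outer layer of Spring Boot's shutdown path. What matters is that DefaultLifecycleProcessor#onClose() calls stopBeans():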
private void stopBeans() {
    Map<String, Lifecycle> lifecycleBeans = getLifecycleBeans();
    Map<Integer, LifecycleGroup> phases = new HashMap<>();
    lifecycleBeans.forEach((beanName, bean) -> {
        int shutdownPhase = getPhase(bean);
        LifecycleGroup group = phases.get(shutdownPhase);
        if (group == null) {
            group = new LifecycleGroup(shutdownPhase, this.timeoutPerShutdownPhase, lifecycleBeans, false);
            phases.put(shutdownPhase, group);
        }
        group.add(beanName, bean);
    });
    if (!phases.isEmpty()) {
        List<Integer> keys = new ArrayList<>(phases.keySet());
        keys.sort(Collections.reverseOrder());
        for (Integer key : keys) {
            // Key point: stop each group, highest phase first
            phases.get(key).stop();
        }
    }
}
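stopBeans does two things, grouping and sorting, and neither is the interesting part.
What matters is that after grouping, lifecycles with the same phase end up in the same LifecycleGroup. This class holds multiple lifecycle members, and when its stop() runs, it iterates over the members in a for loop and stops them one by one.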
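The grouping and ordering described above can be sketched without Spring. This is a simplified illustration, not the framework's code: PhaseOrderDemo and stopOrder are hypothetical names, and the bean names and phase values in main are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PhaseOrderDemo {

    /** Groups bean names by phase and returns them in stop order: highest phase first. */
    static List<String> stopOrder(Map<String, Integer> phaseByBean) {
        // A TreeMap with reverse ordering keeps the groups sorted from highest to lowest phase.
        Map<Integer, List<String>> groups = new TreeMap<>(Collections.reverseOrder());
        phaseByBean.forEach((name, phase) ->
                groups.computeIfAbsent(phase, p -> new ArrayList<>()).add(name));

        List<String> order = new ArrayList<>();
        for (List<String> group : groups.values()) {
            order.addAll(group); // members of one group are stopped one after another
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, Integer> beans = new LinkedHashMap<>();
        beans.put("cacheWarmer", 0);                 // illustrative bean and phase
        beans.put("gracefulShutdown", 200);          // illustrative bean and phase
        beans.put("startStop", 100);                 // illustrative bean and phase
        System.out.println(stopOrder(beans));
    }
}
```

With these sample values the group with phase 200 is stopped first and the group with phase 0 last, which is the reverse of the startup order.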
// LifecycleGroup
public void stop() {
    if (this.members.isEmpty()) {
        return;
    }
    if (logger.isDebugEnabled()) {
        logger.debug("Stopping beans in phase " + this.phase);
    }
    this.members.sort(Collections.reverseOrder());
    // Countdown latch; the count is the number of SmartLifecycle members in this group
    CountDownLatch latch = new CountDownLatch(this.smartMemberCount);
    Set<String> countDownBeanNames = Collections.synchronizedSet(new LinkedHashSet<>());
    // Snapshot of the bean names still pending; doStop removes entries from the underlying lifecycleBeans map
    Set<String> lifecycleBeanNames = new HashSet<>(this.lifecycleBeans.keySet());
    for (LifecycleGroupMember member : this.members) {
        if (lifecycleBeanNames.contains(member.name)) {
            doStop(this.lifecycleBeans, member.name, latch, countDownBeanNames);
        }
        else if (member.bean instanceof SmartLifecycle) {
            // Already removed: must have been a dependent bean from another phase
            latch.countDown();
        }
    }
    try {
        // Wait with a timeout: if countDown is never called by the code above,
        // the timeout acts as a fallback and releases the thread anyway
        latch.await(this.timeout, TimeUnit.MILLISECONDS);
        if (latch.getCount() > 0 && !countDownBeanNames.isEmpty() && logger.isInfoEnabled()) {
            logger.info("Failed to shut down " + countDownBeanNames.size() + " bean" +
                    (countDownBeanNames.size() > 1 ? "s" : "") + " with phase value " +
                    this.phase + " within timeout of " + this.timeout + "ms: " + countDownBeanNames);
        }
    }
    catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
    }
}
private void doStop(Map<String, ? extends Lifecycle> lifecycleBeans, final String beanName,
        final CountDownLatch latch, final Set<String> countDownBeanNames) {
    // Remove this bean from the map and get its instance
    Lifecycle bean = lifecycleBeans.remove(beanName);
    if (bean != null) {
        // Beans that depend on this one are stopped first, one by one
        String[] dependentBeans = getBeanFactory().getDependentBeans(beanName);
        for (String dependentBean : dependentBeans) {
            doStop(lifecycleBeans, dependentBean, latch, countDownBeanNames);
        }
        try {
            if (bean.isRunning()) {
                if (bean instanceof SmartLifecycle) {
                    if (logger.isTraceEnabled()) {
                        logger.trace("Asking bean '" + beanName + "' of type [" +
                                bean.getClass().getName() + "] to stop");
                    }
                    countDownBeanNames.add(beanName);
                    // Core step: call stop; once it completes, the callback counts down the latch
                    ((SmartLifecycle) bean).stop(() -> {
                        latch.countDown();
                        countDownBeanNames.remove(beanName);
                        if (logger.isDebugEnabled()) {
                            logger.debug("Bean '" + beanName + "' completed its stop procedure");
                        }
                    });
                }
                else {
                    if (logger.isTraceEnabled()) {
                        logger.trace("Stopping bean '" + beanName + "' of type [" +
                                bean.getClass().getName() + "]");
                    }
                    bean.stop();
                    if (logger.isDebugEnabled()) {
                        logger.debug("Successfully stopped bean '" + beanName + "'");
                    }
                }
            }
            else if (bean instanceof SmartLifecycle) {
                // Don't wait for beans that aren't running...
                latch.countDown();
            }
        }
        catch (Throwable ex) {
            if (logger.isWarnEnabled()) {
                logger.warn("Failed to stop bean '" + beanName + "'", ex);
            }
        }
    }
}
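The real core of the two code blocks above is the use of a CountDownLatch.
The number of SmartLifecycle members in the LifecycleGroup is the latch count. Each member that stops successfully releases one count, until all of them are released.
latch.await(this.timeout, TimeUnit.MILLISECONDS)
As long as the latch count has not reached zero, the thread blocks here.
As a fallback, if stopping has not finished when the timeout elapses, the thread is no longer blocked. This timeout is the value we configure in the YAML file (the spring.lifecycle.timeout-per-shutdown-phase property).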
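The latch-with-timeout behavior described above can be tried in isolation. The sketch below is a simplified stand-in, assuming each "bean" is just a worker thread that sleeps and then counts down; LatchTimeoutDemo and stopAll are hypothetical names, not Spring code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchTimeoutDemo {

    /**
     * Starts one worker per entry in workMillis; each worker counts down when it is done.
     * Returns true if every worker finished within timeoutMillis, false if the wait timed out.
     */
    static boolean stopAll(long[] workMillis, long timeoutMillis) {
        CountDownLatch latch = new CountDownLatch(workMillis.length);
        for (long millis : workMillis) {
            Thread worker = new Thread(() -> {
                try {
                    Thread.sleep(millis); // simulated stop work
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
                latch.countDown();
            });
            worker.setDaemon(true); // a slow worker must not keep the JVM alive
            worker.start();
        }
        try {
            // Blocks until the count reaches zero or the timeout elapses, whichever comes first.
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(stopAll(new long[] {10, 20}, 1000));  // both workers finish in time
        System.out.println(stopAll(new long[] {10, 5000}, 100)); // one worker is too slow
    }
}
```

The second call returns after roughly the timeout rather than after five seconds, which is the same fallback the shutdown thread relies on.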
default void stop(Runnable callback) {
    stop();
    callback.run();
}
((SmartLifecycle) bean).stop(() -> {
    latch.countDown();
    countDownBeanNames.remove(beanName);
    if (logger.isDebugEnabled()) {
        logger.debug("Bean '" + beanName + "' completed its stop procedure");
    }
});
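Looking closely at this SmartLifecycle method, we see that it takes a callback: only after stop has completed is the function we passed in executed, and that function is what calls latch.countDown().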
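The value of the callback becomes clearer when stop finishes on another thread. The sketch below uses a hypothetical Stoppable interface as a reduced stand-in for SmartLifecycle (it is not the Spring type), with one implementation that relies on the synchronous default and one that completes asynchronously.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class CallbackStopDemo {

    /** Hypothetical stand-in for SmartLifecycle, reduced to its two stop methods. */
    interface Stoppable {
        void stop();

        default void stop(Runnable callback) {
            stop();         // synchronous stop
            callback.run(); // then signal completion
        }
    }

    /** Overrides stop(Runnable) so that completion is signalled later, from another thread. */
    static class AsyncStoppable implements Stoppable {
        @Override
        public void stop() {
            // nothing to do in the synchronous variant
        }

        @Override
        public void stop(Runnable callback) {
            new Thread(() -> {
                try {
                    Thread.sleep(50); // simulated wait for in-flight work
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
                callback.run();
            }, "async-stop").start();
        }
    }

    /** Asks the bean to stop and waits until its callback has fired or the timeout elapses. */
    static boolean stopAndWait(Stoppable bean, long timeoutMillis) {
        CountDownLatch latch = new CountDownLatch(1);
        bean.stop(latch::countDown);
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(stopAndWait(() -> { }, 1000));           // default, synchronous path
        System.out.println(stopAndWait(new AsyncStoppable(), 1000)); // asynchronous path
    }
}
```

In the asynchronous case stop(callback) returns immediately, so the caller needs the latch to know when the bean is really done.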
org.springframework.boot.web.reactive.context.WebServerGracefulShutdownLifecycle#stop(java.lang.Runnable)
org.springframework.boot.web.reactive.context.WebServerManager#shutDownGracefully
org.springframework.boot.web.embedded.tomcat.TomcatWebServer#shutDownGracefully
org.springframework.boot.web.embedded.tomcat.GracefulShutdown#shutDownGracefully
void shutDownGracefully(GracefulShutdownCallback callback) {
    logger.info("Commencing graceful shutdown. Waiting for active requests to complete");
    new Thread(() -> doShutdown(callback), "tomcat-shutdown").start();
}
private void doShutdown(GracefulShutdownCallback callback) {
    List<Connector> connectors = getConnectors();
    connectors.forEach(this::close);
    try {
        for (Container host : this.tomcat.getEngine().findChildren()) {
            for (Container context : host.findChildren()) {
                while (isActive(context)) {
                    if (this.aborted) {
                        logger.info("Graceful shutdown aborted with one or more requests still active");
                        callback.shutdownComplete(GracefulShutdownResult.REQUESTS_ACTIVE);
                        return;
                    }
                    Thread.sleep(50);
                }
            }
        }
    }
    catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
    }
    logger.info("Graceful shutdown complete");
    callback.shutdownComplete(GracefulShutdownResult.IDLE);
}
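That may be a lot of code, but since you have made it this far, here is the call stack in detail.
shutDownGracefully(callback)
Here a new thread is started and the whole job is handed off to it to run asynchronously (do not forget that the parameter is a callback).
Internally it then calls doShutdown(callback).
doShutdown(callback), the key part
It closes every Connector. Anyone familiar with Tomcat knows that a Connector manages socket connections, so closing the Connectors means no new requests are accepted.
The loop keeps running as long as isActive(context) returns true. Stepping into its source makes this clear: it checks the tasks Tomcat is still processing and returns true as long as a single task has not finished. This method is the heart of graceful shutdown: a request that has not finished is allowed to keep running until it completes.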
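The polling loop in doShutdown can be reduced to a small standalone sketch. It assumes a simple counter of in-flight requests in place of Tomcat's isActive(context) check; DrainDemo, its methods, and the Result enum are hypothetical names that only mirror the shape of the original code.

```java
import java.util.concurrent.atomic.AtomicInteger;

class DrainDemo {

    enum Result { IDLE, REQUESTS_ACTIVE }

    private final AtomicInteger activeRequests = new AtomicInteger(); // stand-in for isActive(context)
    private volatile boolean aborted;

    void requestStarted() { activeRequests.incrementAndGet(); }

    void requestFinished() { activeRequests.decrementAndGet(); }

    void abort() { aborted = true; }

    /** Polls every 50 ms until no request is active, or gives up as soon as abort() was called. */
    Result awaitDrain() {
        try {
            while (activeRequests.get() > 0) {
                if (aborted) {
                    return Result.REQUESTS_ACTIVE;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException ex) {
            // Same shape as the original: restore the flag and fall through
            Thread.currentThread().interrupt();
        }
        return Result.IDLE;
    }

    public static void main(String[] args) {
        DrainDemo server = new DrainDemo();
        server.requestStarted();
        // Simulate a request that completes shortly after shutdown begins.
        new Thread(() -> {
            try {
                Thread.sleep(120);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
            server.requestFinished();
        }).start();
        System.out.println(server.awaitDrain());
    }
}
```

Calling abort() from another thread, for example when the surrounding timeout has expired, makes the loop return REQUESTS_ACTIVE instead of waiting indefinitely.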
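Summary
A CountDownLatch is created to block the hook thread; its count is the number of SmartLifecycle beans in the current phase group.
Each lifecycle is stopped in a loop, and when a stop completes, countDownLatch.countDown() is called.
At the outermost level, countDownLatch.await is called with a timeout. Once the timeout elapses, the hook thread is no longer blocked, the hook flow runs to completion, and the process exits.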
Writing this took effort; please credit the source when reposting.