近期有使用nacos的小伙伴在使用时遇到一个tomcat警告内存泄漏的问题。
相关警告信息:
2020-11-03 16:59:46.088 [main] WARN o.a.c.loader.WebappClassLoaderBase [173] - The web application [ROOT] appears to have started a thread named [com.alibaba.nacos.naming.beat.sender] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093) java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) java.lang.Thread.run(Thread.java:745) 2020-11-03 16:59:46.089 [main] WARN o.a.c.loader.WebappClassLoaderBase [173] - The web application [ROOT] appears to have started a thread named [com.alibaba.nacos.naming.failover] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093) java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) java.lang.Thread.run(Thread.java:745) 2020-11-03 16:59:46.090 [main] WARN o.a.c.loader.WebappClassLoaderBase [173] - The web application [ROOT] appears to have started a thread named [com.alibaba.nacos.naming.push.receiver] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread: java.net.PlainDatagramSocketImpl.receive0(Native Method) java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:143) java.net.DatagramSocket.receive(DatagramSocket.java:812) com.alibaba.nacos.client.naming.core.PushReceiver.run(PushReceiver.java:73) java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) java.util.concurrent.FutureTask.run(FutureTask.java:266) java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) java.lang.Thread.run(Thread.java:745)
下面是我复现并定位问题的一些信息:
我本地项目成功复现该问题。
mave依赖中包含spring-boot-starter-actuator.
org.springframework.boot
spring-boot-starter-web
org.springframework.boot
spring-boot-starter-actuator
com.alibaba.cloud
spring-cloud-starter-alibaba-nacos-discovery
2.2.3.RELEASE
application配置:
spring:
application:
name: nacos-producer
cloud:
nacos:
discovery:
server-addr: 127.0.0.1:8848
启动类:
@SpringBootApplication
@EnableDiscoveryClient
public class App {
public static void main(String[] args) {
SpringApplication.run(App.class, args);
}
}
本地的nacos-server没有进行启动,8848端口不进行监听。项目启动时,会去注册实例到nacos-server, 但是nacos-server没有启动,就会出现connection refused的报错,导致spring初始化失败,关闭tomcat容器。而警告日志就是在关闭tomcat容器的时候打出来的。
tomcat中打警告日志的逻辑见:org.apache.catalina.loader.WebappClassLoaderBase#clearReferencesThreads
. 这个org.apache.catalina.loader.WebappClassLoaderBase
的实现类其实就是org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader
.
下面是clearReferencesThreads方法的部分逻辑。
private void clearReferencesThreads() {
Thread[] threads = getThreads();
List executorThreadsToStop = new ArrayList<>();
// Iterate over the set of threads
for (Thread thread : threads) {
if (thread != null) {
ClassLoader ccl = thread.getContextClassLoader();
if (ccl == this) {
// Don't warn about this thread
if (thread == Thread.currentThread()) {
continue;
}
final String threadName = thread.getName();
// JVM controlled threads
ThreadGroup tg = thread.getThreadGroup();
if (tg != null && JVM_THREAD_GROUP_NAMES.contains(tg.getName())) {
// HttpClient keep-alive threads
if (clearReferencesHttpClientKeepAliveThread &&
threadName.equals("Keep-Alive-Timer")) {
thread.setContextClassLoader(parent);
log.debug(sm.getString("webappClassLoader.checkThreadsHttpClient"));
}
// Don't warn about remaining JVM controlled threads
continue;
}
// Skip threads that have already died
if (!thread.isAlive()) {
continue;
}
// TimerThread can be stopped safely so treat separately
// "java.util.TimerThread" in Sun/Oracle JDK
// "java.util.Timer$TimerImpl" in Apache Harmony and in IBM JDK
if (thread.getClass().getName().startsWith("java.util.Timer") &&
clearReferencesStopTimerThreads) {
clearReferencesStopTimerThread(thread);
continue;
}
if (isRequestThread(thread)) {
//打印相关的警告
log.warn(sm.getString("webappClassLoader.stackTraceRequestThread",
getContextName(), threadName, getStackTrace(thread)));
} else {
//打印相关的警告
log.warn(sm.getString("webappClassLoader.stackTrace",
getContextName(), threadName, getStackTrace(thread)));
}
}
Springboot程序在启动失败过后,会去关闭Tomcat容器,在关闭Tomcat容器的时候会扫描线程,如果对应的线程满足一下几个点,就会打警告日志。
1.判断线程的ContextClassLoader是否和是自己, 也就是org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader
.
2.判断该线程不是当前线程。
满足两点过后,再判断当前线程的栈帧中是否存在org.apache.catalina.connector.CoyoteAdapter
,如果包含这个类,说明当前线程正在处理请求,打警告日志:
The web application [{0}] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation. Stack trace of request processing thread:[{2}]
如果不包含这个类的话,打警告日志:
The web application [{0}] appears to have started a thread named [{1}] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:{2}
这个就是打警告日志的原因。
创建线程时,线程的contextClassLoader和创建线程的线程的contextClassLoader是一致的。
以下来自Thread的构造函数的部分相关逻辑:
Thread parent = currentThread();
if (security == null || isCCLOverridden(parent.getClass()))
this.contextClassLoader = parent.getContextClassLoader();
else
this.contextClassLoader = parent.contextClassLoader;
所以,之前的判断打警告日志时,对应线程的classLoader只要是org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader
就会打相关的警告日志。
如果创建相关的线程时,Thread.currentThread()的contextClassLoader对应的是TomcatEmbeddedWebappClassLoader
的话,
就会发生之前打警告日志的问题。
这涉及到org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedContext
初始化的一个逻辑,见方法org.apache.catalina.core.StandardContext#startInternal
.相关逻辑如下:
try {
if (ok) {
// Start our subordinate components, if any
Loader loader = getLoader();
if (loader instanceof Lifecycle) {
((Lifecycle) loader).start();
}
// since the loader just started, the webapp classloader is now
// created.
setClassLoaderProperty("clearReferencesRmiTargets",
getClearReferencesRmiTargets());
setClassLoaderProperty("clearReferencesStopThreads",
getClearReferencesStopThreads());
setClassLoaderProperty("clearReferencesStopTimerThreads",
getClearReferencesStopTimerThreads());
setClassLoaderProperty("clearReferencesHttpClientKeepAliveThread",
getClearReferencesHttpClientKeepAliveThread());
setClassLoaderProperty("clearReferencesObjectStreamClassCaches",
getClearReferencesObjectStreamClassCaches());
setClassLoaderProperty("clearReferencesObjectStreamClassCaches",
getClearReferencesObjectStreamClassCaches());
setClassLoaderProperty("clearReferencesThreadLocals",
getClearReferencesThreadLocals());
// By calling unbindThread and bindThread in a row, we setup the
// current Thread CCL to be the webapp classloader
unbindThread(oldCCL);
oldCCL = bindThread();
// Initialize logger again. Other components might have used it
// too early, so it should be reset.
logger = null;
getLogger();
Realm realm = getRealmInternal();
if(null != realm) {
if (realm instanceof Lifecycle) {
((Lifecycle) realm).start();
}
// Place the CredentialHandler into the ServletContext so
// applications can have access to it. Wrap it in a "safe"
// handler so application's can't modify it.
CredentialHandler safeHandler = new CredentialHandler() {
@Override
public boolean matches(String inputCredentials, String storedCredentials) {
return getRealmInternal().getCredentialHandler().matches(inputCredentials, storedCredentials);
}
@Override
public String mutate(String inputCredentials) {
return getRealmInternal().getCredentialHandler().mutate(inputCredentials);
}
};
context.setAttribute(Globals.CREDENTIAL_HANDLER, safeHandler);
}
// Notify our interested LifecycleListeners
fireLifecycleEvent(Lifecycle.CONFIGURE_START_EVENT, null);
// Start our child containers, if not already started
for (Container child : findChildren()) {
if (!child.getState().isAvailable()) {
child.start();
}
}
// Start the Valves in our pipeline (including the basic),
// if any
if (pipeline instanceof Lifecycle) {
((Lifecycle) pipeline).start();
}
// Acquire clustered manager
Manager contextManager = null;
Manager manager = getManager();
if (manager == null) {
if (log.isDebugEnabled()) {
log.debug(sm.getString("standardContext.cluster.noManager",
Boolean.valueOf((getCluster() != null)),
Boolean.valueOf(distributable)));
}
if ((getCluster() != null) && distributable) {
try {
contextManager = getCluster().createManager(getName());
} catch (Exception ex) {
log.error(sm.getString("standardContext.cluster.managerError"), ex);
ok = false;
}
} else {
contextManager = new StandardManager();
}
}
// Configure default manager if none was specified
if (contextManager != null) {
if (log.isDebugEnabled()) {
log.debug(sm.getString("standardContext.manager",
contextManager.getClass().getName()));
}
setManager(contextManager);
}
if (manager!=null && (getCluster() != null) && distributable) {
//let the cluster know that there is a context that is distributable
//and that it has its own manager
getCluster().registerManager(manager);
}
}
if (!getConfigured()) {
log.error(sm.getString("standardContext.configurationFail"));
ok = false;
}
// We put the resources into the servlet context
if (ok)
getServletContext().setAttribute
(Globals.RESOURCES_ATTR, getResources());
if (ok ) {
if (getInstanceManager() == null) {
setInstanceManager(createInstanceManager());
}
getServletContext().setAttribute(
InstanceManager.class.getName(), getInstanceManager());
InstanceManagerBindings.bind(getLoader().getClassLoader(), getInstanceManager());
}
// Create context attributes that will be required
if (ok) {
getServletContext().setAttribute(
JarScanner.class.getName(), getJarScanner());
}
// Set up the context init params
mergeParameters();
// Call ServletContainerInitializers
for (Map.Entry>> entry :
initializers.entrySet()) {
try {
entry.getKey().onStartup(entry.getValue(),
getServletContext());
} catch (ServletException e) {
log.error(sm.getString("standardContext.sciFail"), e);
ok = false;
break;
}
}
// Configure and call application event listeners
if (ok) {
if (!listenerStart()) {
log.error(sm.getString("standardContext.listenerFail"));
ok = false;
}
}
// Check constraints for uncovered HTTP methods
// Needs to be after SCIs and listeners as they may programmatically
// change constraints
if (ok) {
checkConstraintsForUncoveredMethods(findConstraints());
}
try {
// Start manager
Manager manager = getManager();
if (manager instanceof Lifecycle) {
((Lifecycle) manager).start();
}
} catch(Exception e) {
log.error(sm.getString("standardContext.managerFail"), e);
ok = false;
}
// Configure and call application filters
if (ok) {
if (!filterStart()) {
log.error(sm.getString("standardContext.filterFail"));
ok = false;
}
}
// Load and initialize all "load on startup" servlets
if (ok) {
if (!loadOnStartup(findChildren())){
log.error(sm.getString("standardContext.servletFail"));
ok = false;
}
}
// Start ContainerBackgroundProcessor thread
super.threadStart();
} finally {
// Unbinding thread
unbindThread(oldCCL);
}
其中bindThread()方法会把当前线程的contextClassLoader,也就是AppClassLoader替换成TomcatEmbeddedWebappClassLoader. 后续在finnaly再执行 unbindThread(oldCCL), 再把AppClassLoader进行还原。
在bindThread和unbindThread之间,就会做一些listener的相关start.
关键在于// Call ServletContainerInitializers
, 这里会把org.springframework.boot.web.servlet.ServletContextInitializer
类型的bean进行获取,org.springframework.boot.actuate.endpoint.web.ServletEndpointRegistrar
这个类是spring-boot-starter-actuator
中的,它实现了ServletContextInitializer
接口,也会在// Call ServletContainerInitializers
进行加载。在加载过程中,就会进行ServletEndpointRegistrar的相关实现类加载,而在sca中,存在类com.alibaba.cloud.nacos.endpoint.NacosDiscoveryEndpointAutoConfiguration
,其中有nacos相关的endpoint的实现,这之中会进行NacosNamingService的初始化,这个时候初始化时,contextClassLoader还没有进行unBind, 所以这时创建的线程的contextClassLoader都是TomcatEmbeddedWebappClassLoader。
如果不引入spring-boot-starter-actuator
的情况下,NacosNamingService初始化已经是unbindThread之后了,这个时候当前线程的contextClassLoader已经还原成了AppClassLoader。
因此后续如果在spring的初始化周期抛出了没有catch的异常,进行tomcat销毁时,就会判断线程的contextClassLoader是否是TomcatEmbeddedWebappClassLoader,在不引入spring-boot-starter-actuator
的情况下,线程的contextClassLoader是AppClassLoader, 不会打警告日志。引入之后,contextClassLoader是TomcatEmbeddedWebappClassLoader,会打警告日志。
本质原因就是如果在org.springframework.boot.web.servlet.ServletContextInitializer的bean的切入点过程中创建的线程,其contextClassLoader是TomcatEmbeddedWebappClassLoader,如果在unBindThread之前(把contextClassLoader还原成AppClassLoader), spring初始化失败,关闭tomcat容器时,就会打警告日志。
详细讨论见issue: https://github.com/alibaba/nacos/issues/4124.