JBOSS内存泄漏分析

现象:Apollo告警平台16台JBOSS服务器中有一台登陆不上,堆内存耗尽,OOM异常。

分析:

1、取回dump日志,通过IBM HeapAnalyzer分析

NonRegisteringDriver类concurrentHashMap占用了73%的堆内存。

疑点:为何GC回收不了此内存?

打开5.1.45版本的mysql-connector-java下NonRegisteringDriver.java。

 71     protectedstaticfinalConcurrentHashMap connectionPhantomRefs = newConcurrentHashMap    ntomReference, ConnectionPhantomReference>();

348     protectedstaticvoidtrackConnection(Connection newConn) {

349 

350         ConnectionPhantomReference phantomRef = newConnectionPhantomReference((ConnectionImpl) newConn, refQueue);

351         connectionPhantomRefs.put(phantomRef, phantomRef);

352     }

   staticclassConnectionPhantomReference extendsPhantomReference {

889         privateNetworkResources io;

890 

891         ConnectionPhantomReference(ConnectionImpl connectionImpl, ReferenceQueue q) {

892             super(connectionImpl, q);

893 

894             try{

895                 this.io = connectionImpl.getIO().getNetworkResources();

896             } catch(SQLException e) {

897                 // if we somehow got here and there's really no i/o, we deal with it later

898             }

899         }

900 

901         voidcleanup() {

902             if(this.io != null) {

903                 try{

904                     this.io.forceClose();

905                 } finally{

906                     this.io = null;

907                 }

908             }

909         }

910     }

显然connectionPhantomRefs map占用太多的堆内存了。同时ConnectionPhantomReference 继承PhantomReference。

而这个Reference如果不调用cleanup,将io=null,是无法通过GC回收的。

 

那么正常情况下,谁负责调用cleanup呢?这里connect/j中会启用AbandonedConnectionUpThread这个常驻线程,负责清空此队列。

 

AbandonedConnectionCleanUpThread.java

 60     publicvoidrun() {

 61         for(;;) {

 62             try{

 63                 checkContextClassLoaders();

 64                 ReferenceextendsConnectionImpl> ref = NonRegisteringDriver.refQueue.remove(5000);

 65                 if(ref != null) {

 66                     try{

 67                         ((ConnectionPhantomReference) ref).cleanup();

 68                     } finally{

 69                         NonRegisteringDriver.connectionPhantomRefs.remove(ref);

 70                     }

 71                 }

 72 

 73             } catch(InterruptedException e) {

 74                 threadRef = null;

 75                 return;

 76 

 77             } catch(Exception ex) {

 78                 // Nowhere to really log this.

 79             }

 80         }

 81     }

 82 

既然有这个线程保证回收队列,那为什么又占那么多堆内存呢。莫非是线程挂起或者阻塞了?

通过visualvm工具打开dump,显示线程信息,找到Abandoned线程

"Abandoned connection -

thread" daemon prio=5 tid=90 BLOCKED

    at java.util.ArrayList.(ArrayList.java:132)

    at org.jboss.modules.ModuleClassLoader.loadResourceLocal(ModuleClassLoader.java:349)

       Local Variable: java.util.ArrayList#54264

    at org.jboss.modules.ModuleClassLoader$1.loadResourceLocal(ModuleClassLoader.java:103)

       Local Variable: org.jboss.modules.ModuleClassLoader$1#14

    at org.jboss.modules.Module.getResource(Module.java:600)

       Local Variable: org.jboss.modules.Module#15

    at org.jboss.modules.ModuleClassLoader.findResource(ModuleClassLoader.java:574)

       Local Variable: org.jboss.modules.ModuleClassLoader#15

       Local Variable: java.lang.String#1101031

    at org.jboss.modules.ConcurrentClassLoader.getResource(ConcurrentClassLoader.java:261)

    at com.mysql.jdbc.AbandonedConnectionCleanupThread.checkContextClassLoaders(AbandonedConnectionCleanupThread.java:90)

    at com.mysql.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:63)

    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

       Local Variable: java.util.concurrent.ThreadPoolExecutor#7

       Local Variable: com.mysql.jdbc.AbandonedConnectionCleanupThread#1

    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

       Local Variable: java.util.concurrent.ThreadPoolExecutor$Worker#48

    at java.lang.Thread.run(Thread.java:724)

 

发现果然线程被blocked,阻塞状态。等待ArrayList,init方法处理完。

查看上下文,发现是checkContextClassLoaders()的问题。

调用链方法如下:

AbandoneConnectionCleanUpThread.java

privatevoidcheckContextClassLoaders() {

 89         try{

 90             threadRef.getContextClassLoader().getResource("");

 91         } catch(Throwable e) {

 92             // Shutdown no matter what.

 93             uncheckedShutdown();

 94         }

 95     }

ConcurrentClassLoader.java

    publicfinalURL getResource(finalString name) {

250         for(String s : Module.systemPaths) {

251             if(name.startsWith(s)) {

252                 returnJDKSpecific.getSystemResource(name);

253             }

254         }

255         returnfindResource(name, false);

256     }

 

ModuleClassLoader.java

  /**{@inheritDoc} */

650     @Override

651     publicfinalURL findResource(finalString name, finalbooleanexportsOnly) {

652         returnmodule.getResource(name);

653     }

 

Module.java

 748     * @returnthe resource URL, or {@code null}if not found

 749     */

 750     URL getResource(finalString name) {

 751         finalString canonPath = PathUtils.canonicalize(name);

 752         for(String s : Module.systemPaths) {

 753             if(canonPath.startsWith(s)) {

 754                 returnmoduleClassLoader.getResource(canonPath);

 755             }

 756         }

 757         log.trace("Attempting to find resource %s in %s", canonPath, this);

 758         finalString path = pathOf(canonPath);

 759         finalURLConnectionResource jaxpResource = ModuleClassLoader.jaxpImplResources.get(canonPath);

 760         finalMap> paths = getPathsUnchecked();

 761         finalList loaders = paths.get(path);

 762         if(loaders != null) {

 763             for(LocalLoader loader : loaders) {

 764                 finalIterator iterator = loader.loadResourceLocal(canonPath).iterator();

 765                 if(iterator.hasNext()) {

 766                     finalURL url = iterator.next().getURL();

 767                     if(jaxpResource != null) log.jaxpResourceLoaded(url, this);

 768                     returnurl;

 769                 }

 770             }

 771         }

 772         finalLocalLoader fallbackLoader = this.fallbackLoader;

 773         if(fallbackLoader != null) {

 774             finalIterator iterator = fallbackLoader.loadResourceLocal(canonPath).iterator();

 775             if(iterator.hasNext()) {

 776                 finalURL url = iterator.next().getURL();

 777                 if(jaxpResource != null) log.jaxpResourceLoaded(url, this);

 778                 returnurl;

 779             }

 780         }

 781         if(jaxpResource != null) {

 782             finalURL url = jaxpResource.getURL();

 783             log.jaxpResourceLoaded(url, this);

 784             returnurl;

 785         }

 786         returnnull;

 787     }

 

 

 moduleClassLoader.java

     /**

395     *Load a local exported resource from this class loader.

396     *

397     * @paramnamethe resource name

398     * @returnthe list of resources

399     */

400     publicList loadResourceLocal(finalString name) {

401         finalMap> paths = this.paths.get().getAllPaths();

402 

403         finalString path = Module.pathOf(name);

404         finalURLConnectionResource jaxpResource = jaxpImplResources.get(name);

405 

406         finalList loaders = paths.get(path);

407         finalList list = newArrayList(loaders == null1: loaders.size());

408         if(loaders != null) {

409             for(ResourceLoader loader : loaders) {

410                 finalResource resource = loader.getResource(name);

411                 if(resource != null) {

412                     if(jaxpResource != null) Module.log.jaxpResourceLoaded(resource.getURL(), module);

413                     list.add(resource);

414                 }

415             }

416         }

417         returnlist.isEmpty() ? Collections.emptyList() : list;

418     }

 

分析是ArrayList init的时候阻塞线程,那么就是407行的初始化loaders.size()大小的队列。

考虑到此dump占用了99.7%的空间,说明这个地方已经无法申请那么多空间内存,所以回收线程被阻塞了。

那么concurrentHashMap无法回收数据,导致内存泄漏,最终发生OOM。

 

所以结论就是堆内存空间不足,导致Abandon线程阻塞,引起内存泄漏,发生OOM

 

紧接着问题就是为什么堆内存空间不足?

 

正常情况下的单台服务器占用500M堆内存,最多到2G。但是这台出问题的服务器,内存一直在2G左右。

IBMheap Analyzer分析第二疑似点为:

 

JBOSS的org.jboss.module.linkage的链表也很大,占用了22%的空间。

 57     Linkage(finalDependencySpec[] dependencySpecs, finalDependency[] dependencies, finalState state, finalMap> allPaths) {

 58         this.dependencySpecs = dependencySpecs;

 59         this.dependencies = dependencies;

 60         this.state = state;

 61         this.allPaths = PathUtils.deduplicateLists(allPaths);

 62     }

 

LinkAge的构造函数里面进行了拷贝操作。

  static Map> deduplicateLists(Map> allPaths) {

400         if(allPaths == null) {

401             returnnull;

402         } elseif(allPaths.size() == 0) {

403             returnCollections.emptyMap();

404         } else{

405             Map> newPaths = newHashMap<>();

406             Map, List> dedup = newHashMap<>();

407             for(Map.Entry> e : allPaths.entrySet()) {

408                 List l = dedup.get(e.getValue());

409                 if(l == null) {

410                     dedup.put(e.getValue(), l = Collections.unmodifiableList(newArrayList<>(e.getValue())));

411                 }

412                 newPaths.put(e.getKey(), l);

413             }

414             returnCollections.unmodifiableMap(newPaths);

415         }

416     }

 

而Module.getPaths,又会new linkage

 

1554     Map> getPaths() throwsModuleLoadException {

1555         Linkage oldLinkage = this.linkage;

1556         Linkage linkage;

1557         Linkage.State state = oldLinkage.getState();

1558         if(state == Linkage.State.LINKED) {

1559             returnoldLinkage.getPaths();

1560         }

1561         // slow path loop

1562         booleanintr = false;

1563         try{

1564             for(;;) {

1565                 synchronized(this) {

1566                     oldLinkage = this.linkage;

1567                     state = oldLinkage.getState();

1568                     while(state == Linkage.State.LINKING || state == Linkage.State.NEW) try{

1569                         wait();

1570                         oldLinkage = this.linkage;

1571                         state = oldLinkage.getState();

1572                     } catch(InterruptedException e) {

1573                         intr = true;

1574                     }

1575                     if(state == Linkage.State.LINKED) {

1576                         returnoldLinkage.getPaths();

1577                     }

1578                     this.linkage = linkage = newLinkage(oldLinkage.getDependencySpecs(), oldLinkage.getDependencies(), Linkage.State.LINKING);

1579                     // fall out and link

1580                 }

1581                 booleanok = false;

1582                 try{

1583                     link(linkage);

1584                     ok = true;

1585                 } finally{

1586                     if(! ok) {

1587                         // restore original (lack of) linkage

1588                         synchronized(this) {

1589                             if(this.linkage == linkage) {

1590                                 this.linkage = oldLinkage;

1591                                 notifyAll();

1592                             }

 

这里有无限循环。

所以在Abandon线程在check的时候,可能会不停创建LInkage。

同时分析正常环境的dump,不存在此现象。

 

我有理由怀疑,Abandon线程在某种场景下会不停的创建LinkAge,

而这句话threadRef.getContextClassLoader().getResource("”);

可能就是问题的原因。

 

重启WILDFLY后,此问题不复现。Abandon线程正常,不会一直有进程加载资源。

一切都恢复了原样!然而平静的表面下,还是暗藏着玄机。

你可能感兴趣的:(JBOSS内存泄漏分析)