A JVM crash usually produces a core.pid file and an hs_err_pidXXXX.log file.
Opening hs_err_pidXXXX.log typically shows something like the following:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007fb7006c6f31, pid=8864, tid=140421610395392
#
# JRE version: 6.0_20-b02
# Java VM: Java HotSpot(TM) 64-Bit Server VM (16.3-b01 mixed mode linux-amd64 )
# Problematic frame:
# C [libzip.so+0xaf31]
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
The leading C on the "Problematic frame" line indicates that the crash occurred while executing native code.
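As a rough illustration of reading that line mechanically, the sketch below classifies the frame type letter and pulls out the library name. ProblematicFrame, describe and library are hypothetical names made up for this example and are not part of the JDK. The mapping of letters follows the legend printed in the log itself (J=compiled Java code, j=interpreted, Vv=VM code), with C treated as native code as described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a JDK class): interprets the "Problematic frame" line of an hs_err log.
class ProblematicFrame {
    // Matches lines such as "# C [libzip.so+0xaf31]": a frame type letter, whitespace, then the frame.
    private static final Pattern FRAME = Pattern.compile("^#\\s*([CVvJj])\\s+(.*)$");
    // Matches the "[library+0xoffset]" part of a native frame.
    private static final Pattern LIBRARY = Pattern.compile("\\[([^\\]+]+)\\+0x([0-9a-fA-F]+)\\]");

    // Returns a description of the frame type, or "unrecognized" for any other line.
    static String describe(String line) {
        Matcher m = FRAME.matcher(line.trim());
        if (!m.matches()) {
            return "unrecognized";
        }
        switch (m.group(1).charAt(0)) {
            case 'C': return "native code";
            case 'V':
            case 'v': return "VM code";
            case 'J': return "compiled Java code";
            case 'j': return "interpreted Java code";
            default:  return "unrecognized";
        }
    }

    // Returns the library name of a native frame (e.g. "libzip.so"), or null if none is present.
    static String library(String line) {
        Matcher m = LIBRARY.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "# C [libzip.so+0xaf31]";
        System.out.println(describe(line) + " in " + library(line));
    }
}
```

For the log above this identifies a native-code crash inside libzip.so, which is the hint that the problem involves reading a zip or jar file rather than the application's Java logic.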
Search the log for C [libzip.so+0xaf31].
Further down, the log shows the following:
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J java.util.zip.ZipFile.getEntry(JLjava/lang/String;Z)J
J sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;
J sun.misc.URLClassPath$JarLoader.findResource(Ljava/lang/String;Z)Ljava/net/URL;
j sun.misc.URLClassPath$1.next()Z+42
j sun.misc.URLClassPath$1.hasMoreElements()Z+1
j java.net.URLClassLoader$3$1.run()Ljava/lang/Object;+7
v ~StubRoutines::call_stub
j java.security.AccessController.doPrivileged(Ljava/security/PrivilegedAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0
j java.net.URLClassLoader$3.next()Z+24
j java.net.URLClassLoader$3.hasMoreElements()Z+1
j sun.misc.CompoundEnumeration.next()Z+33
j sun.misc.CompoundEnumeration.hasMoreElements()Z+1
j org.apache.hadoop.mapred.JobConf.findContainingJar(Ljava/lang/Class;)Ljava/lang/String;+42
j org.apache.hadoop.mapred.JobConf.setJarByClass(Ljava/lang/Class;)V+1
j org.apache.hadoop.mapreduce.Job.setJarByClass(Ljava/lang/Class;)V+5
j com.panguso.recommend.mapred.usermodel.task.UserModelTask.run()V+53
j com.panguso.recommend.common.mission.AbstractMission.unitJob()V+198
j com.panguso.recommend.common.mission.AbstractMission.access$100(Lcom/panguso/recommend/common/mission/AbstractMission;)V+1
j com.panguso.recommend.common.mission.AbstractMission$1.run()V+39
j java.util.TimerThread.mainLoop()V+221
j java.util.TimerThread.run()V+1
v ~StubRoutines::call_stub
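Following the stack down to our own code (com.panguso.recommend.mapred.usermodel.task.UserModelTask.run) turned up nothing wrong there, so attention shifted to the jar file. The jar had been replaced while the program was running. This fits the top of the stack, where java.util.zip.ZipFile.getEntry is reading a jar through libzip.so. The likely conclusion: replacing the jar changed the file underneath the running program, so the JVM could no longer find the code it expected and crashed.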
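Picking your own frames out of a long "Java frames" listing can be done by eye, but a small filter helps with longer logs. The following is a minimal sketch, not an established tool: AppFrameFinder and firstFrameIn are hypothetical names, and the package prefix is whatever identifies your own code (here com.panguso., taken from the stack above).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a JDK class): finds the first Java frame that belongs to a given package.
class AppFrameFinder {
    // Returns the first J/j frame whose method starts with packagePrefix, or null if there is none.
    static String firstFrameIn(List<String> frames, String packagePrefix) {
        for (String frame : frames) {
            // Each line is "<type> <method>", e.g. "j java.util.TimerThread.run()V+1".
            String[] parts = frame.trim().split("\\s+", 2);
            if (parts.length != 2) {
                continue;
            }
            // J = compiled Java code, j = interpreted; v/V entries are VM stubs and are skipped.
            boolean isJavaFrame = parts[0].equals("J") || parts[0].equals("j");
            if (isJavaFrame && parts[1].startsWith(packagePrefix)) {
                return parts[1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> frames = Arrays.asList(
            "J java.util.zip.ZipFile.getEntry(JLjava/lang/String;Z)J",
            "j org.apache.hadoop.mapreduce.Job.setJarByClass(Ljava/lang/Class;)V+5",
            "j com.panguso.recommend.mapred.usermodel.task.UserModelTask.run()V+53",
            "v ~StubRoutines::call_stub");
        System.out.println(firstFrameIn(frames, "com.panguso."));
    }
}
```

Because the listing is ordered from the crash site outward, the first match is the application frame closest to the crash, which is usually the best place to start reading.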
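Confirmation: a search for similar reports turned up a comparable case at https://forums.oracle.com/forums/thread.jspa?threadID=1540064. The replies there confirm that this kind of crash occurs when the JVM can no longer access the relevant jar or the original class.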
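One way to test the "jar was replaced at runtime" hypothesis from inside a running application is to compare the jar's modification time with the JVM start time. The sketch below is an assumption about how such a check could look, not something taken from the forum thread: JarChangeCheck and its methods are hypothetical names, and the check only makes sense when run in the same JVM that loaded the jar.

```java
import java.io.File;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not a JDK class): detects files modified after a given point in time.
class JarChangeCheck {
    // True if the file's last-modified time is later than epochMillis.
    // File.lastModified() returns 0 for a missing file, so a missing file reports false
    // for any non-negative epochMillis.
    static boolean modifiedAfter(File file, long epochMillis) {
        return file.lastModified() > epochMillis;
    }

    // True if the jar was modified after the current JVM started.
    static boolean modifiedSinceJvmStart(File jar) {
        long jvmStart = ManagementFactory.getRuntimeMXBean().getStartTime();
        return modifiedAfter(jar, jvmStart);
    }

    public static void main(String[] args) {
        // Pass the jar paths to check as command-line arguments.
        for (String path : args) {
            File jar = new File(path);
            System.out.println(path + " modified since JVM start: " + modifiedSinceJvmStart(jar));
        }
    }
}
```

For a post-mortem check after the crash, modifiedAfter can be called with the start time of the crashed process instead, if that time is known from other logs.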
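At this point the problem has been located and its cause is clear.
To summarize: after a JVM crash, skip the core.pid file and analyze hs_err_pidXXXX.log directly. Use the error information in it to narrow down the cause, decide on the likely problem point, and then verify it.
See also: http://www.oracle.com/technetwork/java/javase/crashes-137240.html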