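After a recent upgrade of our Hadoop system we ran into a strange problem: jps reported "process information unavailable" for every process, and jstack, jmap, and every other command that uses the attach API failed in a similar way.
Suspecting jps itself, I read the Jps source code and learned that running jps -J-Djps.debug=true -J-Djps.printStackTrace=true prints the detailed error, as follows: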
16373 -- process information unavailable
Could not attach to 16373
sun.jvmstat.monitor.MonitorException: Could not attach to 16373
at sun.jvmstat.perfdata.monitor.protocol.local.PerfDataBuffer.<init>(PerfDataBuffer.java:91)
at sun.jvmstat.perfdata.monitor.protocol.local.LocalMonitoredVm.<init>(LocalMonitoredVm.java:68)
at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.getMonitoredVm(MonitoredHostProvider.java:77)
at sun.tools.jps.Jps.main(Jps.java:92)
Caused by: java.io.IOException: Operation not permitted
at sun.misc.Perf.attach(Native Method)
at sun.misc.Perf.attachImpl(Perf.java:270)
at sun.misc.Perf.attach(Perf.java:200)
at sun.jvmstat.perfdata.monitor.protocol.local.PerfDataBuffer.<init>(PerfDataBuffer.java:64)
... 3 more
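So this was clearly a permission problem, but the exception was thrown from native code. I had no choice but to download the OpenJDK source (the src.zip shipped with the JDK covers only the class library and has no source for the native code), and with help from @RednaxelaFX I found the culprit: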
hotspot/src/os/linux/vm/perfMemory_linux.cpp
static bool is_directory_secure(const char* path) {
  struct stat statbuf;
  int result = 0;

  RESTARTABLE(::lstat(path, &statbuf), result);
  if (result == OS_ERR) {
    return false;
  }

  // the path exists, now check it's mode
  if (S_ISLNK(statbuf.st_mode) || !S_ISDIR(statbuf.st_mode)) {
    // the path represents a link or some non-directory file type,
    // which is not what we expected. declare it insecure.
    //
    return false;
  }
  else {
    // we have an existing directory, check if the permissions are safe.
    //
    if ((statbuf.st_mode & (S_IWGRP|S_IWOTH)) != 0) {
      // the directory is open for writing and could be subjected
      // to a symlnk attack. declare it insecure.
      //
      return false;
    }
  }
  return true;
}
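It turns out the check fails whenever the directory's mode has S_IWGRP or S_IWOTH set, that is, whenever the directory is writable by group or by others. Looking at /tmp/hsperfdata_mapred, I found that someone had changed its permissions to 777. After changing them back to 755, the problem went away.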
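To check a directory without reading the JVM source each time, the same test can be reproduced outside the JVM. The following is only a rough Python sketch that mirrors the logic of is_directory_secure quoted above; it is not part of the JDK, and the directory name hsperfdata_demo is a made-up stand-in created in a temporary location.

```python
import os
import stat
import tempfile


def is_directory_secure(path):
    """Mirror the quoted HotSpot check: the path must be a real directory
    (not a symlink) with no group-write or other-write bits set."""
    try:
        st = os.lstat(path)  # lstat does not follow symlinks
    except OSError:
        return False
    if stat.S_ISLNK(st.st_mode) or not stat.S_ISDIR(st.st_mode):
        return False
    return (st.st_mode & (stat.S_IWGRP | stat.S_IWOTH)) == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        # Hypothetical stand-in for /tmp/hsperfdata_<user>.
        demo = os.path.join(base, "hsperfdata_demo")
        os.mkdir(demo)

        os.chmod(demo, 0o777)  # group and other write bits set
        print("777 secure?", is_directory_secure(demo))

        os.chmod(demo, 0o755)  # only the owner may write
        print("755 secure?", is_directory_secure(demo))
```

On a POSIX system, pointing is_directory_secure at the real /tmp/hsperfdata_<user> directory should tell you whether jps is likely to hit the same "Operation not permitted" error.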
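Finally, some background on jps: jps, jstack, and related tools use /tmp/hsperfdata_${user_name} to discover the PIDs and other information of running Java processes. If a Java process is started with -Djava.io.tmpdir, jps and the other tools may fail because they cannot find the corresponding data. This time, the problem was the permissions on that directory.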
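As a rough illustration of that lookup, the sketch below lists candidate PIDs by reading a hsperfdata directory. It assumes the directory holds one file per running JVM, named after its PID. The helper names perfdata_dir and list_candidate_pids are made up for this example and are not part of any JDK tool, and the demo uses a temporary directory with fake entries rather than the real /tmp.

```python
import os
import re
import tempfile


def perfdata_dir(user, tmpdir="/tmp"):
    """Build the per-user directory path, e.g. /tmp/hsperfdata_mapred."""
    return os.path.join(tmpdir, "hsperfdata_" + user)


def list_candidate_pids(directory):
    """Return the purely numeric file names in `directory` as sorted ints.
    An unreadable or missing directory yields an empty list."""
    try:
        names = os.listdir(directory)
    except OSError:
        return []
    return sorted(int(n) for n in names if re.fullmatch(r"[0-9]+", n))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        demo = perfdata_dir("demo", base)  # hypothetical user name
        os.mkdir(demo)
        for name in ("16373", "notapid"):
            open(os.path.join(demo, name), "w").close()
        print(demo, list_candidate_pids(demo))
```

If the list is empty for a user whose JVMs are definitely running, the data is probably somewhere the tools are not looking, or the directory is unreadable.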