We needed to load data from a relational database into an HBase table with the VDataHub tool. Although flow control was enabled in VDataHub, other jobs were running on the cluster at the same time, and disk I/O across the whole cluster was close to saturated. Many put operations could not be written, and on the HBase server side (the region servers) the logs were full of exceptions like the following:
2015-06-23 13:45:18,844 WARN [RpcServer.handler=71,port=60020] ipc.RpcServer: (responseTooSlow): {"processingtimems":18758,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":":54083","starttimems":1435038300085,"queuetimems":0,"class":"HRegionServer","responsesize":503,"method":"Multi"}
2015-06-23 13:45:18,844 WARN [RpcServer.handler=71,port=60020] ipc.RpcServer: RpcServer.respondercallId: 9805 service: ClientService methodName: Multi size: 119.3 K connection: :54083: output error
2015-06-23 13:45:18,845 WARN [RpcServer.handler=71,port=60020] ipc.RpcServer: RpcServer.handler=71,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
2015-06-23 13:45:19,080 WARN [RpcServer.handler=77,port=60020] ipc.RpcServer: (responseTooSlow): {"processingtimems":18725,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":"55791","starttimems":1435038300354,"queuetimems":0,"class":"HRegionServer","responsesize":675,"method":"Multi"}
2015-06-23 13:45:19,080 WARN [RpcServer.handler=77,port=60020] ipc.RpcServer: RpcServer.respondercallId: 9354 service: ClientService methodName: Multi size: 160.1 K connection: :55791: output error
2015-06-23 13:45:19,081 WARN [RpcServer.handler=77,port=60020] ipc.RpcServer: RpcServer.handler=77,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
The relevant CallRunner code snippet:
public void run() {
  try {
    // NOTE: if the client has already disconnected, the call is skipped here.
    // This early return bypasses the addCallSize(-size) decrement further down.
    if (!call.connection.channel.isOpen()) {
      if (RpcServer.LOG.isDebugEnabled()) {
        RpcServer.LOG.debug(Thread.currentThread().getName() + ": skipped " + call);
      }
      return;
    }
    this.status.setStatus("Setting up call");
    this.status.setConnection(call.connection.getHostAddress(), call.connection.getRemotePort());
    if (RpcServer.LOG.isDebugEnabled()) {
      UserGroupInformation remoteUser = call.connection.user;
      RpcServer.LOG.debug(call.toShortString() + " executing as " +
          ((remoteUser == null) ? "NULL principal" : remoteUser.getUserName()));
    }
    Throwable errorThrowable = null;
    String error = null;
    Pair<Message, CellScanner> resultPair = null;
    RpcServer.CurCall.set(call);
    TraceScope traceScope = null;
    try {
      if (!this.rpcServer.isStarted()) {
        throw new ServerNotRunningYetException("Server is not running yet");
      }
      if (call.tinfo != null) {
        traceScope = Trace.startSpan(call.toTraceString(), call.tinfo);
      }
      RequestContext.set(userProvider.create(call.connection.user), RpcServer.getRemoteIp(),
          call.connection.service);
      // make the call
      resultPair = this.rpcServer.call(call.service, call.md, call.param, call.cellScanner,
          call.timestamp, this.status);
    } catch (Throwable e) {
      RpcServer.LOG.debug(Thread.currentThread().getName() + ": " + call.toShortString(), e);
      errorThrowable = e;
      error = StringUtils.stringifyException(e);
    } finally {
      if (traceScope != null) {
        traceScope.close();
      }
      // Must always clear the request context to avoid leaking
      // credentials between requests.
      RequestContext.clear();
    }
    RpcServer.CurCall.set(null);
    // The only place in this method where the call's size is subtracted from
    // callQueueSize -- never reached when the call is skipped above.
    this.rpcServer.addCallSize(call.getSize() * -1);
    // Set the response for undelayed calls and delayed calls with
    // undelayed responses.
    if (!call.isDelayed() || !call.isReturnValueDelayed()) {
      Message param = resultPair != null ? resultPair.getFirst() : null;
      CellScanner cells = resultPair != null ? resultPair.getSecond() : null;
      call.setResponse(param, cells, errorThrowable, error);
    }
    call.sendResponseIfReady();
    this.status.markComplete("Sent response");
    this.status.pause("Waiting for a call");
  } catch (OutOfMemoryError e) {
    if (this.rpcServer.getErrorHandler() != null) {
      if (this.rpcServer.getErrorHandler().checkOOME(e)) {
        RpcServer.LOG.info(Thread.currentThread().getName() + ": exiting on OutOfMemoryError");
        return;
      }
    } else {
      // rethrow if no handler
      throw e;
    }
  } catch (ClosedChannelException cce) {
    RpcServer.LOG.warn(Thread.currentThread().getName() + ": caught a ClosedChannelException, " +
        "this means that the server was processing a " +
        "request but the client went away. The error message was: " +
        cce.getMessage());
  } catch (Exception e) {
    RpcServer.LOG.warn(Thread.currentThread().getName()
        + ": caught: " + StringUtils.stringifyException(e));
  }
}
Looking at the snippet, the problem is that this.rpcServer.addCallSize(call.getSize() * -1) can be skipped: the early return at the top bypasses it when the client's channel is already closed, and an exception thrown before that line lands in the outer catch blocks without it ever running, so the server-side callQueueSize only grows and is never given back. Given that, the fix is obvious: perform the decrement at the very end, regardless of whether the call succeeded, failed, or was skipped, by putting it straight into a finally block (a sketch of the change follows the links below). A quick Google search shows that, sure enough, others have hit the same problem; the community already fixed it back in August last year, and the fix landed in 0.98.6. Unfortunately we are on CDH's 0.98.1, so the only option is to patch the source and rebuild it ourselves. What a pain.
http://qnalist.com/questions/5065474/ipc-queue-size
https://issues.apache.org/jira/browse/HBASE-11705
https://issues.apache.org/jira/secure/attachment/12660609/HBASE-11705.v2.patch
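The shape of the change is roughly the following. This is only a sketch reconstructed from the snippet above (the middle of the method and the OutOfMemoryError handling are unchanged and elided in a comment); the authoritative version is the HBASE-11705.v2.patch linked above:

public void run() {
  try {
    if (!call.connection.channel.isOpen()) {
      if (RpcServer.LOG.isDebugEnabled()) {
        RpcServer.LOG.debug(Thread.currentThread().getName() + ": skipped " + call);
      }
      return;
    }
    // ... body identical to the snippet above (status/tracing setup,
    // this.rpcServer.call(...), setResponse(...), sendResponseIfReady()),
    // except that the inline this.rpcServer.addCallSize(call.getSize() * -1)
    // in the middle of the method is removed ...
  } catch (ClosedChannelException cce) {
    RpcServer.LOG.warn(Thread.currentThread().getName() + ": caught a ClosedChannelException, " +
        "this means that the server was processing a request but the client went away. " +
        "The error message was: " + cce.getMessage());
  } catch (Exception e) {
    RpcServer.LOG.warn(Thread.currentThread().getName() + ": caught: " +
        StringUtils.stringifyException(e));
  } finally {
    // Whether the call was skipped, answered, or failed, always give its
    // size back so callQueueSize cannot grow without bound.
    this.rpcServer.addCallSize(call.getSize() * -1);
  }
}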
For the details of patching and rebuilding from source, see another post on this blog; I won't repeat them here.
After the fix, the monitoring charts show everything back in a normal state: