While operating our Hadoop cluster we hit the following failure: the active NameNode crashed, with this in its log:
2016-08-09 16:33:51,526 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:52,169 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2016-08-09 16:33:52,526 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7002 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:53,527 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8003 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:54,529 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9004 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:55,530 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10006 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:56,531 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11007 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:57,533 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12008 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:58,533 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13009 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:59,534 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14010 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:00,536 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15011 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:01,537 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16013 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:02,538 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17014 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:03,540 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18015 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:04,541 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:05,525 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.80.248.17:8486, 10.80.248.18:8486, 10.80.248.19:8486], stream=QuorumOutputStream starting at txid 2947))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:639)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1221)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1158)
at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1238)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6344)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:933)
at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139)
at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:11214)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
2016-08-09 16:34:05,526 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 2947
2016-08-09 16:34:05,600 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-08-09 16:34:05,733 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ut07/10.80.248.17
************************************************************/
The log shows the NameNode went down, so the first thing to check is whether the other NameNode switched to active. Checking the other machine's web UI on port 50070 confirmed the failover succeeded.
Next, let's analyze the root cause. The stack trace points straight into the Hadoop source:
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
So let's see what this code actually does, starting from that method:
public synchronized void waitFor(
    int minResponses, int minSuccesses, int maxExceptions,
    int millis, String operationName)
    throws InterruptedException, TimeoutException {
  // Suppose the current time is 20:00:00.
  long st = Time.monotonicNow();
  // With millis = 20000 and WAIT_PROGRESS_INFO_THRESHOLD = 0.3,
  // nextLogTime = 20:00:06.
  long nextLogTime = st + (long)(millis * WAIT_PROGRESS_INFO_THRESHOLD);
  /*
   * millis comes from writeTxnsTimeoutMs = conf.getInt(
   *     DFSConfigKeys.DFS_QJOURNAL_WRITE_TXNS_TIMEOUT_KEY,
   *     DFSConfigKeys.DFS_QJOURNAL_WRITE_TXNS_TIMEOUT_DEFAULT),
   * so et = 20:00:20.
   */
  long et = st + millis;
  while (true) {
    checkAssertionErrors();
    /* Suppose there are 5 JournalNodes:
     * minResponses = 3 && countResponses() >= 3: three JNs replied, return.
     * minSuccesses = 3 && countSuccesses() >= 3: the write succeeded, return.
     * maxExceptions >= 0 && countExceptions() > maxExceptions: the write failed, return.
     * All three cases assume the JNs responded; everything below handles the
     * case where they have not responded yet.
     */
    if (minResponses > 0 && countResponses() >= minResponses) return;
    if (minSuccesses > 0 && countSuccesses() >= minSuccesses) return;
    if (maxExceptions >= 0 && countExceptions() > maxExceptions) return;

    // 1st pass: now = 20:00:01
    // 2nd pass: now = 20:00:06
    // 3rd pass: now = 20:00:07
    long now = Time.monotonicNow();
    // Log progress only once the current time passes nextLogTime:
    // 1st pass: 20:00:01 > 20:00:06 is false, skip.
    // 2nd pass: 20:00:06 > 20:00:06 is still false, continue.
    // 3rd pass: 20:00:07 > 20:00:06 is true, enter the block.
    if (now > nextLogTime) {
      // waited = 20:00:07 - 20:00:00 = 7, i.e. we have waited 7 s for the JNs.
      long waited = now - st;
      String msg = String.format(
          "Waited %s ms (timeout=%s ms) for a response for %s",
          waited, millis, operationName);
      if (!successes.isEmpty()) {
        msg += ". Succeeded so far: [" + Joiner.on(",").join(successes.keySet()) + "]";
      }
      if (!exceptions.isEmpty()) {
        msg += ". Exceptions so far: [" + getExceptionMapString() + "]";
      }
      if (successes.isEmpty() && exceptions.isEmpty()) {
        msg += ". No responses yet.";
      }
      if (waited > millis * WAIT_PROGRESS_WARN_THRESHOLD) {
        QuorumJournalManager.LOG.warn(msg);
      } else {
        QuorumJournalManager.LOG.info(msg);
      }
      // nextLogTime = 20:00:07 + 1 = 20:00:08
      nextLogTime = now + WAIT_PROGRESS_INTERVAL_MILLIS;
    }
    // 1st pass: rem = 20:00:20 - 20:00:01 = 19
    // 2nd pass: rem = 20:00:20 - 20:00:06 = 14
    // 3rd pass: rem = 20:00:20 - 20:00:07 = 13
    // 4th pass: if a ~20 s full GC happens, rem = 20:00:20 - (20:00:08 + 20) = -8
    /*
     * After a few passes the pattern is clear: the loop polls the JNs for a
     * response roughly once per second. If a dozen or so polls go by with no
     * response, the JNs really are in trouble; once rem <= 0 the NameNode
     * exits, because failing to sync edits would make the cluster unusable,
     * so aborting is reasonable. The problem is that a full GC inside this
     * et - now window also consumes the budget, so a long GC pause alone can
     * kill the NameNode even when the JNs are healthy.
     */
    long rem = et - now;
    // On the pass where rem goes non-positive, throw.
    if (rem <= 0) {
      throw new TimeoutException();
    }
    rem = Math.min(rem, nextLogTime - now);
    rem = Math.max(rem, 1);
    // 1st pass: rem = 5, so we sleep 5 s.
    // 2nd pass: rem = 1.
    wait(rem);
  }
}
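The log cadence the comments describe can be checked with a small standalone simulation (my own sketch, not Hadoop code; the constants 0.3, 0.7 and 1000 ms mirror `WAIT_PROGRESS_INFO_THRESHOLD`, `WAIT_PROGRESS_WARN_THRESHOLD` and `WAIT_PROGRESS_INTERVAL_MILLIS`). With millis = 20000 and no responses, it predicts a first INFO line at ~6001 ms, one line every ~1001 ms after that, and a switch to WARN past 14 s, which matches the log at the top of this post line for line:

```java
public class WaitForTiming {
    static final double WAIT_PROGRESS_INFO_THRESHOLD = 0.3;
    static final double WAIT_PROGRESS_WARN_THRESHOLD = 0.7;
    static final long WAIT_PROGRESS_INTERVAL_MILLIS = 1000;

    // Levels and waited-times of the log lines waitFor would emit, assuming
    // the quorum never responds and every wait(rem) wakes exactly on time.
    static java.util.List<String> simulate(long millis) {
        java.util.List<String> lines = new java.util.ArrayList<>();
        long st = 0;
        long nextLogTime = st + (long) (millis * WAIT_PROGRESS_INFO_THRESHOLD);
        long et = st + millis;
        long now = st;
        while (true) {
            if (now > nextLogTime) {
                long waited = now - st;
                lines.add((waited > millis * WAIT_PROGRESS_WARN_THRESHOLD ? "WARN" : "INFO")
                        + " waited=" + waited);
                nextLogTime = now + WAIT_PROGRESS_INTERVAL_MILLIS;
            }
            long rem = et - now;
            if (rem <= 0) break;                // TimeoutException in the real code
            rem = Math.min(rem, nextLogTime - now);
            rem = Math.max(rem, 1);
            now += rem;                         // wait(rem) advances the clock
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String s : simulate(20000)) System.out.println(s);
    }
}
```

Running it yields 14 lines, 8 INFO followed by 6 WARN, just like the 8 INFO and 6 WARN `sendEdits` lines in the crash log.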
Based on the walkthrough above, there are two ways to handle this situation.
1. Increase the timeout: raise the value behind DFS_QJOURNAL_WRITE_TXNS_TIMEOUT_KEY. The default is 20 s; here we can raise it to 60 s.
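For reference, option 1 is a one-line change in hdfs-site.xml. The property name below is what DFS_QJOURNAL_WRITE_TXNS_TIMEOUT_KEY resolves to in the releases I have seen (default 20000 ms); verify it against your Hadoop version before rolling it out:

```xml
<!-- hdfs-site.xml: raise the quorum edit-write timeout from 20 s to 60 s -->
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>60000</value>
</property>
```

Note this also lengthens how long a genuinely broken JournalNode quorum can stall the NameNode before it aborts.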
2. Patch the code to exclude full-GC time from the timeout budget. The modified source is attached below.
First, a small elapsed-time helper class:
static class StopWatch {
  private long start;
  private long elapse;

  public void start() {
    // Note: wall-clock time, unlike Time.monotonicNow() used by waitFor.
    this.start = System.currentTimeMillis();
  }

  public void reset() {
    this.start = 0L;
    this.elapse = 0L;
  }

  public void restart() {
    reset();
    start();
  }

  public long getStart() {
    return start;
  }

  public void setStart(long now) {
    this.start = now;
  }

  public long getElapse() {
    return System.currentTimeMillis() - getStart();
  }
}
Then the patched waitFor (identical to the original except for the parts marked NEW):
public synchronized void waitFor(
    int minResponses, int minSuccesses, int maxExceptions,
    int millis, String operationName)
    throws InterruptedException, TimeoutException {
  // Suppose the current time is 20:00:00.
  long st = Time.monotonicNow();
  // nextLogTime = 20:00:06 (millis * WAIT_PROGRESS_INFO_THRESHOLD).
  long nextLogTime = st + (long)(millis * WAIT_PROGRESS_INFO_THRESHOLD);
  // millis comes from DFS_QJOURNAL_WRITE_TXNS_TIMEOUT_KEY, so et = 20:00:20.
  long et = st + millis;
  while (true) {
    checkAssertionErrors();
    // Same quorum checks as before: return as soon as enough JNs have
    // responded, succeeded, or failed.
    if (minResponses > 0 && countResponses() >= minResponses) return;
    if (minSuccesses > 0 && countSuccesses() >= minSuccesses) return;
    if (maxExceptions >= 0 && countExceptions() > maxExceptions) return;

    // NEW: start timing this pass. A full GC could strike anywhere between
    // here and the getElapse() call below (full-GC window #1).
    StopWatch sw = new StopWatch();
    sw.start();

    long now = Time.monotonicNow();
    // Same progress logging as the original: first line at 20:00:07, then
    // roughly once per second, INFO before the WARN threshold and WARN after.
    if (now > nextLogTime) {
      long waited = now - st;
      String msg = String.format(
          "Waited %s ms (timeout=%s ms) for a response for %s",
          waited, millis, operationName);
      if (!successes.isEmpty()) {
        msg += ". Succeeded so far: [" + Joiner.on(",").join(successes.keySet()) + "]";
      }
      if (!exceptions.isEmpty()) {
        msg += ". Exceptions so far: [" + getExceptionMapString() + "]";
      }
      if (successes.isEmpty() && exceptions.isEmpty()) {
        msg += ". No responses yet.";
      }
      if (waited > millis * WAIT_PROGRESS_WARN_THRESHOLD) {
        QuorumJournalManager.LOG.warn(msg);
      } else {
        QuorumJournalManager.LOG.info(msg);
      }
      nextLogTime = now + WAIT_PROGRESS_INTERVAL_MILLIS;
    }
    // NEW: if this pass took more than 3 s, normal execution cannot explain
    // it; we must have been stalled by a full GC, so push the deadline out by
    // the measured pause instead of letting it eat the et - now budget.
    long elapse = sw.getElapse();
    if (elapse > 3000) {
      et = et + elapse;
    }
    long rem = et - now;
    if (rem <= 0) {
      throw new TimeoutException();
    }
    rem = Math.min(rem, nextLogTime - now);
    rem = Math.max(rem, 1);
    wait(rem);
  }
}