Notes on a Round of Hadoop Custom Development

Background

While running our Hadoop cluster we hit the following error: the NameNode went down. The log is shown below.

2016-08-09 16:33:51,526 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:52,169 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2016-08-09 16:33:52,526 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7002 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:53,527 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8003 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:54,529 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9004 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:55,530 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10006 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:56,531 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 11007 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:57,533 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 12008 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:58,533 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 13009 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:33:59,534 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 14010 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:00,536 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15011 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:01,537 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 16013 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:02,538 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 17014 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:03,540 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 18015 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:04,541 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.80.248.17:8486]
2016-08-09 16:34:05,525 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.80.248.17:8486, 10.80.248.18:8486, 10.80.248.19:8486], stream=QuorumOutputStream starting at txid 2947))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
    at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
    at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
    at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:639)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1221)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1158)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1238)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6344)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:933)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139)
    at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:11214)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
2016-08-09 16:34:05,526 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 2947
2016-08-09 16:34:05,600 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-08-09 16:34:05,733 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ut07/10.80.248.17
************************************************************/

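Analysis

The log shows that the NameNode went down, so the first step was to check whether the other NameNode had switched to the active state. Its web UI on port 50070 confirmed that the failover had succeeded.
Next, we look at what caused the error. The stack trace points to the following location in the Hadoop source: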

    at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)

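Let's see what this code does. Following the call leads straight to this method (the comments are annotations added for this walkthrough; times are in seconds for readability):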

 public synchronized void waitFor(
      int minResponses, int minSuccesses, int maxExceptions,
      int millis, String operationName)
      throws InterruptedException, TimeoutException {
    // Assume the current time is 20:00:00
    long st = Time.monotonicNow();
    // nextLogTime = 20:00:06
    long nextLogTime = st + (long)(millis * WAIT_PROGRESS_INFO_THRESHOLD);
    /*
     *  millis comes from:
     *  writeTxnsTimeoutMs = conf.getInt(
     *  DFSConfigKeys.DFS_QJOURNAL_WRITE_TXNS_TIMEOUT_KEY,
     *  DFSConfigKeys.DFS_QJOURNAL_WRITE_TXNS_TIMEOUT_DEFAULT)
     *  et = 20:00:20
     */
    long et = st + millis;
    while (true) {
      checkAssertionErrors();
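      /* Suppose there are 5 JournalNodes:
       * minResponses = 3 && countResponses >= 3 means three have replied; return
       * minSuccesses = 3 && countSuccesses >= 3 means the write succeeded
       * 2 >= 0 && countExceptions > 2 means the write failed
       * All three cases assume the JournalNodes respond normally.
       * What happens if they do not respond at all?
       */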
      if (minResponses > 0 && countResponses() >= minResponses) return;
      if (minSuccesses > 0 && countSuccesses() >= minSuccesses) return;
      if (maxExceptions >= 0 && countExceptions() > maxExceptions) return;
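      // 1st pass: now = 20:00:01
      // 2nd pass: now = 20:00:06
      // 3rd pass: now = 20:00:07
      long now = Time.monotonicNow();
      // 1st pass: is now later than the next log time? 20:00:01 > 20:00:06, no
      // 2nd pass: 20:00:06 > 20:00:06, still no, so execution continues
      // 3rd pass: 20:00:07 > 20:00:06, yes, enter the block
      if (now > nextLogTime) {
        // waited = 20:00:07 - 20:00:00 = 7, i.e. we have waited 7 s for the JournalNodes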
        long waited = now - st;
        String msg = String.format(
            "Waited %s ms (timeout=%s ms) for a response for %s",
            waited, millis, operationName);

        if (!successes.isEmpty()) {
          msg += ". Succeeded so far: [" + Joiner.on(",").join(successes.keySet()) + "]";
        }
        if (!exceptions.isEmpty()) {
          msg += ". Exceptions so far: [" + getExceptionMapString() + "]";
        }
        if (successes.isEmpty() && exceptions.isEmpty()) {
          msg += ". No responses yet.";
        }
        if (waited > millis * WAIT_PROGRESS_WARN_THRESHOLD) {
          QuorumJournalManager.LOG.warn(msg);
        } else {
          QuorumJournalManager.LOG.info(msg);
        }
        // nextLogTime = 20:00:07 + 1 s = 20:00:08
        nextLogTime = now + WAIT_PROGRESS_INTERVAL_MILLIS;
      }
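      // 1st pass: rem = 20:00:20 - 20:00:01 = 19
      // 2nd pass: rem = 20:00:20 - 20:00:06 = 14
      // 3rd pass: rem = 20:00:20 - 20:00:07 = 13
      // 4th pass: suppose a 20 s full GC occurs; now = 20:00:07 + 1 + 20 = 20:00:28,
      //           so rem = 20:00:20 - 20:00:28 = -8
      /*
       * After the first passes, the loop re-checks about once per second whether
       * the JournalNodes have responded. If a dozen or so checks go by with no
       * response, something is wrong with the JournalNodes.
       * Once (rem <= 0) holds, the NameNode exits because the edit log could not
       * be synced. A failed sync would leave the cluster unusable, so it is
       * reasonable for the NameNode to go down.
       * There is a catch, however: if a full GC happens inside the et - now
       * window, the NameNode goes down as well.
       */

      long rem = et - now;
      // On the 4th pass the exception is thrown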
      if (rem <= 0) {
        throw new TimeoutException();
      }

      rem = Math.min(rem, nextLogTime - now);
      rem = Math.max(rem, 1);
      // 1st pass: rem = 5, so the thread sleeps for 5 s
      // 2nd pass: rem = 1
      wait(rem);
    }
  }
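
To make the timing in the annotated method easier to follow, here is a minimal, self-contained replay of the loop with a simulated clock. It is only a sketch: the class name WaitForTimeline is made up for illustration, the three constants are assumptions inferred from the log above (first INFO line at about 6 s, WARN from about 14 s, one line per second), and the simulated wait is assumed to wake 1 ms late each time. No JournalNode quorum ever arrives in this replay.

```java
import java.util.ArrayList;
import java.util.List;

public class WaitForTimeline {
    // Assumed values, inferred from the log in this post (not read from Hadoop here).
    static final float INFO_THRESHOLD = 0.3f;   // start logging after 30% of the timeout
    static final float WARN_THRESHOLD = 0.7f;   // switch to WARN after 70% of the timeout
    static final long INTERVAL_MILLIS = 1000;   // then log once per second

    /** Replays the wait loop on a simulated clock; the quorum never arrives. */
    public static List<String> simulate(long timeoutMillis) {
        List<String> events = new ArrayList<>();
        long st = 0;
        long now = st;
        long nextLogTime = st + (long) (timeoutMillis * INFO_THRESHOLD);
        long et = st + timeoutMillis;
        while (true) {
            if (now > nextLogTime) {
                long waited = now - st;
                String level = waited > timeoutMillis * WARN_THRESHOLD ? "WARN" : "INFO";
                events.add(level + " waited " + waited + " ms");
                nextLogTime = now + INTERVAL_MILLIS;
            }
            long rem = et - now;
            if (rem <= 0) {
                events.add("TIMEOUT at " + (now - st) + " ms");
                return events;
            }
            rem = Math.min(rem, nextLogTime - now);
            rem = Math.max(rem, 1);
            now += rem + 1; // stands in for wait(rem), waking 1 ms late
        }
    }

    public static void main(String[] args) {
        for (String event : simulate(20000)) {
            System.out.println(event);
        }
    }
}
```

With a 20000 ms timeout the replay produces 8 INFO lines, 6 WARN lines, and then the timeout, which is the same shape as the NameNode log in the Background section.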

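The comments above lead to the conclusion below. There are two ways to handle this situation.

Conclusion

1. Increase the timeout by raising the value behind DFS_QJOURNAL_WRITE_TXNS_TIMEOUT_KEY (the dfs.qjournal.write-txns.timeout.ms property). The default is 20 s; here we can raise it to 60 s.
2. Modify the code so that full GC time is excluded from the timeout. The modified source is attached below.
First, a small time-measuring class was added: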

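  // Minimal stopwatch used to measure how long a section of waitFor() takes.
  static class StopWatch {
    private long start;

    // Record the starting point. Time.monotonicNow() is used, as in waitFor(),
    // so the measurement is not affected by wall-clock adjustments.
    public void start() {
      this.start = Time.monotonicNow();
    }

    public void reset() {
      this.start = 0L;
    }

    public void restart() {
      reset();
      start();
    }

    public long getStart() {
      return start;
    }

    public void setStart(long now) {
      this.start = now;
    }

    // Milliseconds elapsed since start() was last called.
    public long getElapse() {
      return Time.monotonicNow() - getStart();
    }
  }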




  public synchronized void waitFor(
      int minResponses, int minSuccesses, int maxExceptions,
      int millis, String operationName)
      throws InterruptedException, TimeoutException {
    // Assume the current time is 20:00:00
    long st = Time.monotonicNow();
    // nextLogTime = 20:00:06
    long nextLogTime = st + (long)(millis * WAIT_PROGRESS_INFO_THRESHOLD);
    /*
     *  millis comes from:
     *  writeTxnsTimeoutMs = conf.getInt(
     *  DFSConfigKeys.DFS_QJOURNAL_WRITE_TXNS_TIMEOUT_KEY,
     *  DFSConfigKeys.DFS_QJOURNAL_WRITE_TXNS_TIMEOUT_DEFAULT)
     *  et = 20:00:20
     */
    long et = st + millis;
    while (true) {
      checkAssertionErrors();
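      /* Suppose there are 5 JournalNodes:
       * minResponses = 3 && countResponses >= 3 means three have replied; return
       * minSuccesses = 3 && countSuccesses >= 3 means the write succeeded
       * 2 >= 0 && countExceptions > 2 means the write failed
       * All three cases assume the JournalNodes respond normally.
       * What happens if they do not respond at all?
       */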
      if (minResponses > 0 && countResponses() >= minResponses) return;
      if (minSuccesses > 0 && countSuccesses() >= minSuccesses) return;
      if (maxExceptions >= 0 && countExceptions() > maxExceptions) return;

      StopWatch sw = new StopWatch();
      sw.start();
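      /*
        Full GC point 1
       */
      // 1st pass: now = 20:00:01
      // 2nd pass: now = 20:00:06
      // 3rd pass: now = 20:00:07
      long now = Time.monotonicNow();
      // 1st pass: is now later than the next log time? 20:00:01 > 20:00:06, no
      // 2nd pass: 20:00:06 > 20:00:06, still no, so execution continues
      // 3rd pass: 20:00:07 > 20:00:06, yes, enter the block

      if (now > nextLogTime) {
        // waited = 20:00:07 - 20:00:00 = 7, i.e. we have waited 7 s for the JournalNodes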
        long waited = now - st;
        String msg = String.format(
            "Waited %s ms (timeout=%s ms) for a response for %s",
            waited, millis, operationName);

        if (!successes.isEmpty()) {
          msg += ". Succeeded so far: [" + Joiner.on(",").join(successes.keySet()) + "]";
        }
        if (!exceptions.isEmpty()) {
          msg += ". Exceptions so far: [" + getExceptionMapString() + "]";
        }
        if (successes.isEmpty() && exceptions.isEmpty()) {
          msg += ". No responses yet.";
        }
        if (waited > millis * WAIT_PROGRESS_WARN_THRESHOLD) {
          QuorumJournalManager.LOG.warn(msg);
        } else {
          QuorumJournalManager.LOG.info(msg);
        }
        // nextLogTime = 20:00:07 + 1 s = 20:00:08
        nextLogTime = now + WAIT_PROGRESS_INTERVAL_MILLIS;
      }
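      // 1st pass: rem = 20:00:20 - 20:00:01 = 19
      // 2nd pass: rem = 20:00:20 - 20:00:06 = 14
      // 3rd pass: rem = 20:00:20 - 20:00:07 = 13
      // 4th pass: suppose a 20 s full GC occurs; now = 20:00:07 + 1 + 20 = 20:00:28,
      //           so rem = 20:00:20 - 20:00:28 = -8
      /*
       * After the first passes, the loop re-checks about once per second whether
       * the JournalNodes have responded. If a dozen or so checks go by with no
       * response, something is wrong with the JournalNodes.
       * Once (rem <= 0) holds, the NameNode exits because the edit log could not
       * be synced. A failed sync would leave the cluster unusable, so it is
       * reasonable for the NameNode to go down.
       * There is a catch, however: if a full GC happens inside the et - now
       * window, the NameNode goes down as well.
       */
      long elapse = sw.getElapse();
      if (elapse > 3000) {
        // This section took more than 3 s to run, which is abnormal, so we
        // assume a full GC happened and push the deadline back by that amount.
        // Note: only time between sw.start() and this point is measured;
        // time spent inside wait(rem) below is not.
        et = et + elapse;
      }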
      
      long rem = et - now;
      // On the 4th pass the exception is thrown
      if (rem <= 0) {
        throw new TimeoutException();
      }

      rem = Math.min(rem, nextLogTime - now);
      rem = Math.max(rem, 1);
      // 1st pass: rem = 5, so the thread sleeps for 5 s
      // 2nd pass: rem = 1
      wait(rem);
    }
  }
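
One thing worth keeping in mind about the patched method: the stopwatch brackets the logging section only, while the thread spends almost all of its time inside wait(rem). As a possible refinement, the sketch below measures the pause across the wait itself and extends the deadline by the overshoot. This is a hypothetical illustration, not Hadoop code: the class PauseAwareDeadline, its methods, and the injected clock are all assumptions made for the example, and the 3000 ms threshold simply reuses the value from the patch above. A long overshoot may also have causes other than a full GC, so treat the heuristic with care.

```java
import java.util.function.LongSupplier;

public class PauseAwareDeadline {
    private final LongSupplier clock;          // monotonic clock in ms, injected so it can be faked
    private final long pauseThresholdMillis;   // overshoot beyond this is treated as a pause
    private long deadline;

    public PauseAwareDeadline(LongSupplier clock, long timeoutMillis, long pauseThresholdMillis) {
        this.clock = clock;
        this.pauseThresholdMillis = pauseThresholdMillis;
        this.deadline = clock.getAsLong() + timeoutMillis;
    }

    /**
     * Call right after a timed wait returns. 'before' is the clock reading
     * taken just before the wait, 'requestedMillis' is the value passed to wait().
     */
    public void recordWait(long before, long requestedMillis) {
        long overshoot = (clock.getAsLong() - before) - requestedMillis;
        if (overshoot > pauseThresholdMillis) {
            // The wait took far longer than requested: count the extra time as
            // a local pause rather than as JournalNode latency.
            deadline += overshoot;
        }
    }

    /** Milliseconds left before the deadline; zero or negative means timed out. */
    public long remaining() {
        return deadline - clock.getAsLong();
    }

    public static void main(String[] args) {
        long[] now = {0};                       // fake clock, in ms
        PauseAwareDeadline d = new PauseAwareDeadline(() -> now[0], 20000, 3000);
        now[0] = 7000;                          // 7 s into the wait
        long before = now[0];
        now[0] += 1000 + 20000;                 // wait(1000) that actually took 21 s
        d.recordWait(before, 1000);
        System.out.println(d.remaining());      // prints 12000
    }
}
```

This replays the "4th pass" scenario from the comments: without compensation the remaining time would be 20000 - 28000 = -8000 ms and the call would time out, whereas here the 20 s overshoot is added back to the deadline.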

Finally, compile the code on Linux, package it, and deploy.
