HBase RegionServer exits (ZooKeeper session expired)


My RegionServer kept exiting because of an expired ZooKeeper session, which gave me a headache for a long time. Here is a summary of the possible causes:

 

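1. Poor network connectivity

2. Long GC pauses: the process is suspended and the session lease expires

3. Busy CPU: the thread that maintains the ZooKeeper session does not get scheduled in time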
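Causes 2 and 3 both show up the same way: a thread wakes up later than it asked to. A minimal sketch for spotting this, assuming nothing about HBase internals (the class `PauseDetector` and its method are hypothetical names for illustration, not part of HBase), is to sleep for a fixed interval and record how far the wake-up overshoots:

```java
// Hypothetical helper, not part of HBase: measures how late a sleeping
// thread wakes up. A large overshoot hints at a GC pause or CPU starvation.
class PauseDetector {

    // Sleeps `rounds` times for `sleepMillis` each and returns the worst
    // overshoot (in milliseconds) beyond the requested sleep time.
    static long maxPauseMillis(int rounds, long sleepMillis) {
        long worst = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop measuring.
                Thread.currentThread().interrupt();
                break;
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            long overshoot = Math.max(0, elapsedMillis - sleepMillis);
            worst = Math.max(worst, overshoot);
        }
        return worst;
    }

    public static void main(String[] args) {
        long worst = maxPauseMillis(20, 50);
        System.out.println("worst observed pause: " + worst + " ms");
    }
}
```

If the worst overshoot on a loaded machine gets anywhere near the session timeout, the ZooKeeper heartbeat thread is probably being delayed in the same way. GC logs remain the more direct evidence for cause 2.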

 

Solutions:

  1. Set zookeeper.session.timeout on the RS to a larger value; I use 180000 (milliseconds)
  2. Set hbase.regionserver.restart.on.zk.expire to true on the RS
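One caveat about item 1: the ZooKeeper server negotiates the session timeout and, unless minSessionTimeout and maxSessionTimeout are set in its configuration, limits it to between 2 and 20 times tickTime. The sketch below reproduces only that arithmetic; the class name `SessionTimeoutCheck` is hypothetical, and the tickTime of 2000 ms is an assumption taken from the sample zoo.cfg, so check the value your ensemble really uses.

```java
// Hypothetical helper for illustration: computes the session timeout a
// ZooKeeper server would grant when minSessionTimeout/maxSessionTimeout
// are left at their defaults (2 * tickTime and 20 * tickTime).
class SessionTimeoutCheck {

    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;   // default minSessionTimeout
        int max = 20 * tickTimeMs;  // default maxSessionTimeout
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        // Assumed tickTime of 2000 ms; the request is the 180000 ms from item 1.
        System.out.println(negotiate(180000, 2000)); // prints 40000
    }
}
```

Under these assumptions a request for 180000 ms is cut down to 40000 ms, so the larger value may only take effect once maxSessionTimeout is raised on the ZooKeeper side as well. Whether this applies depends on how the ensemble is deployed; the negotiated timeout reported in the logs is the reliable check.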

For reference, here is the relevant source code:

 

  /**
   * We register ourselves as a watcher on the master address ZNode. This is
   * called by ZooKeeper when we get an event on that ZNode. When this method
   * is called it means either our master has died, or a new one has come up.
   * Either way we need to update our knowledge of the master.
   * @param event WatchedEvent from ZooKeeper.
   */
  public void process(WatchedEvent event) {
    EventType type = event.getType();
    KeeperState state = event.getState();
    LOG.info("Got ZooKeeper event, state: " + state + ", type: " +
      type + ", path: " + event.getPath());

    // Ignore events if we're shutting down.
    if (stopRequested.get()) {
      LOG.debug("Ignoring ZooKeeper event while shutting down");
      return;
    }

    if (state == KeeperState.Expired) {
      LOG.error("ZooKeeper session expired");
      boolean restart =
        this.conf.getBoolean("hbase.regionserver.restart.on.zk.expire", false);
      if (restart) {
        restart();
      } else {
        abort();
      }
    } else if (type == EventType.NodeDeleted) {
      watchMasterAddress();
    } else if (type == EventType.NodeCreated) {
      getMaster();

      // ZooKeeper watches are one time only, so we need to re-register our watch.
      watchMasterAddress();
    }
  }

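As the code shows, when hbase.regionserver.restart.on.zk.expire is true the RS calls restart() on session expiry; otherwise it calls abort(). Setting it to true therefore keeps the RS from killing itself. Note, however, that I could not find hbase.regionserver.restart.on.zk.expire in the official documentation.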

 
