yarn3.2源码分析之ResourceManager基于zk的HA机制

概述

  ResourceManager#serviceInit()方法

1、判断是否启动HA。如果yarn.resourcemanager.ha.enabled配置参数为true,则为启动HA。

2、如果启动HA,判断是否启用自动失败重启。如果yarn.resourcemanager.ha.automatic-failover.enabled配置参数为true,则为启动自动失败重启。如果启用自动失败重启,创建EmbeddedElector。

EmbeddedElector有2种类型:

  • CuratorBasedElectorService
  • ActiveStandbyElectorBasedElectorService。

如果yarn.resourcemanager.ha.curator-leader-elector.enabled参数为true,则EmbeddedElector类型为CuratorBasedElectorService,该参数默认为false。

CuratorBasedElectorService实现了curator框架的LeaderLatchListener接口。

curator框架的LeaderLatch封装了zk的主从选举,在当前进程当选为leader节点时,LeaderLatch会回调LeaderLatchListener的isLeader方法。

isLeader方法是RM在当选为leader节点后的执行逻辑。在RM当选为leader节点后,将会启动RMActiveServices。

 // Set HA configuration should be done before login
    this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf));
    if (this.rmContext.isHAEnabled()) {
      HAUtil.verifyAndSetConfiguration(this.conf);
    }

// elector must be added post adminservice
    if (this.rmContext.isHAEnabled()) {
      // If the RM is configured to use an embedded leader elector,
      // initialize the leader elector.
      if (HAUtil.isAutomaticFailoverEnabled(conf)
          && HAUtil.isAutomaticFailoverEmbedded(conf)) {
        EmbeddedElector elector = createEmbeddedElector();
        addIfService(elector);
        rmContext.setLeaderElectorService(elector);
      }
    }

 RM基于zk的主从选举过程

CuratorBasedElectorService启动leaderLatch

 private void initAndStartLeaderLatch() throws Exception {
    leaderLatch = new LeaderLatch(curator, latchPath, rmId);
    leaderLatch.addListener(this);
    leaderLatch.start();
  }

 leaderLatch竞选leader节点,并当选为leader节点时回调LeaderLatchListener的isLeader方法

略,请参阅curator2.1源码分析之LeaderLatch封装ZK主从选举

CuratorBasedElectorService#isLeader方法

 RM在当选为leader节点后的处理逻辑

public void isLeader() {
    LOG.info(rmId + "is elected leader, transitioning to active");
    try {
      rm.getRMContext().getRMAdminService()
          .transitionToActive(
          new HAServiceProtocol.StateChangeRequestInfo(
              HAServiceProtocol.RequestSource.REQUEST_BY_ZKFC));
    } catch (Exception e) {
      LOG.info(rmId + " failed to transition to active, giving up leadership",
          e);
      notLeader();
      rejoinElection();
    }
  }

 AdminService#transitionToActive()方法

public synchronized void transitionToActive(
      HAServiceProtocol.StateChangeRequestInfo reqInfo) throws IOException {
    if (isRMActive()) {
      return;
    }
    // call refreshAdminAcls before HA state transition
    // for the case that adminAcls have been updated in previous active RM
    try {
      refreshAdminAcls(false);
    } catch (YarnException ex) {
      throw new ServiceFailedException("Can not execute refreshAdminAcls", ex);
    }

    UserGroupInformation user = checkAccess("transitionToActive");
    checkHaStateChange(reqInfo);

    try {
      // call all refresh*s for active RM to get the updated configurations.
      refreshAll();
    } catch (Exception e) {
      rm.getRMContext()
          .getDispatcher()
          .getEventHandler()
          .handle(
              new RMFatalEvent(RMFatalEventType.TRANSITION_TO_ACTIVE_FAILED,
                  e, "failure to refresh configuration settings"));
      throw new ServiceFailedException(
          "Error on refreshAll during transition to Active", e);
    }

    try {
      rm.transitionToActive();
    } catch (Exception e) {
      RMAuditLogger.logFailure(user.getShortUserName(), "transitionToActive",
          "", "RM",
          "Exception transitioning to active");
      throw new ServiceFailedException(
          "Error when transitioning to Active mode", e);
    }

    RMAuditLogger.logSuccess(user.getShortUserName(), "transitionToActive",
        "RM");
  }

 ResourceManager#transitionToActive()方法

synchronized void transitionToActive() throws Exception {
    if (rmContext.getHAServiceState() == HAServiceProtocol.HAServiceState.ACTIVE) {
      LOG.info("Already in active state");
      return;
    }
    LOG.info("Transitioning to active state");

    this.rmLoginUGI.doAs(new PrivilegedExceptionAction() {
      @Override
      public Void run() throws Exception {
        try {
//启动RMActiveServices
          startActiveServices();
          return null;
        } catch (Exception e) {
          reinitialize(true);
          throw e;
        }
      }
    });

    rmContext.setHAServiceState(HAServiceProtocol.HAServiceState.ACTIVE);
    LOG.info("Transitioned to active state");
  }

  ResourceManager#startActiveServices()方法

void startActiveServices() throws Exception {
    if (activeServices != null) {
      clusterTimeStamp = System.currentTimeMillis();
      activeServices.start();
    }
  }

RM基于ZK的状态存储

 ResourceManager.RMActiveServices#serviceInit()方法

RMActiveServices在服务初始化时,根据yarn.resourcemanager.recovery.enabled参数决定是否在active RM启动后恢复它的状态。如果参数参数为true,调用RMStateStoreFactory#getStore()初始化RMStateStore。RMStateStoreFactory会根据yarn.resourcemanager.store.class参数反射生成相应的RMStateStore。

所以,如果yarn.resourcemanager.recovery.enabled参数为true,必须设置yarn.resourcemanager.store.class参数。

RMStateStore有几种类型如下:

  • NullRMStateStore
  • MemoryRMStateStore
  • FileSystemRMStateStore
  • LeveldbRMStateStore
  • ZKRMStateStore

如果设置yarn.resourcemanager.store.class参数为ZKRMStateStore,则ResourceManager使用基于zk的状态存储。

recoveryEnabled = conf.getBoolean(YarnConfiguration.RECOVERY_ENABLED,
          YarnConfiguration.DEFAULT_RM_RECOVERY_ENABLED);

      RMStateStore rmStore = null;
      if (recoveryEnabled) {
        rmStore = RMStateStoreFactory.getStore(conf);
        boolean isWorkPreservingRecoveryEnabled =
            conf.getBoolean(
              YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED,
              YarnConfiguration.DEFAULT_RM_WORK_PRESERVING_RECOVERY_ENABLED);
        rmContext
            .setWorkPreservingRecoveryEnabled(isWorkPreservingRecoveryEnabled);
      } else {
        rmStore = new NullRMStateStore();
      }

ResourceManager#serviceStart()方法

@Override
    protected void serviceStart() throws Exception {
      RMStateStore rmStore = rmContext.getStateStore();
      // The state store needs to start irrespective of recoveryEnabled as apps
      // need events to move to further states.
      rmStore.start();
//是否在active RM启动后恢复它的状态
      if(recoveryEnabled) {
        try {
          LOG.info("Recovery started");
          rmStore.checkVersion();
          if (rmContext.isWorkPreservingRecoveryEnabled()) {
            rmContext.setEpoch(rmStore.getAndIncrementEpoch());
          }
          RMState state = rmStore.loadState();
          recover(state);
          LOG.info("Recovery ended");
        } catch (Exception e) {
          // the Exception from loadState() needs to be handled for
          // HA and we need to give up master status if we got fenced
          LOG.error("Failed to load/recover state", e);
          throw e;
        }
      } else {
        if (HAUtil.isFederationEnabled(conf)) {
          long epoch = conf.getLong(YarnConfiguration.RM_EPOCH,
              YarnConfiguration.DEFAULT_RM_EPOCH);
          rmContext.setEpoch(epoch);
          LOG.info("Epoch set for Federation: " + epoch);
        }
      }

      super.serviceStart();
    }

ResourceManager#recover()方法

@Override
  public void recover(RMState state) throws Exception {
    // recover RMdelegationTokenSecretManager
    rmContext.getRMDelegationTokenSecretManager().recover(state);

    // recover AMRMTokenSecretManager
    rmContext.getAMRMTokenSecretManager().recover(state);

    // recover reservations
    if (reservationSystem != null) {
      reservationSystem.recover(state);
    }
    // recover applications
    rmAppManager.recover(state);

    setSchedulerRecoveryStartAndWaitTime(state, conf);
  }

ZKRMSateStore#getAndIncrementEpoch()方法

public synchronized long getAndIncrementEpoch() throws Exception {
    String epochNodePath = getNodePath(zkRootNodePath, EPOCH_NODE);
    long currentEpoch = baseEpoch;

    if (exists(epochNodePath)) {
      // load current epoch
      byte[] data = getData(epochNodePath);
      Epoch epoch = new EpochPBImpl(EpochProto.parseFrom(data));
      currentEpoch = epoch.getEpoch();
      // increment epoch and store it
      byte[] storeData = Epoch.newInstance(nextEpoch(currentEpoch)).getProto()
          .toByteArray();
      zkManager.safeSetData(epochNodePath, storeData, -1, zkAcl,
          fencingNodePath);
    } else {
      // initialize epoch node with 1 for the next time.
      byte[] storeData = Epoch.newInstance(nextEpoch(currentEpoch)).getProto()
          .toByteArray();
      zkManager.safeCreate(epochNodePath, storeData, zkAcl,
          CreateMode.PERSISTENT, zkAcl, fencingNodePath);
    }

    return currentEpoch;
  }

待续。。

你可能感兴趣的:(Yarn)