YarnRPC Example: Analyzing the ResourceTracker Protocol

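The ResourceManager and the NodeManager communicate through the ResourceTracker protocol.

Both the server side and the client side are implemented, and the package structure and class names follow the conventions described earlier. ResourceTrackerPBServiceImpl implements the BlockingInterface of the PB service; in practice it delegates to the methods of ResourceTrackerService, which is the real implementation class.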

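Chapter 1

First we look at how Hadoop wraps PB parameters and return values in Java classes. The client has to extract the PB proto from the Java class and then call the corresponding method on the proxy. The server has to wrap the proto in a Java class and then call the real ResourceTrackerService implementation to do the work.

Take the abstract class RegisterNodeManagerRequest as an example. Its PB message is: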

message RegisterNodeManagerRequestProto {
  optional NodeIdProto node_id = 1;
  optional int32 http_port = 3;
  optional ResourceProto resource = 4;
  optional string nm_version = 5;
  repeated NMContainerStatusProto container_statuses = 6;
  repeated ApplicationIdProto runningApplications = 7;
}
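The fields correspond one to one. The only difference is that the get/set methods of the abstract class are abstract; the class that actually wraps the PB message is its implementation, RegisterNodeManagerRequestPBImpl.

Let's analyze this implementation class. Besides the fields above, it has three important fields: proto, builder, and viaProto. viaProto is a boolean: true means field values are read from proto, false means they are read from builder. Accordingly, there are two constructors.

To get a field, the wrapper first checks the corresponding member variable and returns it if it is not null. Otherwise it checks whether proto (or builder) has the field. If not, it returns null; if so, it converts the proto value into the member field and returns that.

Next, the getProto method, which converts between the Java POJO and its proto. It first calls mergeLocalToProto. If viaProto is true, mergeLocalToProto first calls maybeInitBuilder, which creates builder when it is null, re-creates it when it is not null but viaProto is true, and finally sets viaProto to false. mergeLocalToProto then calls mergeLocalToBuilder, which converts each non-null member variable of the POJO into proto form (by calling the member's getProto method) and sets it on builder. Finally it calls builder.build() to build proto, sets viaProto to true, and returns the proto.

Now the set methods for member variables. A setter first calls maybeInitBuilder: if viaProto is true or builder is null, a builder is created (the builder is reset so the proto can be rebuilt; viaProto being false means the proto is still being built and the new data lives in builder), and viaProto is set to false. If the setter's argument is null, the corresponding field in builder is cleared; otherwise only the member variable is set. Values are transferred into proto only when getProto is called.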
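The proto / builder / viaProto life cycle described above can be sketched without the protobuf library. This is a minimal illustration, not Hadoop's code: NodeRequestProto, its Builder, and NodeRequestPBImpl are hypothetical stand-ins with a single nm_version field, written by hand to mimic the shape of a generated message.

```java
// Hypothetical stand-in for a generated protobuf message: immutable, built via a Builder.
final class NodeRequestProto {
    private final String nmVersion; // null means "field not set"

    private NodeRequestProto(String nmVersion) { this.nmVersion = nmVersion; }

    boolean hasNmVersion() { return nmVersion != null; }
    String getNmVersion() { return nmVersion; }

    static NodeRequestProto getDefaultInstance() { return new NodeRequestProto(null); }

    // Creates a builder pre-populated from an existing message.
    static Builder newBuilder(NodeRequestProto prototype) {
        Builder b = new Builder();
        b.nmVersion = prototype.nmVersion;
        return b;
    }

    static final class Builder {
        private String nmVersion;
        boolean hasNmVersion() { return nmVersion != null; }
        String getNmVersion() { return nmVersion; }
        Builder setNmVersion(String v) { nmVersion = v; return this; }
        Builder clearNmVersion() { nmVersion = null; return this; }
        NodeRequestProto build() { return new NodeRequestProto(nmVersion); }
    }
}

// Hypothetical wrapper that follows the proto / builder / viaProto pattern.
final class NodeRequestPBImpl {
    private NodeRequestProto proto = NodeRequestProto.getDefaultInstance();
    private NodeRequestProto.Builder builder = null;
    private boolean viaProto = false;
    private String nmVersion = null; // locally cached field

    // Constructor 1: start empty, data goes through the builder.
    NodeRequestPBImpl() { builder = NodeRequestProto.newBuilder(proto); }

    // Constructor 2: wrap an existing proto, data is read from it.
    NodeRequestPBImpl(NodeRequestProto proto) { this.proto = proto; this.viaProto = true; }

    String getNmVersion() {
        if (nmVersion != null) return nmVersion;            // 1. cached member
        boolean has = viaProto ? proto.hasNmVersion() : builder.hasNmVersion();
        if (!has) return null;                               // 2. not set anywhere
        nmVersion = viaProto ? proto.getNmVersion() : builder.getNmVersion();
        return nmVersion;                                    // 3. converted and cached
    }

    void setNmVersion(String v) {
        maybeInitBuilder();
        if (v == null) builder.clearNmVersion();
        nmVersion = v; // reaches the proto only in getProto()
    }

    NodeRequestProto getProto() {
        mergeLocalToProto();
        return proto;
    }

    private void maybeInitBuilder() {
        if (viaProto || builder == null) builder = NodeRequestProto.newBuilder(proto);
        viaProto = false;
    }

    private void mergeLocalToBuilder() {
        if (nmVersion != null) builder.setNmVersion(nmVersion);
    }

    private void mergeLocalToProto() {
        if (viaProto) maybeInitBuilder();
        mergeLocalToBuilder();
        proto = builder.build();
        viaProto = true;
    }
}
```

A caller would set fields on a fresh NodeRequestPBImpl, call getProto() to hand the message to the proxy, and on the receiving side wrap the incoming proto with the second constructor.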
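Chapter 2

Next we analyze the client-side classes.

2.1 ResourceTrackerPBClientImpl is fairly simple. Its constructor registers the mapping between ResourceTrackerPB.class and ProtobufRpcEngine, and obtains the proxy object from the RPC factory class. The rest are the protocol methods: each one takes the proto wrapped inside its Java argument, calls the corresponding method on the proxy object, and wraps the returned proto in a Java bean.

On the client side the call chain starts at the NodeManager. It holds a NodeStatusUpdater object, which in turn holds a resourceTracker object. In its resyncWithRM method, NodeStatusUpdater calls rebootNodeStatusUpdaterAndRegisterWithRM, which calls registerNodeManager on the resourceTracker object.

The other RPC method of ResourceTracker is invoked from the NodeManager's service.start(). Because NodeManager extends CompositeService, it contains other services, such as the NodeStatusUpdater service. During service.init the NodeManager adds the NodeStatusUpdater service to its child services, and service.start() then calls the NodeStatusUpdater's service.start(). That method goes on to call startStatusUpdater, which starts a thread whose run method calls resourceTracker.nodeHeartbeat.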
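2.2 Next comes the question of how NodeStatusUpdater's resourceTracker object is created. It comes from the getRMClient method, which calls ServerRMProxy.createRMProxy(conf, ResourceTracker.class), which in turn calls another overload of the same name. That method first creates the retryPolicy.

The RetryPolicy interface declares shouldRetry(Exception e, int retries, int failovers, boolean isIdempotentOrAtMostOnce), where retries is the number of retries so far, failovers is the number of failovers so far, and isIdempotentOrAtMostOnce says whether the method is idempotent or at-most-once. It returns a RetryAction. Implementations pick a retry strategy according to the exception that occurred. A RetryAction carries one of three decisions (fail, retry, or fail over and retry) plus a delay field, and which one is returned depends on the exception. The implementation used here is FailoverOnNetworkExceptionRetry, whose delay grows exponentially (doubling each time).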
  // retries: number of retries so far; failovers: number of failovers so far
  @Override
  public RetryAction shouldRetry(Exception e, int retries,
      int failovers, boolean isIdempotentOrAtMostOnce) throws Exception {
    // Give up once the failover count reaches the limit
    if (failovers >= maxFailovers) {
      return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
          "failovers (" + failovers + ") exceeded maximum allowed ("
          + maxFailovers + ")");
    }
    // Give up once the retry count exceeds the limit
    if (retries - failovers > maxRetries) {
      return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
          "retries (" + retries + ") exceeded maximum allowed ("
          + maxRetries + ")");
    }
    // Cannot connect, or the server is in standby: fail over and retry
    if (e instanceof ConnectException ||
        e instanceof NoRouteToHostException ||
        e instanceof UnknownHostException ||
        e instanceof StandbyException ||
        e instanceof ConnectTimeoutException ||
        isWrappedStandbyException(e)) {
      return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
          getFailoverOrRetrySleepTime(failovers));
    // Explicitly retriable errors: retry
    } else if (e instanceof RetriableException
        || getWrappedRetriableException(e) != null) {
      // RetriableException or RetriableException wrapped
      return new RetryAction(RetryAction.RetryDecision.RETRY,
            getFailoverOrRetrySleepTime(retries));
    // Other socket errors or IOExceptions, except RemoteException (an
    // IOException subclass): retry if the method is idempotent, otherwise fail
    } else if (e instanceof SocketException
        || (e instanceof IOException && !(e instanceof RemoteException))) {
      if (isIdempotentOrAtMostOnce) {
        return RetryAction.FAILOVER_AND_RETRY;
      } else {
        return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
            "the invoked method is not idempotent, and unable to determine "
            + "whether it was invoked");
      }
    // Any other exception, including server-side errors (RemoteException),
    // goes to fallbackPolicy, which fails immediately
    } else {
        return fallbackPolicy.shouldRetry(e, retries, failovers,
            isIdempotentOrAtMostOnce);
    }
  }
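The doubling delay mentioned above can be illustrated in isolation. This is a simplified, deterministic sketch and not the body of getFailoverOrRetrySleepTime: the class name, method name, and the cap parameter are assumptions made for the example, and the real policy may differ in details such as the cap or any randomization.

```java
// Hypothetical helper: delay = baseMillis * 2^attempt, never above maxMillis.
final class BackoffSketch {
    static long delayMillis(long baseMillis, long maxMillis, int attempt) {
        // Limit the shift so the multiplier itself cannot overflow.
        long factor = 1L << Math.min(Math.max(attempt, 0), 30);
        // Compare by division first so baseMillis * factor cannot overflow.
        if (baseMillis > maxMillis / factor) {
            return maxMillis;
        }
        return baseMillis * factor;
    }
}
```

With a base of 500 ms and a cap of 15000 ms, attempts 0 through 4 wait 500, 1000, 2000, 4000, and 8000 ms, and every later attempt waits 15000 ms.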
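If HA is enabled, a ConfiguredRMFailoverProxyProvider (a proxy provider that supports failover and retry) is created. Its most important method is getProxy(), which obtains the real proxy and ultimately calls RMProxy.getProxy: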
   
@Private
static <T> T getProxy(final Configuration conf,
    final Class<T> protocol, final InetSocketAddress rmAddress)
    throws IOException {
  return UserGroupInformation.getCurrentUser().doAs(
    new PrivilegedAction<T>() {
      @Override
      public T run() {
        return (T) YarnRPC.create(conf).getProxy(protocol, rmAddress, conf);
      }
    });
}
This is exactly where the YarnRPC API is called.
RetryProxy.create is then called, and the method that finally creates the dynamic proxy is:
   
/**
 * Create a proxy for an interface of implementations of that interface using
 * the given {@link FailoverProxyProvider} and the same retry policy for each
 * method in the interface.
 *
 * @param iface the interface that the retry will implement
 * @param proxyProvider provides implementation instances whose methods should be retried
 * @param retryPolicy the policy for retrying or failing over method call failures
 * @return the retry proxy
 */
public static <T> Object create(Class<T> iface,
    FailoverProxyProvider<T> proxyProvider, RetryPolicy retryPolicy) {
  // Dynamic proxy
  return Proxy.newProxyInstance(
      proxyProvider.getInterface().getClassLoader(),
      new Class<?>[] { iface },
      // proxyProvider here is the ConfiguredRMFailoverProxyProvider
      new RetryInvocationHandler<T>(proxyProvider, retryPolicy)
      );
}
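To see the dynamic-proxy idea on its own, here is a small self-contained sketch that wraps any interface in a retrying InvocationHandler using only java.lang.reflect. It is not Hadoop's RetryInvocationHandler: Heartbeat, RetryHandlerSketch, FlakyHeartbeat, and the fixed maxRetries rule are all hypothetical, and there is no failover or sleep.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical protocol interface standing in for ResourceTracker.
interface Heartbeat {
    String beat();
}

// Hypothetical handler: retries every method call up to maxRetries times.
final class RetryHandlerSketch implements InvocationHandler {
    private final Object target;
    private final int maxRetries;

    RetryHandlerSketch(Object target, int maxRetries) {
        this.target = target;
        this.maxRetries = maxRetries;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        int retries = 0;
        while (true) {
            try {
                // Delegate to the real object.
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // The target threw; rethrow its exception once retries are used up.
                if (retries++ >= maxRetries) {
                    throw e.getCause();
                }
            }
        }
    }

    // Builds the proxy the same way as above: class loader, interfaces, handler.
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> iface, T target, int maxRetries) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(),
            new Class<?>[] { iface },
            new RetryHandlerSketch(target, maxRetries));
    }
}

// Test double that fails twice before succeeding.
final class FlakyHeartbeat implements Heartbeat {
    int calls = 0;

    @Override
    public String beat() {
        if (++calls < 3) {
            throw new IllegalStateException("not ready yet");
        }
        return "ok";
    }
}
```

The caller only ever sees the Heartbeat interface; the retry loop is invisible to it, which is the same reason the NodeManager code can call resourceTracker methods without any retry logic of its own.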
   
2.3 Next, let's look at the RetryInvocationHandler constructor:
protected RetryInvocationHandler(FailoverProxyProvider<T> proxyProvider,
    RetryPolicy defaultPolicy,
    Map<String, RetryPolicy> methodNameToPolicyMap) {
  this.proxyProvider = proxyProvider;
  this.defaultPolicy = defaultPolicy;
  this.methodNameToPolicyMap = methodNameToPolicyMap;
  // Returns a proxyInfo that contains the real proxy
  this.currentProxy = proxyProvider.getProxy();
}

There is also the invoke method:
@Override
public Object invoke(Object proxy, Method method, Object[] args)
  throws Throwable {
  // Per-method cached retry policy
  RetryPolicy policy = methodNameToPolicyMap.get(method.getName());
  if (policy == null) {
    policy = defaultPolicy;
  }
  
  // The number of times this method invocation has been failed over.
  int invocationFailoverCount = 0;
  // True if the proxy is a java.lang.reflect.Proxy instance whose
  // InvocationHandler is an RpcInvocationHandler
  final boolean isRpc = isRpcInvocation(currentProxy.proxy);
  final int callId = isRpc? Client.nextCallId(): RpcConstants.INVALID_CALL_ID;
  int retries = 0;
  // This loop may span several RPC attempts
  while (true) {
    // The number of times this invocation handler has ever been failed over,
    // before this method invocation attempt. Used to prevent concurrent
    // failed method invocations from triggering multiple failover attempts.
    long invocationAttemptFailoverCount;
    synchronized (proxyProvider) {
      invocationAttemptFailoverCount = proxyProviderFailoverCount;
    }

    if (isRpc) {
      // Validates both arguments and requires that no callId is already set
      Client.setCallIdAndRetryCount(callId, retries);
    }
    try {
      // Invoke the method on the real proxy.
      Object ret = invokeMethod(method, args);
      hasMadeASuccessfulCall = true;
      return ret;
    } catch (Exception e) {
      // On failure, decide whether to retry
      if (Thread.currentThread().isInterrupted()) {
        // If interrupted, do not retry.
        throw e;
      }
      // Check the method's annotations to see whether it is idempotent or at-most-once
      boolean isIdempotentOrAtMostOnce = proxyProvider.getInterface()
          .getMethod(method.getName(), method.getParameterTypes())
          .isAnnotationPresent(Idempotent.class);
      if (!isIdempotentOrAtMostOnce) {
        isIdempotentOrAtMostOnce = proxyProvider.getInterface()
            .getMethod(method.getName(), method.getParameterTypes())
            .isAnnotationPresent(AtMostOnce.class);
      }
      // Pass in the retry count and failover count; shouldRetry, analyzed above, returns a RetryAction.
      RetryAction action = policy.shouldRetry(e, retries++,
          invocationFailoverCount, isIdempotentOrAtMostOnce);
      if (action.action == RetryAction.RetryDecision.FAIL) {
        // Log the failure reason, then rethrow
        if (action.reason != null) {
          LOG.warn("Exception while invoking " + currentProxy.proxy.getClass()
              + "." + method.getName() + " over " + currentProxy.proxyInfo
              + ". Not retrying because " + action.reason, e);
        }
        throw e;
      } else { // retry or failover
        // avoid logging the failover if this is the first call on this
        // proxy object, and we successfully achieve the failover without
        // any flip-flopping
        // (in other words, the first failover is not logged)
        boolean worthLogging = 
          !(invocationFailoverCount == 0 && !hasMadeASuccessfulCall);
        worthLogging |= LOG.isDebugEnabled();
        // Log differently depending on the action and on whether debug is enabled
        if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY &&
            worthLogging) {
          String msg = "Exception while invoking " + method.getName()
              + " of class " + currentProxy.proxy.getClass().getSimpleName()
              + " over " + currentProxy.proxyInfo;

          if (invocationFailoverCount > 0) {
            msg += " after " + invocationFailoverCount + " fail over attempts"; 
          }
          msg += ". Trying to fail over " + formatSleepMessage(action.delayMillis);
          LOG.info(msg, e);
        // Otherwise (a plain RETRY, or a failover not worth logging), log only when debug is enabled.
        } else {
          if(LOG.isDebugEnabled()) {
            LOG.debug("Exception while invoking " + method.getName()
                + " of class " + currentProxy.proxy.getClass().getSimpleName()
                + " over " + currentProxy.proxyInfo + ". Retrying "
                + formatSleepMessage(action.delayMillis), e);
          }
        }
        // Sleep for the interval chosen by the retry policy
        if (action.delayMillis > 0) {
          Thread.sleep(action.delayMillis);
        }
        
        if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
          // Make sure that concurrent failed method invocations only cause a
          // single actual fail over.
          synchronized (proxyProvider) {
            // Guard against another caller failing over at the same time
            if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
              // Switch the ResourceManager id to the next one in the HA RM list
              proxyProvider.performFailover(currentProxy.proxy);
              proxyProviderFailoverCount++;
            } else {
              LOG.warn("A failover has occurred since the start of this method"
                  + " invocation attempt.");
            }
            // Get the proxy for the RM address of the new id
            currentProxy = proxyProvider.getProxy();
          }
          invocationFailoverCount++;
        }
      }
    }
  }
}
Next, let's look at the proxyProvider.getProxy() method:

final InetSocketAddress rmAddress = rmProxy.getRMAddress(conf, protocol);

which ultimately calls

conf.getSocketAddr(
  YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
  YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_ADDRESS,
  YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_PORT);

This obtains the current RM ID and then looks up the address for that ID in the configuration.

Now let's look at the failover method (in ConfiguredRMFailoverProxyProvider):
@Override
public synchronized void performFailover(T currentProxy) {
  // Advance to the next id index, wrapping around
  currentProxyIndex = (currentProxyIndex + 1) % rmServiceIds.length;
  // Set the id of the current ResourceManager
  conf.set(YarnConfiguration.RM_HA_ID, rmServiceIds[currentProxyIndex]);
  LOG.info("Failing over to " + rmServiceIds[currentProxyIndex]);
}
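The round-robin switch in performFailover, together with the counter check that invoke uses to avoid failing over twice for the same outage, can be sketched as one small class. FailoverSketch and its method names are hypothetical and combine logic that Hadoop splits between the proxy provider and the invocation handler.

```java
// Hypothetical sketch: round-robin over RM ids, guarded by a failover counter.
final class FailoverSketch {
    private final String[] rmIds;
    private int currentIndex = 0;
    private long failoverCount = 0;

    FailoverSketch(String... rmIds) {
        this.rmIds = rmIds.clone();
    }

    // Callers sample this before an attempt, like invocationAttemptFailoverCount.
    synchronized long failoverCount() {
        return failoverCount;
    }

    synchronized String currentId() {
        return rmIds[currentIndex];
    }

    // Fails over only if nobody else has done so since observedCount was sampled.
    synchronized boolean failoverIfUnchanged(long observedCount) {
        if (observedCount != failoverCount) {
            return false; // another caller already moved to the next RM
        }
        currentIndex = (currentIndex + 1) % rmIds.length;
        failoverCount++;
        return true;
    }
}
```

Two callers that both sampled the counter before the same outage will trigger only one switch: the second call sees a changed counter and simply picks up the already-updated current id.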
 
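Chapter 3: Server-Side Code

The server-side code lives in the ResourceManager, which has a member variable of type ResourceTrackerService. This class is both the implementation of the protocol and the startup code for the server. resourceTrackerService is a child service of the ResourceManager composite service, so its init and start methods get called. The init method reads settings from the configuration; the start method is as follows: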
   
@Override
protected void serviceStart() throws Exception {
  super.serviceStart();
  // ResourceTrackerServer authenticates NodeManager via Kerberos if
  // security is enabled, so no secretManager.
  Configuration conf = getConfig();
  // Use the YarnRPC class
  YarnRPC rpc = YarnRPC.create(conf);
  this.server =
    rpc.getServer(ResourceTracker.class, this, resourceTrackerAddress,
        conf, null,
        conf.getInt(YarnConfiguration.RM_RESOURCE_TRACKER_CLIENT_THREAD_COUNT, 
            YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_CLIENT_THREAD_COUNT));
  
  // Enable service authorization?
  // If authorization is enabled, load or refresh the security policy configuration.
  if (conf.getBoolean(
      CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION, 
      false)) {
    InputStream inputStream =
        this.rmContext.getConfigurationProvider()
            .getConfigurationInputStream(conf,
                YarnConfiguration.HADOOP_POLICY_CONFIGURATION_FILE);
    if (inputStream != null) {
      conf.addResource(inputStream);
    }
    refreshServiceAcls(conf, RMPolicyProvider.getInstance());
  }

  this.server.start();
  conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
      YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
      YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_ADDRESS,
                         server.getListenerAddress());
}
    

