Spark Source Code Analysis: Walking Through a Complete Remote Request

Spark version: 2.0.0

1. Concepts

1. Introduction

Earlier posts covered master startup, worker startup, and the RPC machinery. Now let's put these pieces together and look at what a complete remote request really looks like. Taking the worker registering itself with the master after startup as the example, we will walk through the remote service call step by step.

2. The remote service request process

The worker registers with the master via masterEndpoint.ask[RegisterWorkerResponse](RegisterWorker(workerId, host, port, self, cores, memory, workerWebUiUrl)): the worker calls the master service and then reacts to whatever result the master sends back.
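
Before following the request into the network stack, it helps to keep in mind what an ask-style call gives the caller: a Future that is completed asynchronously once the reply (or a failure) comes back. Below is a minimal, self-contained sketch of that pattern using toy types; it is not Spark's RpcEndpointRef or DeployMessages code, just the shape of the interaction (the toy response types mirror the messages the master replies with later in this post).

AskPatternSketch.scala (illustrative)
-------------------------------------

import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.{Failure, Success}

object AskPatternSketch {

  // Toy stand-ins for the reply messages the master can send back.
  sealed trait RegisterWorkerResponse
  case class RegisteredWorker(masterWebUiUrl: String) extends RegisterWorkerResponse
  case class RegisterWorkerFailed(message: String) extends RegisterWorkerResponse
  case object MasterInStandby extends RegisterWorkerResponse

  // Stand-in for masterEndpoint.ask[RegisterWorkerResponse](RegisterWorker(...)):
  // the caller immediately gets a Future that is completed when the reply arrives.
  def ask(message: Any): Future[RegisterWorkerResponse] = {
    val promise = Promise[RegisterWorkerResponse]()
    // In Spark the message goes into an outbox and the promise is completed when
    // an RpcResponse/RpcFailure comes back over the wire; here we complete it directly.
    promise.success(RegisteredWorker("http://master:8080"))
    promise.future
  }

  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    ask("RegisterWorker(...)").onComplete {
      case Success(RegisteredWorker(url))        => println(s"registered, master web UI: $url")
      case Success(MasterInStandby)              => println("master is in standby, wait for an active master")
      case Success(RegisterWorkerFailed(reason)) => println(s"registration failed: $reason")
      case Failure(e)                            => println(s"the ask itself failed: ${e.getMessage}")
    }
    Thread.sleep(200) // give the callback a chance to run before the JVM exits
  }
}

With that picture in mind, let's trace what actually happens underneath this ask call.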
From the earlier post on RPC internals, we know that this ask call ultimately puts message = RegisterWorker into the outbox, from which it is taken and sent to the server side (the master endpoint) via RpcOutboxMessage.sendWith. sendWith calls sendRpc, where channel.writeAndFlush(new RpcRequest(requestId, new NioManagedBuffer(message))) sends the data to the master endpoint as an RpcRequest. On the way out, the request passes through a key handler, MessageEncoder, which turns the request object into bytes; when the master endpoint receives those bytes, it first converts them back into an RpcRequest object via MessageDecoder's decode method. (TransportFrameDecoder handles splitting and reassembling TCP frames and is not the focus here.)

MessageDecoder.java
-------------------

  private Message decode(Message.Type msgType, ByteBuf in) {
    switch (msgType) {

      case RpcRequest:
        return RpcRequest.decode(in);
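
The decode side mirrors the encode side: the sender writes a message-type tag (plus a requestId and the body), and the receiver reads the tag first and dispatches on it, exactly as the switch above does. The toy codec below sketches that idea with Netty's ByteBuf; the real MessageEncoder/MessageDecoder also write a frame length and other header fields, so treat this purely as an illustration.

CodecSketch.scala (illustrative)
--------------------------------

import java.nio.charset.StandardCharsets

import io.netty.buffer.{ByteBuf, Unpooled}

object CodecSketch {

  val RpcRequestTag: Byte = 3 // arbitrary type tag chosen for this sketch

  // "MessageEncoder" side: type tag + requestId + body.
  def encodeRpcRequest(requestId: Long, body: String): ByteBuf = {
    val buf = Unpooled.buffer()
    buf.writeByte(RpcRequestTag)                          // message type
    buf.writeLong(requestId)                              // lets the client match the reply later
    buf.writeBytes(body.getBytes(StandardCharsets.UTF_8)) // payload
    buf
  }

  // "MessageDecoder" side: read the type tag first, then dispatch on it.
  def decode(buf: ByteBuf): (Long, String) = buf.readByte() match {
    case RpcRequestTag =>
      val requestId = buf.readLong()
      val body = buf.toString(buf.readerIndex(), buf.readableBytes(), StandardCharsets.UTF_8)
      (requestId, body)
    case other =>
      throw new IllegalArgumentException(s"unknown message type: $other")
  }

  def main(args: Array[String]): Unit = {
    val onWire = encodeRpcRequest(42L, "RegisterWorker(...)")
    println(decode(onWire)) // prints (42,RegisterWorker(...))
  }
}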

The decoded message then passes through the IdleStateHandler and TransportChannelHandler handlers in turn. The most important one is TransportChannelHandler, which shows us where the call ultimately goes.
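
For reference, the server-side channel pipeline is wired roughly as follows. This is a simplified sketch: the handler order follows my reading of Spark's TransportContext, the idle timeout value is made up for the example, and the four handlers are passed in as plain ChannelHandler stand-ins for Spark's MessageEncoder, TransportFrameDecoder, MessageDecoder and TransportChannelHandler.

ServerPipelineSketch.scala (illustrative)
-----------------------------------------

import io.netty.channel.{ChannelHandler, ChannelInitializer}
import io.netty.channel.socket.SocketChannel
import io.netty.handler.timeout.IdleStateHandler

// Wires a server channel the way Spark's transport layer does conceptually.
class ServerPipelineSketch(
    encoder: ChannelHandler,       // outbound: Message -> bytes
    frameDecoder: ChannelHandler,  // inbound: splits the raw byte stream into frames
    decoder: ChannelHandler,       // inbound: bytes -> RpcRequest / other Message
    handler: ChannelHandler)       // inbound: dispatches to the request handler
  extends ChannelInitializer[SocketChannel] {

  override def initChannel(ch: SocketChannel): Unit = {
    ch.pipeline()
      .addLast("encoder", encoder)
      .addLast("frameDecoder", frameDecoder)
      .addLast("decoder", decoder)
      .addLast("idleStateHandler", new IdleStateHandler(0, 0, 120)) // 120s idle timeout, arbitrary here
      .addLast("handler", handler)
  }
}

TransportChannelHandler, the last inbound handler, is where the request gets dispatched, as the next snippet shows.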

TransportChannelHandler.java
-----------------------

  public void channelRead0(ChannelHandlerContext ctx, Message request) throws Exception {
    // distinguish request messages from response messages
    if (request instanceof RequestMessage) {
      requestHandler.handle((RequestMessage) request);

TransportRequestHandler.java
----------------------

  public void handle(RequestMessage request) {
    ...
    else if (request instanceof RpcRequest) {
      processRpcRequest((RpcRequest) request);


  /**
   * Handle an RPC request.
   * @param req the incoming RpcRequest
   */
  private void processRpcRequest(final RpcRequest req) {
    try {
      // rpcHandler = NettyRpcHandler
      rpcHandler.receive(reverseClient, req.body().nioByteBuffer(), new RpcResponseCallback() {

        @Override
        public void onSuccess(ByteBuffer response) {
          respond(new RpcResponse(req.requestId, new NioManagedBuffer(response)));
        }

        @Override
        public void onFailure(Throwable e) {
          respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
        }
      });
    } catch (Exception e) {
      logger.error("Error while invoking RpcHandler#receive() on RPC id " + req.requestId, e);
      respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
    } finally {
      req.body().release();
    }
  }

As you can see, the actual request handling is delegated to rpcHandler.receive (here rpcHandler is a NettyRpcHandler):

NettyRpcEnv.scala
-------------------

  override def receive(
      client: TransportClient,
      message: ByteBuffer,
      callback: RpcResponseCallback): Unit = {
    // ByteBuffer => RequestMessage
    val messageToDispatch = internalReceive(client, message)
    // dispatch the request message to the target endpoint
    dispatcher.postRemoteMessage(messageToDispatch, callback)
  }
  
  
Dispatcher.scala
-------------------

  def postRemoteMessage(message: RequestMessage, callback: RpcResponseCallback): Unit = {
    val rpcCallContext =
      new RemoteNettyRpcCallContext(nettyEnv, callback, message.senderAddress)
    val rpcMessage = RpcMessage(message.senderAddress, message.content, rpcCallContext)
    postMessage(message.receiver.name, rpcMessage, (e) => callback.onFailure(e))
  }

The postMessage method was covered earlier: the message is posted to the target endpoint's inbox, and masterEndpoint.receiveAndReply ends up being invoked automatically (because the message is an RpcMessage). What was not covered is the error path, so let's look at it now:

  private def postMessage(
      endpointName: String,
      message: InboxMessage,
      callbackIfStopped: (Exception) => Unit): Unit = {
    val error = synchronized {
      // look up the endpoint's data
      val data = endpoints.get(endpointName)
      if (stopped) {
        Some(new RpcEnvStoppedException())
      } else if (data == null) {
        Some(new SparkException(s"Could not find $endpointName."))
      } else {
        // add the message to the target endpoint's inbox, and add the endpoint
        // to receivers so that message processing is triggered
        data.inbox.post(message)
        receivers.offer(data)
        None
      }
    }
    // core of the error handling
    error.foreach(callbackIfStopped)
  }
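
The inbox/receivers mechanism that postMessage relies on can be pictured with a small self-contained model: every endpoint owns an inbox (a queue of its pending messages), and endpoints that have work to do are offered to a shared receivers queue that a message loop drains. The sketch below is a toy model of that idea, not Spark's Dispatcher/Inbox classes.

DispatcherSketch.scala (illustrative)
-------------------------------------

import java.util.concurrent.{ConcurrentHashMap, LinkedBlockingQueue}

object DispatcherSketch {

  // One "endpoint": a name plus its inbox of pending messages.
  class EndpointData(val name: String) {
    val inbox = new LinkedBlockingQueue[Any]()
    def process(): Unit = {
      var msg = inbox.poll()
      while (msg != null) {
        println(s"[$name] handling $msg") // here Spark would invoke receive / receiveAndReply
        msg = inbox.poll()
      }
    }
  }

  private val endpoints = new ConcurrentHashMap[String, EndpointData]()
  private val receivers = new LinkedBlockingQueue[EndpointData]()

  def register(name: String): Unit = endpoints.put(name, new EndpointData(name))

  // Counterpart of postMessage above: enqueue into the inbox, then mark the
  // endpoint as having work by offering it to the receivers queue.
  def postMessage(endpointName: String, message: Any): Option[Exception] =
    Option(endpoints.get(endpointName)) match {
      case None =>
        Some(new Exception(s"Could not find $endpointName."))
      case Some(data) =>
        data.inbox.put(message)
        receivers.offer(data)
        None
    }

  def main(args: Array[String]): Unit = {
    register("Master")
    // The message loop: take the next endpoint with pending work and drain its inbox.
    val loop = new Thread(new Runnable {
      override def run(): Unit = while (true) receivers.take().process()
    })
    loop.setDaemon(true)
    loop.start()

    postMessage("Master", "RegisterWorker(worker-1)").foreach(e => println(s"error: $e"))
    postMessage("NoSuchEndpoint", "ping").foreach(e => println(s"error: $e"))
    Thread.sleep(200) // let the loop thread drain the inbox before exiting
  }
}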

Pay attention to the error handling in postMessage: callbackIfStopped is (e) => callback.onFailure(e), and callback is the anonymous RpcResponseCallback created in TransportRequestHandler:

TransportRequestHandler.java
----------------------

  new RpcResponseCallback() {

    @Override
    public void onSuccess(ByteBuffer response) {
      respond(new RpcResponse(req.requestId, new NioManagedBuffer(response)));
    }

    @Override
    public void onFailure(Throwable e) {
      respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
    }
  }

So on failure the flow ultimately reaches TransportRequestHandler.respond:

  private void respond(final Encodable result) {
    final String remoteAddress = channel.remoteAddress().toString();
    channel.writeAndFlush(result).addListener(
      new ChannelFutureListener() {

        @Override
        public void operationComplete(ChannelFuture future) throws Exception {
          if (future.isSuccess()) {
            logger.trace(String.format("Sent result %s to client %s", result, remoteAddress));
          } else {
            logger.error(String.format("Error sending result %s to %s; closing connection",
              result, remoteAddress), future.cause());
            channel.close();
          }
        }
      }
    );
  }

In other words, the error is written back to the requester.
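
How does the requester (the worker) know which outstanding ask an RpcResponse or RpcFailure belongs to? By the requestId carried in the reply: the client remembers a callback (or promise) per requestId when it sends the request, and completes it when the matching reply arrives. The sketch below models that correlation with toy types; it is not Spark's actual client-side response handler.

RequestTrackerSketch.scala (illustrative)
-----------------------------------------

import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

import scala.concurrent.{Future, Promise}

object RequestTrackerSketch {

  private val nextRequestId = new AtomicLong(0)
  private val outstanding = new ConcurrentHashMap[Long, Promise[Any]]()

  // Sending side: remember a promise under a fresh requestId, then put the
  // request on the wire (the `write` function stands in for channel.writeAndFlush).
  def sendRpc(message: Any, write: (Long, Any) => Unit): Future[Any] = {
    val requestId = nextRequestId.incrementAndGet()
    val promise = Promise[Any]()
    outstanding.put(requestId, promise)
    write(requestId, message)
    promise.future
  }

  // Called when an RpcResponse with this requestId arrives.
  def onRpcResponse(requestId: Long, response: Any): Unit =
    Option(outstanding.remove(requestId)).foreach(_.success(response))

  // Called when an RpcFailure with this requestId arrives.
  def onRpcFailure(requestId: Long, error: String): Unit =
    Option(outstanding.remove(requestId)).foreach(_.failure(new Exception(error)))

  def main(args: Array[String]): Unit = {
    import scala.concurrent.ExecutionContext.Implicits.global
    val reply = sendRpc("RegisterWorker(...)", (id, msg) => println(s"on the wire: RpcRequest($id, $msg)"))
    reply.foreach(r => println(s"future completed with: $r"))
    onRpcResponse(1L, "RegisteredWorker(...)") // simulate the master's RpcResponse coming back
    Thread.sleep(200)
  }
}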
If the request is handled normally, the flow eventually reaches Master.receiveAndReply with a message of type RegisterWorker:

Master.scala
--------------------

  /**
   * Receive a request and reply to it.
   * @param context the RPC call context used to send the reply
   */
  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {

    case RegisterWorker(
        id, workerHost, workerPort, workerRef, cores, memory, workerWebUiUrl) =>
      logInfo("Registering worker %s:%d with %d cores, %s RAM".format(
        workerHost, workerPort, cores, Utils.megabytesToString(memory)))
      // if this master is in STANDBY state, just reply MasterInStandby
      if (state == RecoveryState.STANDBY) {
        context.reply(MasterInStandby)
      // if a worker with the same worker id is already registered
      } else if (idToWorker.contains(id)) {
        context.reply(RegisterWorkerFailed("Duplicate worker ID"))
      } else {
        // wrap the worker information into a WorkerInfo object
        val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
          workerRef, workerWebUiUrl)
        // register the worker and update the corresponding bookkeeping
        if (registerWorker(worker)) {
          persistenceEngine.addWorker(worker)
          // reply with the master's web UI address
          context.reply(RegisteredWorker(self, masterWebUiUrl))

          // resource scheduling for applications (fairly involved, covered later)
          schedule()
        } else {
          val workerAddress = worker.endpoint.address
          logWarning("Worker registration failed. Attempted to re-register worker at same " +
            "address: " + workerAddress)
          context.reply(RegisterWorkerFailed("Attempted to re-register worker at same address: "
            + workerAddress))
        }
      }
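
Note that receiveAndReply is simply a PartialFunction over the incoming message, which the dispatcher applies to whatever sits in the inbox. A stripped-down model of that dispatch style (toy message and context types, not Spark's RpcEndpoint or RpcCallContext):

ReceiveAndReplySketch.scala (illustrative)
------------------------------------------

object ReceiveAndReplySketch {

  // Toy messages and a toy call context.
  case class RegisterWorker(id: String, host: String, port: Int)
  case object Heartbeat

  class Context {
    def reply(response: Any): Unit = println(s"reply sent: $response")
  }

  // Same shape as receiveAndReply above: a PartialFunction over the message.
  def receiveAndReply(context: Context): PartialFunction[Any, Unit] = {
    case RegisterWorker(id, host, port) =>
      context.reply(s"RegisteredWorker for $id at $host:$port")
  }

  def main(args: Array[String]): Unit = {
    val handler = receiveAndReply(new Context)
    val msg = RegisterWorker("worker-20170101-0001", "node1", 7078)
    // The dispatcher applies the partial function to each inbox message;
    // messages it is not defined for would be reported as unmatched.
    if (handler.isDefinedAt(msg)) handler(msg)
    if (!handler.isDefinedAt(Heartbeat)) println("Heartbeat is not handled by receiveAndReply")
  }
}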

The handling logic in receiveAndReply is fairly simple: the master reacts differently depending on the worker's information, but every branch ends with a call to context.reply. First we need to establish what the context object is: it was created earlier in postRemoteMessage as a RemoteNettyRpcCallContext, so context is a RemoteNettyRpcCallContext. Following the call chain from context.reply, we eventually land in the following code:

NettyRpcCallContext.scala
-----------------------


  override protected def send(message: Any): Unit = {
    // serialize the message
    val reply = nettyEnv.serialize(message)
    // send it back through the callback's onSuccess
    callback.onSuccess(reply)
  }
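
nettyEnv.serialize turns the reply object into a ByteBuffer before it is handed to the callback. The round-trip idea is sketched below with plain java.io serialization; Spark's NettyRpcEnv uses its own serializer instance, so this is only an illustration of object-to-ByteBuffer serialization, not the actual code path.

SerializeSketch.scala (illustrative)
------------------------------------

import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.nio.ByteBuffer

object SerializeSketch {

  // Object -> ByteBuffer, the direction nettyEnv.serialize takes.
  def serialize(obj: AnyRef): ByteBuffer = {
    val bytesOut = new ByteArrayOutputStream()
    val objOut = new ObjectOutputStream(bytesOut)
    objOut.writeObject(obj)
    objOut.close()
    ByteBuffer.wrap(bytesOut.toByteArray)
  }

  // ByteBuffer -> object, what the worker side does with the reply it receives.
  def deserialize[T](buf: ByteBuffer): T = {
    val bytes = new Array[Byte](buf.remaining())
    buf.get(bytes)
    val objIn = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try objIn.readObject().asInstanceOf[T] finally objIn.close()
  }

  def main(args: Array[String]): Unit = {
    val reply = "RegisteredWorker(master, http://master:8080)" // any Serializable reply object
    val buf = serialize(reply)
    println(deserialize[String](buf)) // round-trips back to the original value
  }
}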

The callback here was analyzed earlier: it is the anonymous RpcResponseCallback instance created in TransportRequestHandler.

TransportRequestHandler.java
----------------------

  new RpcResponseCallback() {

    @Override
    public void onSuccess(ByteBuffer response) {
      respond(new RpcResponse(req.requestId, new NioManagedBuffer(response)));
    }

    @Override
    public void onFailure(Throwable e) {
      respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
    }
  }

So the master sends the result of its processing back to the worker. (Note: the worker's registration goes out as an RpcRequest, while the master replies with an RpcResponse or, on error, an RpcFailure.)

3. Summary

The remote request flow of the worker registering with the master endpoint, as walked through above, can be summarized with the following diagram:
[Figure 1: schematic of the complete remote request process from worker to master]
