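Spark version: 2.0.0
Earlier posts covered master startup, worker startup, and the RPC internals. Building on those, this post traces what one complete remote request actually looks like, using a worker registering with the master after startup as the example.
The worker registers with the master by calling masterEndpoint.ask[RegisterWorkerResponse](RegisterWorker(workerId, host, port, self, cores, memory, workerWebUiUrl)). In other words, the worker calls the master's service and then acts on whatever result the master returns.
From the post on RPC internals, we know that this code ultimately adds message=RegisterWorker to the outbox. The message is then taken out and sent to the server (the master endpoint) through RpcOutboxMessage.sendWith. sendWith calls sendRpc, which in turn calls channel.writeAndFlush(new RpcRequest(requestId, new NioManagedBuffer(message))) to send the data to the master endpoint as an RpcRequest.
On the way out, the request passes through a key handler, MessageEncoder, which converts the request object into bytes. When the master endpoint receives those bytes, MessageDecoder's decode method first turns them back into an RpcRequest object. (TransportFrameDecoder deals with TCP packet coalescing and splitting and is not the focus here.)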
private Message decode(Message.Type msgType, ByteBuf in) {
  switch (msgType) {
    ...
    case RpcRequest:
      return RpcRequest.decode(in);
    ...
  }
}
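To make the encode/decode step more concrete, here is a minimal sketch in plain Java of a type-tagged message: the encoder writes a one-byte type tag followed by the request ID and the body, and the decoder reads the tag first and switches on it. The class FrameSketch, the tag value, and the field layout are all hypothetical; they illustrate the idea and do not reproduce Spark's actual wire format.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: names and byte layout are illustrative, not Spark's wire format.
final class FrameSketch {
    static final byte TYPE_RPC_REQUEST = 3; // assumed tag value for this sketch

    // A decoded request: a correlation ID plus opaque body bytes.
    static final class RpcRequestSketch {
        final long requestId;
        final byte[] body;

        RpcRequestSketch(long requestId, byte[] body) {
            this.requestId = requestId;
            this.body = body;
        }
    }

    // Encode as [1-byte type][8-byte requestId][4-byte body length][body].
    static ByteBuffer encode(RpcRequestSketch req) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 8 + 4 + req.body.length);
        buf.put(TYPE_RPC_REQUEST).putLong(req.requestId).putInt(req.body.length).put(req.body);
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    // Decode: read the type tag first, then dispatch on it.
    static RpcRequestSketch decode(ByteBuffer in) {
        byte type = in.get();
        switch (type) {
            case TYPE_RPC_REQUEST:
                long id = in.getLong();
                byte[] body = new byte[in.getInt()];
                in.get(body);
                return new RpcRequestSketch(id, body);
            default:
                throw new IllegalArgumentException("Unexpected message type: " + type);
        }
    }
}
```

Putting the tag first is what lets the receiving side pick the right decoder before it knows anything else about the payload.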
The request then passes through two more handlers in turn, IdleStateHandler and TransportChannelHandler. TransportChannelHandler is the important one: it shows where the call finally ends up.
TransportChannelHandler.java
-----------------------
public void channelRead0(ChannelHandlerContext ctx, Message request) throws Exception {
  // Distinguish by message type
  if (request instanceof RequestMessage) {
    requestHandler.handle((RequestMessage) request);
  }
  ...
}
TransportRequestHandler.java
----------------------
public void handle(RequestMessage request) {
  ...
  else if (request instanceof RpcRequest) {
    processRpcRequest((RpcRequest) request);
  }
  ...
}

/**
 * Handle an RPC request.
 * @param req
 */
private void processRpcRequest(final RpcRequest req) {
  try {
    // rpcHandler = NettyRpcHandler
    rpcHandler.receive(reverseClient, req.body().nioByteBuffer(), new RpcResponseCallback() {
      @Override
      public void onSuccess(ByteBuffer response) {
        respond(new RpcResponse(req.requestId, new NioManagedBuffer(response)));
      }

      @Override
      public void onFailure(Throwable e) {
        respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
      }
    });
  } catch (Exception e) {
    logger.error("Error while invoking RpcHandler#receive() on RPC id " + req.requestId, e);
    respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
  } finally {
    req.body().release();
  }
}
As this shows, the request is handled by rpcHandler.receive (where rpcHandler is a NettyRpcHandler):
NettyRpcEnv.scala
-------------------
override def receive(
    client: TransportClient,
    message: ByteBuffer,
    callback: RpcResponseCallback): Unit = {
  // ByteBuffer => RequestMessage
  val messageToDispatch = internalReceive(client, message)
  // Dispatch the request message
  dispatcher.postRemoteMessage(messageToDispatch, callback)
}
Dispatcher.scala
-------------------
def postRemoteMessage(message: RequestMessage, callback: RpcResponseCallback): Unit = {
  val rpcCallContext =
    new RemoteNettyRpcCallContext(nettyEnv, callback, message.senderAddress)
  val rpcMessage = RpcMessage(message.senderAddress, message.content, rpcCallContext)
  postMessage(message.receiver.name, rpcMessage, (e) => callback.onFailure(e))
}
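postMessage was covered in an earlier post: it puts the message into the target endpoint's inbox, after which masterEndpoint.receiveAndReply is invoked automatically (because the message type is RpcMessage). The error path was not covered there, so let's look at it now: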
private def postMessage(
    endpointName: String,
    message: InboxMessage,
    callbackIfStopped: (Exception) => Unit): Unit = {
  val error = synchronized {
    // Look up the endpoint's data
    val data = endpoints.get(endpointName)
    if (stopped) {
      Some(new RpcEnvStoppedException())
    } else if (data == null) {
      Some(new SparkException(s"Could not find $endpointName."))
    } else {
      // Add the message to the target endpoint's inbox, then add the endpoint
      // to receivers so that message processing is triggered
      data.inbox.post(message)
      receivers.offer(data)
      None
    }
  }
  // Core of the error handling
  error.foreach(callbackIfStopped)
}
Note the error-handling code: callbackIfStopped forwards the exception to the onFailure method of this callback:
TransportRequestHandler.java
----------------------
new RpcResponseCallback() {
  @Override
  public void onSuccess(ByteBuffer response) {
    respond(new RpcResponse(req.requestId, new NioManagedBuffer(response)));
  }

  @Override
  public void onFailure(Throwable e) {
    respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
  }
}
which ultimately calls the TransportRequestHandler.respond method:
private void respond(final Encodable result) {
  final String remoteAddress = channel.remoteAddress().toString();
  channel.writeAndFlush(result).addListener(
    new ChannelFutureListener() {
      @Override
      public void operationComplete(ChannelFuture future) throws Exception {
        if (future.isSuccess()) {
          logger.trace(String.format("Sent result %s to client %s", result, remoteAddress));
        } else {
          logger.error(String.format("Error sending result %s to %s; closing connection",
            result, remoteAddress), future.cause());
          channel.close();
        }
      }
    }
  );
}
That is, the error message is sent back to the requester.
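The error path can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not Spark's code: DispatcherSketch is a hypothetical class, inboxes are plain lists of strings, and standard Java exceptions stand in for RpcEnvStoppedException and SparkException. What it keeps from postMessage is the shape: decide inside the lock whether the post succeeded, then invoke the failure callback outside the lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of the dispatcher's error path; not Spark's actual classes.
final class DispatcherSketch {
    private final Map<String, List<String>> inboxes = new HashMap<>();
    private boolean stopped = false;

    synchronized void registerEndpoint(String name) {
        inboxes.put(name, new ArrayList<>());
    }

    synchronized void stop() {
        stopped = true;
    }

    synchronized List<String> inboxOf(String name) {
        return inboxes.get(name);
    }

    // Post a message; on failure, hand the exception to the caller's callback.
    void postMessage(String endpointName, String message, Consumer<Exception> onFailure) {
        Exception error;
        synchronized (this) {
            List<String> inbox = inboxes.get(endpointName);
            if (stopped) {
                // Stand-in for RpcEnvStoppedException
                error = new IllegalStateException("RpcEnv already stopped.");
            } else if (inbox == null) {
                // Stand-in for SparkException
                error = new IllegalArgumentException("Could not find " + endpointName + ".");
            } else {
                inbox.add(message);
                error = null;
            }
        }
        // The callback runs outside the lock, mirroring error.foreach(callbackIfStopped).
        if (error != null) {
            onFailure.accept(error);
        }
    }
}
```

Posting to an unregistered endpoint name, or posting after stop(), triggers the callback; in the real flow that callback is what turns the exception into an RpcFailure for the requester.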
If the request is processed normally, Master.receiveAndReply is eventually called with a message of type RegisterWorker:
Master.scala
--------------------
/**
 * Handle a request and reply.
 * @param context
 * @return
 */
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RegisterWorker(
      id, workerHost, workerPort, workerRef, cores, memory, workerWebUiUrl) =>
    logInfo("Registering worker %s:%d with %d cores, %s RAM".format(
      workerHost, workerPort, cores, Utils.megabytesToString(memory)))
    // If the master is STANDBY, reply with MasterInStandby right away
    if (state == RecoveryState.STANDBY) {
      context.reply(MasterInStandby)
    // If a worker with the same ID already exists
    } else if (idToWorker.contains(id)) {
      context.reply(RegisterWorkerFailed("Duplicate worker ID"))
    } else {
      // Wrap the worker's information in a WorkerInfo object
      val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
        workerRef, workerWebUiUrl)
      // Register the worker and update the related state
      if (registerWorker(worker)) {
        persistenceEngine.addWorker(worker)
        // Reply with the master's web UI address
        context.reply(RegisteredWorker(self, masterWebUiUrl))
        // Schedule resources for applications (fairly involved; covered in a later post)
        schedule()
      } else {
        val workerAddress = worker.endpoint.address
        logWarning("Worker registration failed. Attempted to re-register worker at same " +
          "address: " + workerAddress)
        context.reply(RegisterWorkerFailed("Attempted to re-register worker at same address: "
          + workerAddress))
      }
    }
  ...
}
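The branching in the RegisterWorker case can be summarized with a small, simplified Java sketch. RegistrationSketch and its Reply enum are hypothetical names, and the sketch assumes that an address is rejected whenever it has been seen before, which ignores any worker-state checks that the real registerWorker method may perform.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the master's registration decision.
final class RegistrationSketch {
    enum Reply { MASTER_IN_STANDBY, DUPLICATE_WORKER_ID, REGISTERED, SAME_ADDRESS }

    private final boolean standby;
    private final Set<String> workerIds = new HashSet<>();
    private final Set<String> addresses = new HashSet<>();

    RegistrationSketch(boolean standby) {
        this.standby = standby;
    }

    // Checks run in the same order as in the RegisterWorker case:
    // standby master, then duplicate ID, then the actual registration.
    Reply register(String id, String host, int port) {
        if (standby) {
            return Reply.MASTER_IN_STANDBY;
        }
        if (workerIds.contains(id)) {
            return Reply.DUPLICATE_WORKER_ID;
        }
        String address = host + ":" + port;
        // Assumption: any previously seen address counts as a re-registration.
        if (!addresses.add(address)) {
            return Reply.SAME_ADDRESS;
        }
        workerIds.add(id);
        return Reply.REGISTERED;
    }
}
```

Every path returns exactly one reply, which corresponds to each branch of the handler ending in a single context.reply call.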
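The logic in Master.receiveAndReply is fairly simple: the master handles the request differently depending on the worker's information, but every branch ends by calling context.reply. First we need to establish what the context object is. The postRemoteMessage method shown earlier creates a RemoteNettyRpcCallContext, so context is a RemoteNettyRpcCallContext. Following the call chain, the code below is what eventually runs: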
NettyRpcCallContext.scala
-----------------------
override protected def send(message: Any): Unit = {
  // Serialize the message
  val reply = nettyEnv.serialize(message)
  // Send it by calling the callback's onSuccess method
  callback.onSuccess(reply)
}
The callback was analyzed earlier: it is the anonymous RpcResponseCallback instance created in TransportRequestHandler.
TransportRequestHandler.java
----------------------
new RpcResponseCallback() {
  @Override
  public void onSuccess(ByteBuffer response) {
    respond(new RpcResponse(req.requestId, new NioManagedBuffer(response)));
  }

  @Override
  public void onFailure(Throwable e) {
    respond(new RpcFailure(req.requestId, Throwables.getStackTraceAsString(e)));
  }
}
So the master returns the processed result to the worker. (Note: the worker's registration request is sent as an RpcRequest object, and the master responds with an RpcResponse or RpcFailure object.)
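The server-side reply path traced in this post can be condensed into a short Java sketch. ReplyPathSketch, its Callback interface, and the string-based "responses" are hypothetical stand-ins chosen for illustration: process plays the role of processRpcRequest, reply plays the role of context.reply, and the sent list stands in for the channel that would carry the response back to the worker.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch of the reply path; names are illustrative, not Spark's.
final class ReplyPathSketch {
    // Mirrors the shape of the callback shown in the post.
    interface Callback {
        void onSuccess(ByteBuffer response);
        void onFailure(Throwable e);
    }

    // Stand-in for the channel: records what would be written back.
    final List<String> sent = new ArrayList<>();

    // Plays the role of processRpcRequest: build the callback, invoke the handler.
    void process(long requestId, String body, BiConsumer<String, Callback> handler) {
        Callback callback = new Callback() {
            @Override
            public void onSuccess(ByteBuffer response) {
                String text = StandardCharsets.UTF_8.decode(response).toString();
                sent.add("RpcResponse(" + requestId + "," + text + ")");
            }

            @Override
            public void onFailure(Throwable e) {
                sent.add("RpcFailure(" + requestId + "," + e.getMessage() + ")");
            }
        };
        try {
            handler.accept(body, callback);
        } catch (RuntimeException e) {
            // A handler that throws also results in a failure response.
            callback.onFailure(e);
        }
    }

    // Plays the role of context.reply: serialize, then hand the bytes to the callback.
    static void reply(Callback callback, String message) {
        callback.onSuccess(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Because the callback closes over requestId, the response always carries the ID of the request it answers, whichever branch produced it.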