WeChat: 519292115
Original content; please respect the author and do not repost!!
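Spark is currently one of the most popular frameworks in big data. It efficiently supports offline batch processing, real-time computation, machine learning and other workloads. Reading its source code deepens your understanding of the framework.
In this series I will analyze the core components of Spark 2.0.0.X one by one, including BlockManager, MapOutputTracker, TaskScheduler, DAGScheduler, Shuffle and others in later chapters.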
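RpcEnv is the execution environment for communication between components. On every node (Driver or Worker), RpcEnv coordinates message passing and method calls between a component's Endpoint and the corresponding EndpointRef. The underlying transport is built on the Netty NIO framework. (Early Spark versions used Akka for RPC and Netty for large file transfers. Since 2.0.0, Netty has replaced Akka entirely, so all communication goes through one transport.)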
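Take MapOutputTracker as an example (master-slave mode; see `def registerOrLookupEndpoint` from the previous chapter). When a node (Driver or Worker) creates its RpcEnv, each side either registers its own Endpoint with the RpcEnv or obtains a Ref to the other side's Endpoint. This establishes communication between them, and the Ref can then be used to invoke methods on the remote node.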
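The following minimal single-process sketch makes the Endpoint/Ref relationship concrete. `MiniEndpoint`, `MiniRef` and `MiniRpcEnv` are hypothetical names, not Spark classes. An endpoint registers under a unique name, and a ref only remembers that name. Sending through the ref asks the environment to deliver the message to the endpoint's `receive` partial function. Networking, threading and serialization are deliberately left out.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical, heavily simplified stand-ins for RpcEndpoint, RpcEndpointRef and RpcEnv.
// Everything runs in one process; there is no networking or serialization here.
trait MiniEndpoint {
  def receive: PartialFunction[Any, Unit]
}

// A ref only knows the endpoint's name and the env; it never holds the endpoint itself.
final class MiniRef(val name: String, env: MiniRpcEnv) {
  def send(message: Any): Unit = env.deliver(name, message)
}

final class MiniRpcEnv {
  private val endpoints = new ConcurrentHashMap[String, MiniEndpoint]()

  // Register an endpoint under a unique name and hand back a ref to it.
  def setupEndpoint(name: String, endpoint: MiniEndpoint): MiniRef = {
    if (endpoints.putIfAbsent(name, endpoint) != null) {
      throw new IllegalArgumentException(s"There is already an endpoint called $name")
    }
    new MiniRef(name, this)
  }

  // Look up a ref by name (the "lookup" half of registerOrLookupEndpoint).
  def lookup(name: String): Option[MiniRef] =
    if (endpoints.containsKey(name)) Some(new MiniRef(name, this)) else None

  // Route a message to the named endpoint's receive partial function.
  def deliver(name: String, message: Any): Unit = {
    val endpoint = endpoints.get(name)
    if (endpoint == null) throw new IllegalStateException(s"Could not find $name.")
    endpoint.receive.applyOrElse[Any, Unit](message, { m =>
      throw new IllegalArgumentException(s"Unsupported message $m")
    })
  }
}
```

Registering the same name twice fails. This mirrors the duplicate-name check in `Dispatcher.registerRpcEndpoint` discussed later.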
On top of Netty, local-process and remote-process communication use different components and functions. Let's start with RpcEnv.
```scala
/**
 * An RPC environment. [[RpcEndpoint]]s need to register itself with a name to [[RpcEnv]] to
 * receives messages. Then [[RpcEnv]] will process messages sent from [[RpcEndpointRef]] or remote
 * nodes, and deliver them to corresponding [[RpcEndpoint]]s. For uncaught exceptions caught by
 * [[RpcEnv]], [[RpcEnv]] will use [[RpcCallContext.sendFailure]] to send exceptions back to the
 * sender, or logging them if no such sender or `NotSerializableException`.
 *
 * [[RpcEnv]] also provides some methods to retrieve [[RpcEndpointRef]]s given name or uri.
 */
private[spark] abstract class RpcEnv(conf: SparkConf) {
```
Registration and lookup:
```scala
/**
 * Register a [[RpcEndpoint]] with a name and return its [[RpcEndpointRef]]. [[RpcEnv]] does not
 * guarantee thread-safety.
 */
def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef

/**
 * Retrieve the [[RpcEndpointRef]] represented by `address` and `endpointName`.
 * This is a blocking action.
 */
def setupEndpointRef(address: RpcAddress, endpointName: String): RpcEndpointRef = {
  setupEndpointRefByURI(RpcEndpointAddress(address, endpointName).toString)
}
```
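`setupEndpointRef` turns the address and endpoint name into a URI string before resolving it. The sketch below shows that round trip. It assumes the `spark://name@host:port` form that, as far as I know, `RpcEndpointAddress.toString` produces for server-mode addresses. `DemoRpcAddress` and `DemoEndpointUri` are hypothetical helpers built on `java.net.URI`, not Spark code.

```scala
import java.net.URI

// Hypothetical stand-in for Spark's RpcAddress (host + port).
final case class DemoRpcAddress(host: String, port: Int)

object DemoEndpointUri {
  // Assumed to mirror the "spark://name@host:port" form used for endpoint addresses.
  def format(address: DemoRpcAddress, endpointName: String): String =
    s"spark://$endpointName@${address.host}:${address.port}"

  // Parse the URI back into (address, name); the endpoint name sits in the user-info part.
  def parse(uri: String): (DemoRpcAddress, String) = {
    val u = new URI(uri)
    require(u.getScheme == "spark" && u.getHost != null && u.getPort >= 0 && u.getUserInfo != null,
      s"Invalid Spark URL: $uri")
    (DemoRpcAddress(u.getHost, u.getPort), u.getUserInfo)
  }
}
```

Putting the endpoint name in the user-info part of the URI lets the standard `java.net.URI` parser split out host, port and name without custom string handling.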
RpcEnv is first built while SparkEnv is being created, in `SparkEnv.create`:
```scala
// Build the RpcEnv
val rpcEnv = RpcEnv.create(systemName, bindAddress, advertiseAddress, port.getOrElse(-1), conf,
  securityManager, clientMode = !isDriver)
```
This calls the factory method in the RpcEnv companion object:
```scala
private[spark] object RpcEnv {

  def create(
      name: String,
      host: String,
      port: Int,
      conf: SparkConf,
      securityManager: SecurityManager,
      clientMode: Boolean = false): RpcEnv = {
    // Delegates to the create method below
    create(name, host, host, port, conf, securityManager, clientMode)
  }

  // The RpcEnv factory method
  def create(
      name: String,
      bindAddress: String,
      advertiseAddress: String,
      port: Int,
      conf: SparkConf,
      securityManager: SecurityManager,
      clientMode: Boolean): RpcEnv = {
    // RpcEnvConfig is a case class that only holds the configuration values
    val config = RpcEnvConfig(conf, name, bindAddress, advertiseAddress, port, securityManager,
      clientMode)
    // In the end this still goes through NettyRpcEnvFactory
    new NettyRpcEnvFactory().create(config)
  }
}
```
`create` ultimately delegates to `NettyRpcEnvFactory.create`, which builds the NettyRpcEnv. NettyRpcEnv in turn owns core components such as the Dispatcher, TransportClient, Inbox and Outbox.
```scala
private[rpc] class NettyRpcEnvFactory extends RpcEnvFactory with Logging {

  def create(config: RpcEnvConfig): RpcEnv = {
    val sparkConf = config.conf
    // Use JavaSerializerInstance in multiple threads is safe. However, if we plan to support
    // KryoSerializer in future, we have to use ThreadLocal to store SerializerInstance
    // Netty RPC messages use Java serialization; Kryo is not supported for now
    val javaSerializerInstance =
      new JavaSerializer(sparkConf).newInstance().asInstanceOf[JavaSerializerInstance]
    // Initialize the NettyRpcEnv
    val nettyEnv =
      new NettyRpcEnv(sparkConf, javaSerializerInstance, config.advertiseAddress,
        config.securityManager)
    // Start a server only when not in client mode (i.e. on the driver)
    if (!config.clientMode) {
      // startNettyRpcEnv is a function value that startServiceOnPort calls below.
      // It takes an Int (actualPort); the body after => returns a tuple (NettyRpcEnv, Int)
      val startNettyRpcEnv: Int => (NettyRpcEnv, Int) = { actualPort =>
        // Mainly builds the TransportServer and registers an endpoint with the dispatcher
        nettyEnv.startServer(config.bindAddress, actualPort)
        (nettyEnv, nettyEnv.address.port)
      }
      try {
        // Internally calls startNettyRpcEnv to start on the given port, returns the nettyEnv
        Utils.startServiceOnPort(config.port, startNettyRpcEnv, sparkConf, config.name)._1
      } catch {
        case NonFatal(e) =>
          nettyEnv.shutdown()
          throw e
      }
    }
    nettyEnv
  }
}
```
Now let's look at NettyRpcEnv. It extends the top-level abstract class RpcEnv and implements core methods such as `setupEndpoint`, `endpointRef` and `send`. It also initializes three components:

- the Dispatcher, which routes messages;
- the NettyStreamManager, which serves files for remote download;
- the TransportContext, which creates the TransportServer and TransportClientFactory.
```scala
private[netty] class NettyRpcEnv(
    val conf: SparkConf,
    javaSerializerInstance: JavaSerializerInstance,
    host: String,
    securityManager: SecurityManager) extends RpcEnv(conf) with Logging {

  private[netty] val transportConf = SparkTransportConf.fromSparkConf(
    conf.clone.set("spark.rpc.io.numConnectionsPerPeer", "1"),
    "rpc",
    conf.getInt("spark.rpc.io.threads", 0))

  // The Dispatcher routes messages to the right Endpoint
  private val dispatcher: Dispatcher = new Dispatcher(this)

  // NettyStreamManager lets remote executors download jars and other files from the Driver
  private val streamManager = new NettyStreamManager(this)

  // TransportContext is mainly used to create the TransportServer and TransportClientFactory
  private val transportContext = new TransportContext(transportConf,
    new NettyRpcHandler(dispatcher, this, streamManager))
```
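`NettyRpcEnvFactory.create` above does not bind a port directly. It passes `startNettyRpcEnv` to `Utils.startServiceOnPort`, and the idea behind that helper is a retry loop over successive ports. The following is a simplified sketch under that assumption. `DemoPortRetry` is hypothetical and treats only `BindException` as a port collision.

```scala
import java.net.BindException

// Hypothetical, simplified version of the retry idea behind Utils.startServiceOnPort:
// try startPort, then startPort + 1, ... until the service starts or retries run out.
// (Spark's real helper also reads spark.port.maxRetries, wraps ports around and logs.)
object DemoPortRetry {
  def startServiceOnPort[T](startPort: Int, maxRetries: Int)(
      startService: Int => (T, Int)): (T, Int) = {
    require(maxRetries >= 0, "maxRetries must be non-negative")
    var offset = 0
    var result: Option[(T, Int)] = None
    var lastError: BindException = null
    while (result.isEmpty && offset <= maxRetries) {
      // Port 0 asks the OS for a free ephemeral port, so there is nothing to increment.
      val tryPort = if (startPort == 0) 0 else startPort + offset
      try {
        result = Some(startService(tryPort))
      } catch {
        case e: BindException => lastError = e // port in use: try the next one
      }
      offset += 1
    }
    result.getOrElse(throw new BindException(
      s"Service failed after ${maxRetries + 1} attempts; last error: ${lastError.getMessage}"))
  }
}
```

Passing the start logic in as a function lets one retry helper be reused by any service that needs a port.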
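One core piece of Dispatcher is EndpointData, which wraps the Inbox that stores and processes messages for a local endpoint. Messages bound for remote endpoints are handled by the Outboxes kept in NettyRpcEnv.
In short, each incoming message is posted into the target endpoint's Inbox, and the corresponding EndpointData is added to `receivers`. A thread pool then consumes it.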
```scala
/**
 * A message dispatcher, responsible for routing RPC messages to the appropriate endpoint(s).
 */
private[netty] class Dispatcher(nettyEnv: NettyRpcEnv) extends Logging {

  // EndpointData wraps the endpoint's own information and its Inbox
  private class EndpointData(
      val name: String,
      val endpoint: RpcEndpoint,
      val ref: NettyRpcEndpointRef) {
    // Stores the endpoint's messages and runs the matching handler for each one
    val inbox = new Inbox(ref, endpoint)
  }

  // Data structure: java.util.concurrent.ConcurrentHashMap
  // key: endpoint name, value: its EndpointData
  private val endpoints: ConcurrentMap[String, EndpointData] =
    new ConcurrentHashMap[String, EndpointData]

  // Data structure: java.util.concurrent.ConcurrentHashMap
  // Maps each RpcEndpoint to its RpcEndpointRef
  private val endpointRefs: ConcurrentMap[RpcEndpoint, RpcEndpointRef] =
    new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]

  // Track the receivers whose inboxes may contain messages.
  // Data structure: java.util.concurrent.LinkedBlockingQueue,
  // a blocking linked queue of EndpointData
  private val receivers = new LinkedBlockingQueue[EndpointData]
```
Dispatcher also registers an endpoint with NettyRpcEnv and returns its Ref:
```scala
def registerRpcEndpoint(name: String, endpoint: RpcEndpoint): NettyRpcEndpointRef = {
  // Build the endpoint address from the nettyEnv address and the endpoint name
  val addr = RpcEndpointAddress(nettyEnv.address, name)
  // Create the NettyRpcEndpointRef, which extends the top-level class RpcEndpointRef
  val endpointRef = new NettyRpcEndpointRef(nettyEnv.conf, addr, nettyEnv)
  synchronized {
    if (stopped) {
      throw new IllegalStateException("RpcEnv has been stopped")
    }
    if (endpoints.putIfAbsent(name, new EndpointData(name, endpoint, endpointRef)) != null) {
      throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
    }
    // Look up the EndpointData by endpoint name
    val data = endpoints.get(name)
    // Record the endpoint -> ref mapping
    endpointRefs.put(data.endpoint, data.ref)
    // Finally add the EndpointData to receivers.
    // offer appends to the tail; if the queue is full it returns false instead of
    // blocking or throwing (put would block, add would throw)
    receivers.offer(data)  // for the OnStart message
  }
  // Return the endpointRef
  endpointRef
}
```
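The comment on `receivers.offer(data)` contrasts `offer`, `put` and `add`. The demonstration below uses a bounded `LinkedBlockingQueue` with capacity 1, chosen only to force the full-queue case. `DemoQueueSemantics` is a hypothetical name.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

// Shows how offer, add and a timed offer behave on a bounded LinkedBlockingQueue (capacity 1).
// Dispatcher.receivers is created without a capacity (Integer.MAX_VALUE), so there offer
// practically always succeeds; the bounded queue just makes the differences visible.
object DemoQueueSemantics {
  def run(): (Boolean, Boolean, Boolean, Boolean) = {
    val queue = new LinkedBlockingQueue[String](1)
    val firstOffer = queue.offer("a")  // room available -> true
    val secondOffer = queue.offer("b") // full -> returns false immediately
    // full -> add throws IllegalStateException
    val addThrew =
      try { queue.add("c"); false } catch { case _: IllegalStateException => true }
    // put("d") would block here until space frees up; the timed offer waits at most 50 ms
    // and then gives up, showing the same waiting behaviour without hanging the demo.
    val timedOffer = queue.offer("d", 50, TimeUnit.MILLISECONDS)
    (firstOffer, secondOffer, addThrew, timedOffer)
  }
}
```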
Finally, let's look at `send`, the classic message-sending method. It handles local and remote receivers differently:
```scala
private[netty] def send(message: RequestMessage): Unit = {
  // Address of the endpoint the message is sent to
  val remoteAddr = message.receiver.address
  // Check whether the receiver lives in this RpcEnv
  if (remoteAddr == address) {
    // Message to a local RPC endpoint.
    try {
      // The receiver is local, so hand the message to the dispatcher
      dispatcher.postOneWayMessage(message)
    } catch {
      case e: RpcEnvStoppedException => logWarning(e.getMessage)
    }
  } else {
    // Message to a remote RPC endpoint.
    // The receiver is on a remote node, so put the message into the outbox for that node
    postToOutbox(message.receiver, OneWayOutboxMessage(message.serialize(this)))
  }
}
```
First, let's see what the dispatcher does.
It posts a one-way message:
```scala
/** Posts a one-way message. */
def postOneWayMessage(message: RequestMessage): Unit = {
  // Delegates to postMessage
  postMessage(message.receiver.name, OneWayMessage(message.senderAddress, message.content),
    (e) => throw e)
}
```
```scala
private def postMessage(
    endpointName: String,
    message: InboxMessage,
    callbackIfStopped: (Exception) => Unit): Unit = {
  val error = synchronized {
    // Look up the EndpointData for this endpoint
    val data = endpoints.get(endpointName)
    if (stopped) {
      Some(new RpcEnvStoppedException())
    } else if (data == null) {
      Some(new SparkException(s"Could not find $endpointName."))
    } else {
      // The inbox appends the message to its java.util.LinkedList[InboxMessage] queue
      data.inbox.post(message)
      // Add the EndpointData to receivers so that a dispatcher thread consumes it
      receivers.offer(data)
      None
    }
  }
  // We don't need to call `onStop` in the `synchronized` block
  error.foreach(callbackIfStopped)
}
```
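The local delivery path can be sketched in plain Scala: queue the message in the endpoint's inbox, then put the endpoint onto `receivers`. `DemoDispatcher` and `DemoEndpointData` are hypothetical simplifications. There are no refs and no real Inbox, and messages are just `Any` values.

```scala
import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue, LinkedBlockingQueue}

// Hypothetical stand-in for Dispatcher.EndpointData; a ConcurrentLinkedQueue replaces the
// real Inbox (which keeps a java.util.LinkedList guarded by a lock).
final class DemoEndpointData(val name: String) {
  val inbox = new ConcurrentLinkedQueue[Any]()
}

// Simplified Dispatcher bookkeeping: endpoints by name plus the shared receivers queue.
final class DemoDispatcher {
  private val endpoints = new ConcurrentHashMap[String, DemoEndpointData]()
  val receivers = new LinkedBlockingQueue[DemoEndpointData]()
  private var stopped = false

  def register(name: String): Unit = synchronized {
    require(!stopped, "RpcEnv has been stopped")
    val data = new DemoEndpointData(name)
    if (endpoints.putIfAbsent(name, data) != null) {
      throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
    }
    data.inbox.add("OnStart") // the real Inbox enqueues OnStart in its constructor
    receivers.offer(data)     // so a dispatcher thread will run OnStart
  }

  // Returns the error instead of throwing it, like the `error` value in postMessage.
  def postMessage(name: String, message: Any): Option[Exception] = synchronized {
    val data = endpoints.get(name)
    if (stopped) {
      Some(new IllegalStateException("RpcEnv already stopped."))
    } else if (data == null) {
      Some(new NoSuchElementException(s"Could not find $name."))
    } else {
      data.inbox.add(message) // 1. queue the message in the endpoint's inbox
      receivers.offer(data)   // 2. flag the endpoint as having work for the thread pool
      None
    }
  }

  def stop(): Unit = synchronized { stopped = true }
}
```

The same `DemoEndpointData` can appear in `receivers` several times, once per posted message. In Spark, a thread that takes such an entry and finds the inbox already empty simply returns.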
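The dispatcher creates its own ThreadPoolExecutor with `Executors.newFixedThreadPool` from java.util.concurrent. Its threads repeatedly take EndpointData entries from `receivers` and process the queued messages, pattern-matching on each message's type to handle it.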
```scala
/** Thread pool used for dispatching messages. */
// The dispatcher's thread pool
private val threadpool: ThreadPoolExecutor = {
  // Number of dispatcher threads, read from the configuration
  val numThreads = nettyEnv.conf.getInt("spark.rpc.netty.dispatcher.numThreads",
    math.max(2, Runtime.getRuntime.availableProcessors()))
  // Produces a Java ThreadPoolExecutor
  val pool = ThreadUtils.newDaemonFixedThreadPool(numThreads, "dispatcher-event-loop")
  for (i <- 0 until numThreads) {
    // Each thread runs a MessageLoop, which is a Runnable
    pool.execute(new MessageLoop)
  }
  pool
}
```
The next part deserves careful reading. `process` contains the concrete handling of one-way and two-way messages, a classic pairing of pattern matching and partial functions. It also handles events from remote processes. First, the helper that builds the pool:
```scala
/**
 * Wrapper over newFixedThreadPool. Thread names are formatted as prefix-ID, where ID is a
 * unique, sequentially assigned integer.
 */
def newDaemonFixedThreadPool(nThreads: Int, prefix: String): ThreadPoolExecutor = {
  val threadFactory = namedThreadFactory(prefix)
  // Build the pool with java.util.concurrent.Executors.newFixedThreadPool
  Executors.newFixedThreadPool(nThreads, threadFactory).asInstanceOf[ThreadPoolExecutor]
}
```
Each pool thread runs a MessageLoop:
```scala
/** Message loop used for dispatching messages. */
private class MessageLoop extends Runnable {
  override def run(): Unit = {
    try {
      // Keep processing incoming messages
      while (true) {
        try {
          val data = receivers.take()
          if (data == PoisonPill) {
            // Put PoisonPill back so that other MessageLoops can see it.
            receivers.offer(PoisonPill)
            return
          }
          // Call the inbox's process method, which pattern-matches on each message's type
          data.inbox.process(Dispatcher.this)
        } catch {
          case NonFatal(e) => logError(e.getMessage, e)
        }
      }
    } catch {
      case ie: InterruptedException => // exit
    }
  }
}
```
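The MessageLoop/PoisonPill shutdown pattern can be reproduced outside Spark. In this sketch, `DemoEventLoop`, `DemoTask` and `DemoPoisonPill` are hypothetical names. A fixed pool of workers blocks on one `LinkedBlockingQueue`. A single pill is enough to stop all of them, because each worker puts it back before exiting.

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}
import scala.util.control.NonFatal

// Hypothetical work items; DemoPoisonPill plays the role of Dispatcher's PoisonPill.
sealed trait DemoWork
final case class DemoTask(payload: String) extends DemoWork
case object DemoPoisonPill extends DemoWork

// A fixed pool of MessageLoop-style workers draining one shared blocking queue.
final class DemoEventLoop(numThreads: Int)(handle: String => Unit) {
  private val queue = new LinkedBlockingQueue[DemoWork]()
  private val pool = Executors.newFixedThreadPool(numThreads)

  private class MessageLoop extends Runnable {
    override def run(): Unit = {
      try {
        var running = true
        while (running) {
          queue.take() match {
            case DemoPoisonPill =>
              // Put the pill back so that the other loops see it too, then exit.
              queue.offer(DemoPoisonPill)
              running = false
            case DemoTask(payload) =>
              // A failing handler must not kill the loop (Spark logs the error here).
              try handle(payload) catch { case NonFatal(_) => () }
          }
        }
      } catch {
        case _: InterruptedException => () // exit
      }
    }
  }

  for (_ <- 0 until numThreads) pool.execute(new MessageLoop)

  def post(payload: String): Unit = queue.offer(DemoTask(payload))

  // Enqueue the pill behind all pending work, then wait for every loop to exit.
  def stop(): Boolean = {
    queue.offer(DemoPoisonPill)
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
  }
}
```

The pill is queued behind the pending work, so every task posted before `stop()` is still handled.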
The `process` method of Inbox:
```scala
/**
 * Process stored messages.
 */
def process(dispatcher: Dispatcher): Unit = {
  var message: InboxMessage = null
  inbox.synchronized {
    if (!enableConcurrent && numActiveThreads != 0) {
      return
    }
    // Poll one message from the messages list
    message = messages.poll()
    if (message != null) {
      numActiveThreads += 1
    } else {
      return
    }
  }
  while (true) {
    safelyCall(endpoint) {
      // Pattern-match on the polled message and run the matching handler
      message match {
        // A regular RPC message.
        // Note: every message type matched below, including RpcMessage, implements
        // trait InboxMessage
        case RpcMessage(_sender, content, context) =>
          try {
            // Two-way (request/reply) messages, triggered when a sender calls ask on the Ref.
            // For example, BlockManagerSlaveEndpoint in BlockManager (next chapter) handles
            // RemoveBlock, GetBlockStatus and similar operations through receiveAndReply.
            // receiveAndReply returns a PartialFunction; the endpoint runs the case that
            // matches the message type
            endpoint.receiveAndReply(context).applyOrElse[Any, Unit](content, { msg =>
              throw new SparkException(s"Unsupported message $message from ${_sender}")
            })
          } catch {
            case NonFatal(e) =>
              context.sendFailure(e)
              // Throw the exception -- this exception will be caught by the safelyCall function.
              // The endpoint's onError function will be called.
              throw e
          }
```
Two-way messages are handled by the partial function `receiveAndReply`:
```scala
/**
 * Process messages from `RpcEndpointRef.ask`. If receiving a unmatched message,
 * `SparkException` will be thrown and sent to `onError`.
 */
// The partial function in trait RpcEndpoint that handles two-way messages
def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case _ => context.sendFailure(new SparkException(self + " won't reply anything"))
}
```
While we are here, this is the interface for one-way messages, which the code below uses right away:
```scala
/**
 * Process messages from `RpcEndpointRef.send` or `RpcCallContext.reply`. If receiving a
 * unmatched message, `SparkException` will be thrown and sent to `onError`.
 */
// Handles messages sent through a Ref's send or through reply
def receive: PartialFunction[Any, Unit] = {
  case _ => throw new SparkException(self + " does not implement 'receive'")
}
```
Continuing with `process`:
```scala
        // A one-way message
        case OneWayMessage(_sender, content) =>
          // This is the one-way handling just mentioned: the partial function receive
          // processes messages sent through a Ref's send or through reply
          endpoint.receive.applyOrElse[Any, Unit](content, { msg =>
            throw new SparkException(s"Unsupported message $message from ${_sender}")
          })

        // Start the endpoint's message handling
        case OnStart =>
          // Called before the endpoint receives any other message
          endpoint.onStart()
          // If the endpoint is not a ThreadSafeRpcEndpoint, allow its messages
          // to be processed by multiple threads concurrently
          if (!endpoint.isInstanceOf[ThreadSafeRpcEndpoint]) {
            inbox.synchronized {
              if (!stopped) {
                enableConcurrent = true
              }
            }
          }

        // Stop the endpoint; after this, its send and ask can no longer be used
        case OnStop =>
          val activeThreads = inbox.synchronized { inbox.numActiveThreads }
          // Only the current thread may be active at this point
          assert(activeThreads == 1,
            s"There should be only a single active thread but found $activeThreads threads.")
          // Remove the RpcEndpointRef
          dispatcher.removeRpcEndpointRef(endpoint)
          // Stop receiving messages
          endpoint.onStop()
          // OnStop must be the last message in the inbox
          assert(isEmpty, "OnStop should be the last message")

        // Notifies the endpoint that a remote process has connected
        case RemoteProcessConnected(remoteAddress) =>
          endpoint.onConnected(remoteAddress)

        // Notifies the endpoint that a remote process has disconnected
        case RemoteProcessDisconnected(remoteAddress) =>
          endpoint.onDisconnected(remoteAddress)

        // Notifies the endpoint that the connection to a remote process hit an error
        case RemoteProcessConnectionError(cause, remoteAddress) =>
          endpoint.onNetworkError(cause, remoteAddress)
      }
    }
```
That completes the local message-handling path. Next comes the handling of messages sent to remote endpoints.
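Before moving on, here is a sketch that condenses the local path just described. The names (`DemoEndpoint`, `DemoInbox`, `DemoCallContext` and the `Demo*` messages) are hypothetical simplifications of RpcEndpoint, Inbox, RpcCallContext and InboxMessage. The sketch follows these rules:

- OnStart is queued first.
- One-way messages go to `receive`.
- Request/reply messages go to `receiveAndReply(context)` via `applyOrElse`.
- OnStop must be the last message.

It departs from Spark in two ways. An RPC failure is only reported through `sendFailure` and is not rethrown to an `onError` hook. Everything runs on the calling thread.

```scala
import java.util.LinkedList
import scala.util.control.NonFatal

// Hypothetical stand-in for RpcCallContext.
trait DemoCallContext {
  def reply(response: Any): Unit
  def sendFailure(e: Throwable): Unit
}

// Hypothetical stand-in for RpcEndpoint, with default handlers like the ones shown above.
trait DemoEndpoint {
  def onStart(): Unit = ()
  def onStop(): Unit = ()
  def receive: PartialFunction[Any, Unit] = {
    case _ => throw new IllegalStateException("does not implement 'receive'")
  }
  def receiveAndReply(context: DemoCallContext): PartialFunction[Any, Unit] = {
    case _ => context.sendFailure(new IllegalStateException("won't reply anything"))
  }
}

// A sealed hierarchy, so the match in processAll is checked for exhaustiveness.
sealed trait DemoInboxMessage
case object DemoOnStart extends DemoInboxMessage
case object DemoOnStop extends DemoInboxMessage
final case class DemoOneWay(content: Any) extends DemoInboxMessage
final case class DemoRpc(content: Any, context: DemoCallContext) extends DemoInboxMessage

final class DemoInbox(endpoint: DemoEndpoint, threadSafe: Boolean) {
  private val messages = new LinkedList[DemoInboxMessage]()
  private var stopped = false
  var enableConcurrent = false

  messages.add(DemoOnStart) // queued first, so onStart runs before any other message

  def post(message: DemoInboxMessage): Unit = synchronized {
    if (!stopped) messages.add(message)
  }

  def stop(): Unit = synchronized {
    if (!stopped) {
      enableConcurrent = false
      stopped = true
      messages.add(DemoOnStop) // OnStop is always the last message
    }
  }

  /** Processes every queued message, one at a time, on the calling thread. */
  def processAll(): Unit = {
    var message = synchronized(messages.poll())
    while (message != null) {
      message match {
        case DemoRpc(content, context) =>
          try {
            endpoint.receiveAndReply(context).applyOrElse[Any, Unit](content, { msg =>
              throw new IllegalArgumentException(s"Unsupported message $msg")
            })
          } catch {
            case NonFatal(e) => context.sendFailure(e) // Spark also rethrows so onError runs
          }
        case DemoOneWay(content) =>
          endpoint.receive.applyOrElse[Any, Unit](content, { msg =>
            throw new IllegalArgumentException(s"Unsupported message $msg")
          })
        case DemoOnStart =>
          endpoint.onStart()
          // Endpoints that do not require serial processing may be processed concurrently.
          if (!threadSafe) synchronized { if (!stopped) enableConcurrent = true }
        case DemoOnStop =>
          endpoint.onStop()
          assert(synchronized(messages.isEmpty()), "OnStop should be the last message")
      }
      message = synchronized(messages.poll())
    }
  }
}
```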
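On this path, messages are ultimately written with the `writeAndFlush` operation of a Netty channel, for both one-way and two-way messages. The details follow.
First, the `outboxes` map: every node keeps `outboxes`, which stores one Outbox per remote node address.
If the receiver's address is not the local address, the message is put into the Outbox for that address, where it waits for a thread to consume it.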
```scala
/**
 * A map for [[RpcAddress]] and [[Outbox]]. When we are connecting to a remote [[RpcAddress]],
 * we just put messages to its [[Outbox]] to implement a non-blocking `send` method.
 */
// When sending through a remote Ref, the message is simply put into the Outbox for the
// remote RPC address, where it waits to be consumed, so the calling thread never blocks.
// outboxes is again a Java ConcurrentHashMap, mapping each RPC address to its Outbox.
// An Outbox wraps the message queue, the TransportClient and the message-handling logic.
private val outboxes = new ConcurrentHashMap[RpcAddress, Outbox]()
```
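Looking up or creating the Outbox for an address must be safe when several threads send to the same new address at once. The sketch below shows the `get`-then-`putIfAbsent` idiom that `postToOutbox` uses. `DemoOutbox` and `DemoOutboxes` are hypothetical.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for Outbox; the real one holds a message queue and a TransportClient.
final class DemoOutbox(val address: String)

final class DemoOutboxes {
  private val outboxes = new ConcurrentHashMap[String, DemoOutbox]()

  // Same get -> putIfAbsent shape as the targetOutbox block in postToOutbox.
  def getOrCreate(address: String): DemoOutbox = {
    val existing = outboxes.get(address)
    if (existing != null) {
      existing
    } else {
      val fresh = new DemoOutbox(address)
      val winner = outboxes.putIfAbsent(address, fresh)
      // If another thread registered an outbox first, use that one and drop ours.
      if (winner == null) fresh else winner
    }
  }

  def size: Int = outboxes.size()
}
```

On Java 8+, `ConcurrentHashMap.computeIfAbsent` would express the same get-or-create in one call. The explicit version makes the race, and who wins it, easy to see.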
Picking up the `else` branch of `send` from earlier:
```scala
  } else {
    // Message to a remote RPC endpoint.
    // The receiver is on a remote node, so put the message into the outbox for that node
    postToOutbox(message.receiver, OneWayOutboxMessage(message.serialize(this)))
  }
}
```
```scala
private def postToOutbox(receiver: NettyRpcEndpointRef, message: OutboxMessage): Unit = {
  if (receiver.client != null) {
    // The receiver ref already holds a TransportClient, so call sendWith directly.
    // Note: every message taken from an outbox implements trait OutboxMessage,
    // so each message type has its own sendWith implementation
    // (again split into one-way and two-way messages)
    message.sendWith(receiver.client)
  } else {
    // No TransportClient on the ref, so the receiver must have a listen address
    require(receiver.address != null,
      "Cannot send message to client endpoint with no listen address.")
    val targetOutbox = {
      // Look up the receiver's Outbox in outboxes by RPC address
      // Data structure: Java ConcurrentHashMap[RpcAddress, Outbox]
      val outbox = outboxes.get(receiver.address)
      if (outbox == null) {
        // No Outbox for this address yet, so create one
        val newOutbox = new Outbox(this, receiver.address)
        // and add it to outboxes
        val oldOutbox = outboxes.putIfAbsent(receiver.address, newOutbox)
        if (oldOutbox == null) {
          // No other thread registered one first: use the newOutbox we just created
          newOutbox
        } else {
          // Another thread won the race: use its Outbox
          oldOutbox
        }
      } else {
        // Use the existing Outbox
        outbox
      }
    }
    if (stopped.get) {
      // It's possible that we put `targetOutbox` after stopping. So we need to clean it.
      outboxes.remove(receiver.address)
      targetOutbox.stop()
    } else {
      // Outbox.send behaves differently depending on the outbox state; it may go through
      // drainOutbox, which creates a TransportClient connected to the receiver's address
      targetOutbox.send(message)
    }
  }
}
```
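`postToOutbox` calls `message.sendWith(...)` without knowing whether the message is one-way or request/reply; each `OutboxMessage` implementation decides. The following minimal sketch shows that polymorphism with a fake client. `DemoTransportClient`, `DemoOneWayMessage` and `DemoRpcMessage` are hypothetical and only loosely follow the shape of `TransportClient.send`/`sendRpc`.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical client interface, loosely modelled on TransportClient.send / sendRpc.
trait DemoTransportClient {
  def send(content: ByteBuffer): Unit
  // Returns a request id; the callback receives the server's response.
  def sendRpc(content: ByteBuffer, onResponse: ByteBuffer => Unit): Long
}

// Callers only see this trait and never need to know which kind of message they hold.
sealed trait DemoOutboxMessage {
  def sendWith(client: DemoTransportClient): Unit
  def onFailure(e: Throwable): Unit
}

// One-way: fire and forget, nobody is waiting for an answer.
final case class DemoOneWayMessage(content: ByteBuffer) extends DemoOutboxMessage {
  override def sendWith(client: DemoTransportClient): Unit = client.send(content)
  override def onFailure(e: Throwable): Unit = () // Spark only logs failures here
}

// Two-way: remembers the request id and routes the reply (or failure) to callbacks.
final class DemoRpcMessage(
    content: ByteBuffer,
    onSuccess: ByteBuffer => Unit,
    onError: Throwable => Unit) extends DemoOutboxMessage {
  private var requestId: Long = -1L
  override def sendWith(client: DemoTransportClient): Unit = {
    requestId = client.sendRpc(content, onSuccess)
  }
  override def onFailure(e: Throwable): Unit = onError(e)
  def id: Long = requestId
}
```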
Let's look at the two implementations of `sendWith`.
One-way messages: under the hood, this calls Netty's `writeAndFlush` on an `io.netty.channel.Channel`.
```scala
// One-way message
private[netty] case class OneWayOutboxMessage(content: ByteBuffer) extends OutboxMessage
  with Logging {

  override def sendWith(client: TransportClient): Unit = {
    // Send through the TransportClient;
    // underneath it calls writeAndFlush on a Netty io.netty.channel.Channel
    client.send(content)
  }
```
The two-way message implementation below also relies on `writeAndFlush`, but it adds a callback:
```scala
private[netty] case class RpcOutboxMessage(
    content: ByteBuffer,
    _onFailure: (Throwable) => Unit,
    _onSuccess: (TransportClient, ByteBuffer) => Unit)
  extends OutboxMessage with RpcResponseCallback with Logging {

  private var client: TransportClient = _
  private var requestId: Long = _

  override def sendWith(client: TransportClient): Unit = {
    this.client = client
    // Also ends up in Netty's writeAndFlush, with an additional
    // RpcResponseCallback that receives the server's response
    this.requestId = client.sendRpc(content, this)
  }
  // ...
```
Next, the case where the receiver ref holds no TransportClient, which goes through `Outbox.send`:
```scala
/**
 * Send a message. If there is no active connection, cache it and launch a new connection. If
 * [[Outbox]] is stopped, the sender will be notified with a [[SparkException]].
 */
def send(message: OutboxMessage): Unit = {
  // Check the state
  val dropped = synchronized {
    // Is the Outbox stopped?
    if (stopped) {
      true
    } else {
      // The Outbox is running: append the message to its
      // java.util.LinkedList[OutboxMessage] queue, where it waits to be consumed
      messages.add(message)
      false
    }
  }
  if (dropped) {
    // The Outbox is stopped
    message.onFailure(new SparkException("Message is dropped because Outbox is stopped"))
  } else {
    // The Outbox is running, so call drainOutbox to process the messages
    drainOutbox()
  }
}
```
Depending on the state, this may end up calling `drainOutbox()`, which checks again whether a TransportClient exists. If not, it has nettyEnv create a TransportClient connected to the remote address. Once a client is available, it loops and calls the `sendWith` implementation that matches each message's type, taking the lock only to fetch the next message.
```scala
/**
 * Drain the message queue. If there is other draining thread, just exit. If the connection has
 * not been established, launch a task in the `nettyEnv.clientConnectionExecutor` to setup the
 * connection.
 */
// The outbox consumes its message queue
private def drainOutbox(): Unit = {
  var message: OutboxMessage = null
  // Take the lock
  synchronized {
    if (stopped) {
      // The outbox is stopped, so return immediately
      return
    }
    if (connectFuture != null) {
      // We are connecting to the remote address, so just exit
      return
    }
    // Is the TransportClient null?
    if (client == null) {
      // There is no connect task but client is null, so we need to launch the connect task.
      // The connect task calls drainOutbox again once the connection is established
      launchConnectTask()
      return
    }
```
The implementation of `launchConnectTask()`:
```scala
private def launchConnectTask(): Unit = {
  // clientConnectionExecutor comes from ThreadUtils.newDaemonCachedThreadPool,
  // which creates a Java ThreadPoolExecutor
  connectFuture = nettyEnv.clientConnectionExecutor.submit(new Callable[Unit] {

    override def call(): Unit = {
      try {
        // Create a TransportClient connected to the receiver's RPC address
        val _client = nettyEnv.createClient(address)
        // Lock the outbox
        outbox.synchronized {
          client = _client
          if (stopped) {
            // If the outbox is stopped, closeClient sets client to null
            // but keeps the connection open so it can be reused
            closeClient()
          }
        }
      } catch {
        case ie: InterruptedException =>
          // exit
          return
        case NonFatal(e) =>
          outbox.synchronized { connectFuture = null }
          handleNetworkFailure(e)
          return
      }
      outbox.synchronized { connectFuture = null }
      // It's possible that no thread is draining now. If we don't drain here, we cannot send the
      // messages until the next message arrives.
      // Finally call drainOutbox
      drainOutbox()
    }
  })
}
```
Back to the rest of `drainOutbox()`:
```scala
    // Check whether another thread is already draining
    if (draining) {
      // There is some thread draining, so just exit
      return
    }
    // Otherwise poll the first message from the messages list
    message = messages.poll()
    if (message == null) {
      // The queue is empty, so return
      return
    }
    // Acts as a guard: only the thread that sets draining = true runs the while loop below,
    // and any other thread reaching the draining check above returns immediately.
    // Once the queue is empty, the loop sets draining back to false so another thread can drain
    draining = true
  }
  while (true) {
    try {
      // Read client under the lock
      val _client = synchronized { client }
      if (_client != null) {
        // Call the core sendWith method.
        // Every message taken from an outbox implements trait OutboxMessage,
        // so one-way and two-way messages each have their own sendWith
        message.sendWith(_client)
      } else {
        // client can only be null here if the outbox has been stopped
        assert(stopped == true)
      }
    } catch {
      case NonFatal(e) =>
        handleNetworkFailure(e)
        return
    }
    synchronized {
      if (stopped) {
        return
      }
      // This thread keeps polling and sending messages in this loop; when the queue is
      // empty it sets draining back to false so that another thread can take over
      message = messages.poll()
      if (message == null) {
        draining = false
        return
      }
    }
  }
}
```
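The `draining` flag is the heart of `drainOutbox`. Many threads may call `send`, but only one at a time takes messages off the queue, and it writes them without holding the lock. The sketch below is a simplified version of that pattern. `DemoDrainingOutbox` is hypothetical; it does no connection setup, and `deliver` is assumed not to throw.

```scala
import java.util.LinkedList

// Hypothetical outbox reproducing drainOutbox's "draining" flag: many threads may call send,
// but at most one of them delivers messages at any moment, outside the lock.
// Assumes deliver does not throw (Spark's handleNetworkFailure covers that case).
final class DemoDrainingOutbox(deliver: String => Unit) {
  private val messages = new LinkedList[String]()
  private var draining = false
  private var stopped = false

  def send(message: String): Unit = {
    val dropped = synchronized {
      if (stopped) true else { messages.add(message); false }
    }
    if (!dropped) drain()
  }

  private def drain(): Unit = {
    // Claim the draining role, or leave if another thread already has it.
    var message: String = synchronized {
      if (stopped || draining) null
      else {
        val first = messages.poll()
        if (first != null) draining = true
        first
      }
    }
    while (message != null) {
      deliver(message) // the slow network write happens without holding the lock
      message = synchronized {
        val next = if (stopped) null else messages.poll()
        // Hand back the draining role once the queue is empty.
        if (next == null) draining = false
        next
      }
    }
  }

  def stop(): Unit = synchronized { stopped = true; messages.clear() }
}
```

A thread that finds `draining == true` can return at once. The current drainer re-polls the queue under the lock before giving up the role, so a message added in the meantime is never stranded.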