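Flink uses the Akka + Netty stack for RPC communication. The Akka-based RPC mechanism was already covered in the earlier walkthrough of the Spark source code, so it is not described in detail here. The relevant concepts are as follows:
Flink's RPC implementation lives mainly in the org.apache.flink.runtime.rpc package of the flink-runtime module. Four APIs are involved: RpcGateway, RpcEndpoint, RpcService and RpcServer.
Note: RpcEndpoint has four important subclasses: TaskExecutor, Dispatcher, JobMaster and ResourceManager.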
public abstract class RpcEndpoint implements RpcGateway, AutoCloseableAsync{
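// onStart() is invoked once this RpcEndpoint has been successfully instantiated and started; the framework calls it, it is not called directly
protected void onStart() throws Exception {}
// Called exactly once, right before this RpcEndpoint is destroyed
public final CompletableFuture<Void> internalCallOnStop() { ... }
protected RpcEndpoint(final RpcService rpcService, final String endpointId) {
// Start the RPC service
// 12.1 For the ResourceManager, this starts its RPC server, which receives the reports sent by TaskManagers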
this.rpcServer = rpcService.startServer(this);
}
}
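To make the lifecycle concrete, here is a minimal, self-contained sketch. It does not use Flink's actual classes: `MiniRpcEndpoint`, `DemoEndpoint` and the single-threaded `mainThread` executor are hypothetical stand-ins. The sketch shows the idea in the comments above. The endpoint never calls onStart() itself. Instead, start() enqueues a START message, and the endpoint's own single "main thread" (playing the role of the Akka actor) processes that message and invokes onStart().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RpcEndpointLifecycleDemo {
    public static void main(String[] args) {
        DemoEndpoint endpoint = new DemoEndpoint();
        endpoint.start();
        endpoint.stop();
        System.out.println(endpoint.events); // [START received, onStart, onStop]
    }
}

// Hypothetical stand-in for RpcEndpoint: every callback runs on a single "main thread",
// the way an Akka actor processes its mailbox one message at a time.
abstract class MiniRpcEndpoint {
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    final List<String> events = Collections.synchronizedList(new ArrayList<>());

    // Like RpcEndpoint#start(): enqueue a START message instead of calling onStart() directly.
    final void start() {
        mainThread.execute(() -> {
            events.add("START received");
            onStart();
        });
    }

    // Like internalCallOnStop(): run onStop() once on the main thread, then shut the mailbox down.
    final void stop() {
        mainThread.execute(this::onStop);
        mainThread.shutdown();
        try {
            mainThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    protected void onStart() {}

    protected void onStop() {}
}

class DemoEndpoint extends MiniRpcEndpoint {
    @Override
    protected void onStart() { events.add("onStart"); }

    @Override
    protected void onStop() { events.add("onStop"); }
}
```

Because the mailbox is a single thread, the callbacks never run concurrently. This is why RpcEndpoint subclasses can keep mutable state without locks, as long as they only touch that state from the main thread.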
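The Flink cluster start-up scripts live in the flink-dist subproject, in the bin directory under flink-bin; the entry script is start-cluster.sh. This script first calls config.sh to read the masters and workers: the masters are configured in conf/masters and the workers in conf/workers.
start-cluster.sh then starts the JobManager and the TaskManagers by running jobmanager.sh and taskmanager.sh, respectively.
Both jobmanager.sh and taskmanager.sh call flink-daemon.sh to launch the JVM process. Specifically, the JobManager is started with the argument standalonesession, and its implementation class is org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint;
the TaskManager is started with the argument taskexecutor, and its implementation class is org.apache.flink.runtime.taskexecutor.TaskManagerRunner.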
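The JobManager is the master node of a Flink cluster and contains four important components:
In short, the master node of a Flink cluster runs the ResourceManager and the Dispatcher. When a client submits a job to the cluster, the Dispatcher launches a JobManager (JobMaster) that is responsible for executing the tasks of that job, and this JobManager requests the resources it needs during execution from the ResourceManager.
Note: Flink's heartbeat mechanism works differently from that of other clusters (in HDFS or YARN, for example, the workers send heartbeats to the master):
1. The ResourceManager sends heartbeat requests to the worker TaskManagers.
2. After receiving a heartbeat request, a worker sends back a response.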
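Below is a minimal sketch of this master-initiated ("pull") heartbeat pattern. It relies on a few assumptions. `WorkerHeartbeatTarget` and `PullHeartbeatMonitor` are hypothetical names, not Flink's classes. Time is passed in explicitly as `now` instead of being read from a clock, which keeps the logic deterministic. The master asks every registered worker for a heartbeat. A worker that answers has its last-seen time refreshed, and a worker that stays silent longer than the timeout is reported as dead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PullHeartbeatDemo {
    public static void main(String[] args) {
        PullHeartbeatMonitor resourceManager = new PullHeartbeatMonitor("resourcemanager", 50_000);
        resourceManager.monitorTarget("tm-1", master -> true, 0);  // healthy TaskManager
        resourceManager.monitorTarget("tm-2", master -> false, 0); // TaskManager that stopped answering
        resourceManager.requestHeartbeats(40_000);
        System.out.println(resourceManager.timedOutWorkers(60_000)); // [tm-2]
    }
}

// Worker side: answers a heartbeat request from the master (returns false if it cannot respond).
interface WorkerHeartbeatTarget {
    boolean requestHeartbeat(String masterId);
}

// Master side: initiates every heartbeat round and remembers when each worker last answered.
class PullHeartbeatMonitor {
    private final String masterId;
    private final long timeoutMillis;
    private final Map<String, WorkerHeartbeatTarget> targets = new LinkedHashMap<>();
    private final Map<String, Long> lastSeen = new LinkedHashMap<>();

    PullHeartbeatMonitor(String masterId, long timeoutMillis) {
        this.masterId = masterId;
        this.timeoutMillis = timeoutMillis;
    }

    void monitorTarget(String workerId, WorkerHeartbeatTarget target, long now) {
        targets.put(workerId, target);
        lastSeen.put(workerId, now);
    }

    // One heartbeat round: the master sends the requests, the workers respond.
    void requestHeartbeats(long now) {
        for (Map.Entry<String, WorkerHeartbeatTarget> entry : targets.entrySet()) {
            if (entry.getValue().requestHeartbeat(masterId)) {
                lastSeen.put(entry.getKey(), now);
            }
        }
    }

    // Workers whose last response is older than the timeout.
    List<String> timedOutWorkers(long now) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (now - entry.getValue() > timeoutMillis) {
                dead.add(entry.getKey());
            }
        }
        return dead;
    }
}
```

The 50 000 ms timeout in the demo matches Flink's default `heartbeat.timeout`. The default `heartbeat.interval` is 10 000 ms.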
public class StandaloneSessionClusterEntrypoint extends SessionClusterEntrypoint {
public static void main(String[] args){
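// 1. Install JvmShutdownSafeguard as a JVM shutdown hook; it forcibly terminates the JVM if other shutdown hooks block the shutdown
JvmShutdownSafeguard.installAsShutdownHook(LOG);
// 2. Parse the Flink configuration file flink-conf.yaml
Configuration configuration = loadConfiguration(entrypointClusterConfiguration);
// 3. Create the StandaloneSessionClusterEntrypoint object
StandaloneSessionClusterEntrypoint entrypoint = new StandaloneSessionClusterEntrypoint(configuration);
// 4. This method accepts the parent type ClusterEntrypoint; the other deployment modes are started through the same method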
ClusterEntrypoint.runClusterEntrypoint(entrypoint);
}
}
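Step 4 relies on a template-method structure. The static runClusterEntrypoint() only knows the abstract parent, and each deployment mode supplies its own subclass. The sketch below illustrates this under heavy simplification. `MiniClusterEntrypoint`, its subclasses and `createComponentFactoryName()` are hypothetical and do not match Flink's real signatures. The returned strings just make the call order visible.

```java
import java.util.Map;

class EntrypointDemo {
    public static void main(String[] args) {
        Map<String, String> conf = Map.of("jobmanager.rpc.port", "6123");
        System.out.println(MiniClusterEntrypoint.runClusterEntrypoint(new MiniSessionEntrypoint(conf)));
        // rpc@6123 + SessionComponentFactory
        System.out.println(MiniClusterEntrypoint.runClusterEntrypoint(new MiniPerJobEntrypoint(conf)));
        // rpc@6123 + PerJobComponentFactory
    }
}

// Hypothetical, heavily simplified ClusterEntrypoint: the start-up skeleton is fixed,
// and each deployment mode only supplies its component factory.
abstract class MiniClusterEntrypoint {
    protected final Map<String, String> configuration;

    MiniClusterEntrypoint(Map<String, String> configuration) {
        this.configuration = configuration;
    }

    // Counterpart of ClusterEntrypoint.runClusterEntrypoint(...): it only sees the parent type.
    static String runClusterEntrypoint(MiniClusterEntrypoint entrypoint) {
        return entrypoint.startCluster();
    }

    // Fixed order: shared services first (step 8), then the mode-specific factory (step 9).
    final String startCluster() {
        return initializeServices() + " + " + createComponentFactoryName();
    }

    private String initializeServices() {
        return "rpc@" + configuration.getOrDefault("jobmanager.rpc.port", "6123");
    }

    // The only hook a concrete entry point has to fill in.
    protected abstract String createComponentFactoryName();
}

class MiniSessionEntrypoint extends MiniClusterEntrypoint {
    MiniSessionEntrypoint(Map<String, String> configuration) { super(configuration); }

    @Override
    protected String createComponentFactoryName() { return "SessionComponentFactory"; }
}

class MiniPerJobEntrypoint extends MiniClusterEntrypoint {
    MiniPerJobEntrypoint(Map<String, String> configuration) { super(configuration); }

    @Override
    protected String createComponentFactoryName() { return "PerJobComponentFactory"; }
}
```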
public abstract class ClusterEntrypoint implements AutoCloseableAsync, FatalErrorHandler {
public static void runClusterEntrypoint(ClusterEntrypoint clusterEntrypoint){
// 5. Start the master node (JobManager)
clusterEntrypoint.startCluster();
}
public void startCluster() throws ClusterEntrypointException {
// 6. PluginManager manages the cluster's plugins; each plugin is loaded by its own class loader so that it does not interfere with Flink's own dependencies
PluginManager pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);
/*
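7. Initialize the file systems according to the configuration:
1. local file system: used on the client side (JobGraph -> JobGraph file)
2. HDFS: FileSystem (DistributedFileSystem)
3. the wrapper HadoopFileSystem, which wraps HDFS's FileSystem object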
*/
configureFileSystems(configuration, pluginManager);
runCluster(configuration, pluginManager);
}
private void runCluster(Configuration configuration, PluginManager pluginManager) throws Exception{
/*
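8. Initialize the services used by the master node:
1. commonRpcService: an Akka-based RpcService implementation; the RPC service starts actors that receive the RPCs invoked through an RpcGateway
2. haServices: provides access to the services needed for high availability: highly available storage and registries, distributed counters and leader election
3. blobServer: listens for incoming requests and spawns threads to handle them
4. heartbeatServices: provides everything needed for heartbeats, including creating heartbeat receivers and heartbeat senders
5. metricRegistry: keeps track of registered metrics and connects MetricGroups with MetricReporters
6. archivedExecutionGraphStore: stores ExecutionGraphs in serializable (archived) form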
*/
initializeServices(configuration, pluginManager);
/*
9. Four factory instances are created internally:
1. DispatcherRunnerFactory
2. ResourceManagerFactory
3. RestEndpointFactory
4. Return value: a DispatcherResourceManagerComponentFactory, which holds the three factories above as its three member fields
*/
final DispatcherResourceManagerComponentFactory dispatcherResourceManagerComponentFactory = createDispatcherResourceManagerComponentFactory(configuration);
/*
10. Create and start the three key components: Dispatcher, ResourceManager and WebMonitorEndpoint
*/
clusterComponent = dispatcherResourceManagerComponentFactory.create(...);
}
protected void initializeServices(Configuration configuration, PluginManager pluginManager) throws Exception {
/*
8.1 commonRpcService is essentially an Akka ActorSystem: a TCP-based RPC service listening on port 6123 by default
1. Initialize the ActorSystem
2. Start the actor
*/
commonRpcService = AkkaRpcServiceUtils.createRemoteRpcService(...);
// 8.2 Initialize ioExecutor; by default the number of threads is 4 * the number of CPU cores
ioExecutor = Executors.newFixedThreadPool(
ClusterEntrypointUtils.getPoolSize(configuration),
new ExecutorThreadFactory("cluster-io"));
// 8.3 Create haServices (a ZooKeeperHaServices instance when ZooKeeper HA is configured)
haServices = createHaServices(configuration, ioExecutor);
// 8.4 Initialize a BlobServer, which manages uploads of large files such as user job JARs and log files uploaded by TaskManagers
// Blob stands for Binary Large Object
blobServer = new BlobServer(configuration, haServices.createBlobStore());
blobServer.start();
/*
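8.5 Initialize the heartbeat service
On the master node, the heartbeat services of all other roles are built on top of heartbeatServices.
A role that needs heartbeats obtains a heartbeat manager (HeartbeatManagerImpl or HeartbeatManagerSenderImpl) from heartbeatServices to handle them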
*/
heartbeatServices = createHeartbeatServices(configuration);
/*
8.6 Metrics (performance monitoring) services
1. metricQueryServiceRpcService is also an ActorSystem
2. metricRegistry keeps track of the registered metrics
*/
metricRegistry = createMetricRegistry(configuration, pluginManager);
final RpcService metricQueryServiceRpcService = MetricUtils.startRemoteMetricsRpcService(configuration, commonRpcService.getAddress());
metricRegistry.startQueryService(metricQueryServiceRpcService, null);
/*
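8.7 archivedExecutionGraphStore: the service that stores ExecutionGraphs; there are two implementations
1. MemoryArchivedExecutionGraphStore: an in-memory cache
2. FileArchivedExecutionGraphStore: persists to the file system and also caches in memory; this is the default (used by session clusters)
These services are used when the DispatcherResourceManagerComponent is created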
*/
archivedExecutionGraphStore = createSerializableExecutionGraphStore(configuration, commonRpcService.getScheduledExecutor());
}
}
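Step 8.2 sizes the cluster-io pool at 4 * the number of CPU cores by default and names its threads through ExecutorThreadFactory. Below is a JDK-only sketch of the same idea. `NamedThreadFactory` and `ioPoolSize` are hypothetical helpers, not Flink's classes. The "-thread-N" naming scheme is also an assumption made for this illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class IoExecutorSketch {
    // Hypothetical helper: an explicit positive size wins, otherwise 4 threads per core (step 8.2).
    static int ioPoolSize(int configuredSize) {
        return configuredSize > 0 ? configuredSize : 4 * Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ioExecutor =
                Executors.newFixedThreadPool(ioPoolSize(-1), new NamedThreadFactory("cluster-io"));
        Future<String> threadName = ioExecutor.submit(() -> Thread.currentThread().getName());
        System.out.println(threadName.get()); // cluster-io-thread-1
        ioExecutor.shutdown();
    }
}

// Gives every pool thread a recognizable name, which makes thread dumps and logs readable.
class NamedThreadFactory implements ThreadFactory {
    private final String poolName;
    private final AtomicInteger threadNumber = new AtomicInteger(1);

    NamedThreadFactory(String poolName) {
        this.poolName = poolName;
    }

    @Override
    public Thread newThread(Runnable runnable) {
        Thread thread = new Thread(runnable, poolName + "-thread-" + threadNumber.getAndIncrement());
        thread.setDaemon(true);
        return thread;
    }
}
```

The pool is sized generously relative to the core count because it serves blocking I/O work (file systems, HA storage) rather than CPU-bound work.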
public class DefaultDispatcherResourceManagerComponentFactory implements DispatcherResourceManagerComponentFactory {
@Override
public DispatcherResourceManagerComponent create(...){
// 11. Create the WebMonitorEndpoint instance; here webMonitorEndpoint is a DispatcherRestEndpoint
webMonitorEndpoint = restEndpointFactory.createRestEndpoint(...);
webMonitorEndpoint.start();
// 12. Create the StandaloneResourceManager instance
resourceManager = resourceManagerFactory.createResourceManager(...);
// 13. Create and start the Dispatcher (older versions started it with dispatcher.start())
dispatcherRunner = dispatcherRunnerFactory.createDispatcherRunner(...);
resourceManager.start();
}
}
private static final class Supervisor implements AutoCloseableAsync {
// 8.1.1 Supervisor is a wrapper around an actor
private Supervisor(ActorRef actor, ExecutorService terminationFutureExecutor) {
this.actor = actor;
this.terminationFutureExecutor = terminationFutureExecutor;
}
}
public class HighAvailabilityServicesUtils {
public static HighAvailabilityServices createHighAvailabilityServices(...){
// 8.3.1 Get the HA mode, configured in flink-conf.yaml, e.g. high-availability: zookeeper
HighAvailabilityMode highAvailabilityMode = HighAvailabilityMode.fromConfig(configuration);
switch (highAvailabilityMode) {
case ZOOKEEPER:
// 8.3.2 Create the BlobStoreService
BlobStoreService blobStoreService = BlobUtils.createBlobStoreFromConfig(configuration);
// 8.3.3 Create ZooKeeperHaServices, which wraps a ZooKeeper client built on the Curator framework
return new ZooKeeperHaServices(...);
}
}
}
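The switch above dispatches on the configured HA mode. Below is a simplified, hypothetical re-creation of that dispatch. `HaMode`, `HaModeSketch` and the returned strings are illustrations only. The rule that any unrecognized value is treated as a custom factory class name is an assumption of this sketch. The NONE case corresponds to the StandaloneHaServices used by a non-HA standalone cluster.

```java
import java.util.Map;

class HaModeDemo {
    public static void main(String[] args) {
        System.out.println(HaModeSketch.createHaServices(Map.of()));
        // StandaloneHaServices
        System.out.println(HaModeSketch.createHaServices(Map.of(
                "high-availability", "zookeeper",
                "high-availability.zookeeper.quorum", "zk1:2181,zk2:2181")));
        // ZooKeeperHaServices(quorum=zk1:2181,zk2:2181)
    }
}

enum HaMode { NONE, ZOOKEEPER, FACTORY_CLASS }

// Simplified, hypothetical re-creation of the dispatch in createHighAvailabilityServices().
class HaModeSketch {
    static HaMode fromConfig(Map<String, String> conf) {
        String value = conf.get("high-availability");
        if (value == null || value.equalsIgnoreCase("none")) {
            return HaMode.NONE;
        }
        if (value.equalsIgnoreCase("zookeeper")) {
            return HaMode.ZOOKEEPER;
        }
        return HaMode.FACTORY_CLASS; // assumption: any other value names a custom factory class
    }

    static String createHaServices(Map<String, String> conf) {
        switch (fromConfig(conf)) {
            case NONE:
                return "StandaloneHaServices";
            case ZOOKEEPER:
                return "ZooKeeperHaServices(quorum="
                        + conf.getOrDefault("high-availability.zookeeper.quorum", "<unset>") + ")";
            default:
                return "custom factory: " + conf.get("high-availability");
        }
    }
}
```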
public abstract class RestServerEndpoint implements AutoCloseableAsync {
public final void start() throws Exception {
// 11.1 Initialize the handlers, including JobSubmitHandler
handlers = initializeHandlers(restAddressFuture);
// 11.2 Sort the handlers with RestHandlerUrlComparator
Collections.sort(handlers, RestHandlerUrlComparator.INSTANCE);
// 11.3 Start the Netty server
ChannelInitializer<SocketChannel> initializer = new ChannelInitializer<SocketChannel>() {...};
...
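// At this point the Netty server of the WebMonitorEndpoint on the master node is up. When a client submits a job, the client side starts a corresponding Netty client
state = State.RUNNING;
// 11.4 Start the WebMonitorEndpoint service
startInternal();
}
}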
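Step 11.2 sorts the handlers so that URLs without path parameters come before URLs with them. The likely reason is that literal paths such as /jobs/overview should take precedence over parameterized ones such as /jobs/:jobid. Below is a simplified comparator in the spirit of RestHandlerUrlComparator. It is a hypothetical sketch, not Flink's exact rules. It compares paths segment by segment and puts literal segments before `:param` segments. When one path is a prefix of the other, the shorter path comes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class UrlOrderSketch {
    // Segment by segment: literal segments before ":param" segments, then lexicographic order;
    // if one path is a prefix of the other, the shorter path comes first.
    static final Comparator<String> URL_COMPARATOR = (a, b) -> {
        String[] segmentsA = a.split("/");
        String[] segmentsB = b.split("/");
        int common = Math.min(segmentsA.length, segmentsB.length);
        for (int i = 0; i < common; i++) {
            boolean paramA = segmentsA[i].startsWith(":");
            boolean paramB = segmentsB[i].startsWith(":");
            if (paramA != paramB) {
                return paramA ? 1 : -1;
            }
            int cmp = segmentsA[i].compareTo(segmentsB[i]);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(segmentsA.length, segmentsB.length);
    };

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>(Arrays.asList(
                "/:*", "/jobs/:jobid/config", "/jobs/:jobid", "/jobs/overview", "/jobs"));
        urls.sort(URL_COMPARATOR);
        System.out.println(urls);
        // [/jobs, /jobs/overview, /jobs/:jobid, /jobs/:jobid/config, /:*]
    }
}
```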
public class DispatcherRestEndpoint extends WebMonitorEndpoint<DispatcherGateway> {
protected List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> initializeHandlers(final CompletableFuture<String> localAddressFuture) {
// 11.1.1 The parent class WebMonitorEndpoint initializes most of the handlers
List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> handlers = super.initializeHandlers(localAddressFuture);
// 11.1.3 Add JobSubmitHandler, which handles job submissions
handlers.add(Tuple2.of(jobSubmitHandler.getMessageHeaders(), jobSubmitHandler));
return handlers;
}
}
public class WebMonitorEndpoint<T extends RestfulGateway> extends RestServerEndpoint implements LeaderContender, JsonArchivist {
protected List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> initializeHandlers(final CompletableFuture<String> localAddressFuture) {
/*
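11.1.2 Initialize an ArrayList container
ChannelInboundHandler: an inbound handler whose channelRead0() method is invoked automatically by Netty
Under the hood, channelRead0() eventually calls handler.handleRequest()
When a client submits a job, WebMonitorEndpoint receives the request and hands it to JobSubmitHandler; the request is ultimately processed by handleRequest()
*/
ArrayList<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> handlers = new ArrayList<>(30);
// These handlers serve the REST API behind the Flink web UI; you can think of each handler as a servlet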
///jobs/:jobid
handlers.add(Tuple2.of(JobManagerLogFileHeader.getInstance(), jobManagerLogFileHandler));
handlers.add(Tuple2.of(JobManagerStdoutFileHeader.getInstance(), jobManagerStdoutFileHandler));
handlers.add(Tuple2.of(JobManagerCustomLogHeaders.getInstance(), jobManagerCustomLogHandler));
handlers.add(Tuple2.of(JobManagerLogListHeaders.getInstance(), jobManagerLogListHandler));
...
}
public void startInternal() throws Exception {
/*
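11.4.1 ZooKeeperLeaderElectionService runs the leader election; Dispatcher and ResourceManager also run elections, and winning one triggers the component's services to start
1. If the election is won, leaderElectionService.isLeader() is called
2. If it is lost, leaderElectionService.notLeader() is called
*/
leaderElectionService.start(this);
// 11.4.2 Start a scheduled task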
startExecutionGraphCacheCleanupTask();
}
private void startExecutionGraphCacheCleanupTask() {
/*
11.4.2.1 The task ultimately runs executionGraphCache.cleanup(), which evicts cached ExecutionGraphs whose TTL has expired:
cachedExecutionGraphs.values().removeIf((ExecutionGraphEntry entry) -> currentTime >= entry.getTTL());
*/
executionGraphCleanupTask = executor.scheduleWithFixedDelay(
executionGraphCache::cleanup,
cleanupInterval,
cleanupInterval,
TimeUnit.MILLISECONDS);
}
}
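The cleanup task is a fixed-delay schedule combined with a TTL-based removeIf. A minimal sketch of the same structure follows. `TtlGraphCache` is a hypothetical class that stores strings instead of ExecutionGraphs. The injectable `LongSupplier` clock is an addition made so that expiry can be tested without sleeping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

class TtlCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        TtlGraphCache cache = new TtlGraphCache(1_000, System::currentTimeMillis);
        cache.put("job-1", "archived graph of job-1");
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        long cleanupInterval = 200;
        // Same shape as startExecutionGraphCacheCleanupTask(): fixed delay between cleanups.
        executor.scheduleWithFixedDelay(cache::cleanup, cleanupInterval, cleanupInterval, TimeUnit.MILLISECONDS);
        Thread.sleep(1_500);
        System.out.println("entries after TTL: " + cache.size());
        executor.shutdownNow();
    }
}

// Hypothetical cache: each entry stores an absolute expiry time, like ExecutionGraphEntry#getTTL().
class TtlGraphCache {
    private static final class Entry {
        final String graph;
        final long ttl;

        Entry(String graph, long ttl) {
            this.graph = graph;
            this.ttl = ttl;
        }
    }

    private final Map<String, Entry> cachedGraphs = new ConcurrentHashMap<>();
    private final long timeToLiveMillis;
    private final LongSupplier clock;

    TtlGraphCache(long timeToLiveMillis, LongSupplier clock) {
        this.timeToLiveMillis = timeToLiveMillis;
        this.clock = clock;
    }

    void put(String jobId, String graph) {
        cachedGraphs.put(jobId, new Entry(graph, clock.getAsLong() + timeToLiveMillis));
    }

    String get(String jobId) {
        Entry entry = cachedGraphs.get(jobId);
        return entry == null ? null : entry.graph;
    }

    int size() {
        return cachedGraphs.size();
    }

    // Evicts every entry whose expiry time has been reached.
    void cleanup() {
        long currentTime = clock.getAsLong();
        cachedGraphs.values().removeIf(entry -> currentTime >= entry.ttl);
    }
}
```

scheduleWithFixedDelay measures the interval from the end of one cleanup to the start of the next, so a slow cleanup can never overlap with the following one.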
public class ZooKeeperLeaderElectionService implements LeaderLatchListener... {
/*
Callback mechanism of Curator (the ZooKeeper client framework):
this class implements LeaderLatchListener,
so isLeader() is called automatically when leadership is acquired, and notLeader() otherwise
*/
public void start(LeaderContender contender) throws Exception {
leaderContender = contender;
leaderLatch.addListener(this);
// 11.4.1.2 Start the election
leaderLatch.start();
}
public void isLeader() {
/*
11.4.1.3 After becoming leader:
for the call leaderElectionService.start(this) in WebMonitorEndpoint,
leaderContender = this = WebMonitorEndpoint
when the other components start, leaderContender = ResourceManager / DefaultDispatcherRunner
*/
leaderContender.grantLeadership(issuedLeaderSessionID);
}
}
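The contender/callback contract can be illustrated without ZooKeeper. The sketch below is an in-memory stand-in for a Curator LeaderLatch shared by contenders competing for the same role, for example the WebMonitorEndpoints of two JobManagers. `MiniLeaderContender`, `InMemoryLeaderElection` and `RecordingContender` are hypothetical. The earliest waiting contender becomes leader and receives a fresh session ID. When the leader leaves, it is revoked and the next contender is granted leadership.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class LeaderElectionDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        InMemoryLeaderElection election = new InMemoryLeaderElection();
        RecordingContender jm1 = new RecordingContender("jm-1", log);
        RecordingContender jm2 = new RecordingContender("jm-2", log);
        election.start(jm1);
        election.start(jm2);
        election.stop(jm1); // the leader goes away, so leadership moves to jm-2
        System.out.println(log); // [jm-1 granted, jm-1 revoked, jm-2 granted]
    }
}

// Hypothetical counterpart of LeaderContender.
interface MiniLeaderContender {
    void grantLeadership(UUID leaderSessionId);

    void revokeLeadership();
}

// In-memory stand-in for a LeaderLatch: the earliest waiting contender is the leader.
class InMemoryLeaderElection {
    private final List<MiniLeaderContender> contenders = new ArrayList<>();
    private MiniLeaderContender leader;

    synchronized void start(MiniLeaderContender contender) {
        contenders.add(contender);
        if (leader == null) {
            isLeader(contender);
        }
    }

    synchronized void stop(MiniLeaderContender contender) {
        contenders.remove(contender);
        if (contender == leader) {
            leader = null;
            contender.revokeLeadership(); // corresponds to notLeader()
            if (!contenders.isEmpty()) {
                isLeader(contenders.get(0));
            }
        }
    }

    // Corresponds to LeaderLatchListener#isLeader(): hand a fresh session ID to the winner.
    private void isLeader(MiniLeaderContender contender) {
        leader = contender;
        contender.grantLeadership(UUID.randomUUID());
    }
}

class RecordingContender implements MiniLeaderContender {
    private final String name;
    private final List<String> log;

    RecordingContender(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    @Override
    public void grantLeadership(UUID leaderSessionId) { log.add(name + " granted"); }

    @Override
    public void revokeLeadership() { log.add(name + " revoked"); }
}
```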
public class AkkaRpcService implements RpcService {
public <C extends RpcEndpoint & RpcGateway> RpcServer startServer(C rpcEndpoint){
// 12.2 Get the Akka address (hostname and port) of the actor
final String akkaAddress = AkkaUtils.getAkkaURL(actorSystem, actorRef);
...
// 12.3 Define the invocation handler
final InvocationHandler akkaInvocationHandler;
// 12.4 Create the RpcServer as a dynamic proxy
RpcServer server = (RpcServer) Proxy.newProxyInstance(...akkaInvocationHandler);
}
}
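Step 12.4 turns method calls on a gateway interface into messages by way of a JDK dynamic proxy. The sketch below shows the mechanism with java.lang.reflect.Proxy. `GreetingGateway`, `GreetingEndpoint` and `startServer` are hypothetical. A local single-threaded executor plays the role of the actor, so no Akka or network is involved. As a further assumption, every gateway method returns a CompletableFuture.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ProxyRpcDemo {
    public static void main(String[] args) {
        ExecutorService mainThread = Executors.newSingleThreadExecutor(r -> new Thread(r, "endpoint-main"));
        GreetingGateway gateway = startServer(new GreetingEndpoint(), mainThread);
        System.out.println(gateway.greet("flink").join()); // hello flink from endpoint-main
        mainThread.shutdown();
    }

    // Like AkkaRpcService#startServer(): return a dynamic proxy implementing the gateway.
    // Assumption: every gateway method returns a CompletableFuture.
    static GreetingGateway startServer(GreetingEndpoint endpoint, ExecutorService mainThread) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(endpoint, args); // toString(), hashCode(), equals()
            }
            CompletableFuture<Object> result = new CompletableFuture<>();
            // Each call becomes a "message" executed on the endpoint's single main thread.
            mainThread.execute(() -> {
                try {
                    CompletableFuture<?> reply = (CompletableFuture<?>) method.invoke(endpoint, args);
                    reply.whenComplete((value, error) -> {
                        if (error != null) {
                            result.completeExceptionally(error);
                        } else {
                            result.complete(value);
                        }
                    });
                } catch (Exception e) {
                    result.completeExceptionally(e);
                }
            });
            return result;
        };
        return (GreetingGateway) Proxy.newProxyInstance(
                GreetingGateway.class.getClassLoader(), new Class<?>[] {GreetingGateway.class}, handler);
    }
}

// Hypothetical gateway (the RpcGateway role) and the endpoint implementing it.
interface GreetingGateway {
    CompletableFuture<String> greet(String name);
}

class GreetingEndpoint implements GreetingGateway {
    @Override
    public CompletableFuture<String> greet(String name) {
        return CompletableFuture.completedFuture("hello " + name + " from " + Thread.currentThread().getName());
    }
}
```

The caller only ever sees the gateway interface. Whether the call is executed locally, as here, or serialized and sent to a remote actor is hidden inside the InvocationHandler.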
public abstract class ResourceManager {
// 12.5 Called from onStart(): start the ResourceManager services
private void startResourceManagerServices() throws Exception {
// 12.6 Run the election; if it is won, leaderElectionService.isLeader() is called
leaderElectionService.start(this);
}
public void grantLeadership(final UUID newLeaderSessionID) {
// 12.7 Call tryAcceptLeadership(...) asynchronously
acceptLeadershipFuture = clearStateFuture.thenComposeAsync(
(ignored) -> tryAcceptLeadership(newLeaderSessionID),
getUnfencedMainThreadExecutor());
}
protected void startServicesOnLeadership() {
// 12.8 Start the heartbeat services
startHeartbeatServices();
// 12.13 Start the SlotManager
slotManager.start(getFencingToken(), getMainThreadExecutor(), new ResourceActionsImpl());
}
private void startHeartbeatServices() {
// 12.8.1 Heartbeat service for TaskManagers: tracks whether each TaskManager is still alive
taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(...);
// 12.8.2 Heartbeat service for JobManagers (each job starts its own JobMaster to control it)
jobManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(...);
}
}
public class HeartbeatManagerSenderImpl<I, O> ... implements Runnable {
HeartbeatManagerSenderImpl(...) {
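// 12.9 Schedule the run() method of this instance
mainThreadExecutor.schedule(this, 0L, TimeUnit.MILLISECONDS);
}
public void run() {
// 12.10 Loop control flag
if (!stopped) {
// 12.11 Send a heartbeat request to every monitored target
for (HeartbeatMonitor<O> heartbeatMonitor : getHeartbeatTargets().values()) {
requestHeartbeat(heartbeatMonitor);
}
// 12.12 Reschedule itself after heartbeatPeriod, which forms the loop
getMainThreadExecutor().schedule(this, heartbeatPeriod, TimeUnit.MILLISECONDS);
}
}
/*
12.11.1 Sending a heartbeat request in detail
HeartbeatMonitor: the heartbeat manager keeps one HeartbeatMonitor per monitored target; heartbeat responses from the worker are reported to that monitor, which resets its timeout
heartbeatTarget: a worker node started in the cluster, i.e. a TaskExecutor
*/
private void requestHeartbeat(HeartbeatMonitor<O> heartbeatMonitor) {
O payload = getHeartbeatListener().retrievePayload(heartbeatMonitor.getHeartbeatTargetId());
final HeartbeatTarget<O> heartbeatTarget = heartbeatMonitor.getHeartbeatTarget();
heartbeatTarget.requestHeartbeat(getOwnResourceID(), payload);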
}
}
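Steps 12.9 to 12.12 form a self-rescheduling Runnable. Below is a JDK-only sketch of that loop. `HeartbeatSenderSketch` is a hypothetical simplification of HeartbeatManagerSenderImpl. Each heartbeat target is modeled as a plain Runnable that "sends one request".

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class HeartbeatLoopDemo {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService mainThread = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger requestsToTm1 = new AtomicInteger();
        CountDownLatch threeRounds = new CountDownLatch(3);
        Runnable tm1 = () -> {
            requestsToTm1.incrementAndGet();
            threeRounds.countDown();
        };
        HeartbeatSenderSketch sender = new HeartbeatSenderSketch(mainThread, 100, List.of(tm1));
        threeRounds.await(5, TimeUnit.SECONDS);
        sender.stop();
        mainThread.shutdownNow();
        System.out.println("heartbeat requests sent to tm-1: " + requestsToTm1.get());
    }
}

// Hypothetical simplification of HeartbeatManagerSenderImpl: a Runnable that reschedules itself.
class HeartbeatSenderSketch implements Runnable {
    private final ScheduledExecutorService mainThreadExecutor;
    private final long heartbeatPeriodMillis;
    private final List<Runnable> heartbeatTargets; // each entry sends one request to one target
    private volatile boolean stopped;

    HeartbeatSenderSketch(ScheduledExecutorService executor, long heartbeatPeriodMillis, List<Runnable> targets) {
        this.mainThreadExecutor = executor;
        this.heartbeatPeriodMillis = heartbeatPeriodMillis;
        this.heartbeatTargets = targets;
        // Step 12.9: schedule the first round immediately.
        mainThreadExecutor.schedule(this, 0L, TimeUnit.MILLISECONDS);
    }

    @Override
    public void run() {
        if (!stopped) {                                // step 12.10: loop control
            for (Runnable target : heartbeatTargets) { // step 12.11: one request per target
                target.run();
            }
            // Step 12.12: the next round is scheduled only after this one has finished.
            mainThreadExecutor.schedule(this, heartbeatPeriodMillis, TimeUnit.MILLISECONDS);
        }
    }

    void stop() {
        stopped = true;
    }
}
```

Rescheduling at the end of each round, instead of using scheduleAtFixedRate, means rounds never overlap. Setting the `stopped` flag ends the loop at the next check without needing to cancel a future.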
public class SlotManagerImpl implements SlotManager {
public void start(...) {
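// 12.13.1 Start the scheduled task checkTaskManagerTimeouts, which releases TaskManagers that have been idle longer than the configured timeout
taskManagerTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(...);
// 12.13.2 Start the scheduled task checkSlotRequestTimeouts, which handles slot requests that have timed out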
slotRequestTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(...);
}
}
public final class DefaultDispatcherRunner implements DispatcherRunner, LeaderContender {
public static DispatcherRunner create(...) throws Exception {
// 13.1 Create the DefaultDispatcherRunner
final DefaultDispatcherRunner dispatcherRunner = new DefaultDispatcherRunner();
// 13.2 Start the DefaultDispatcherRunner's lifecycle; leaderElectionService is the election service
return DispatcherRunnerLeaderElectionLifecycleManager.createFor(dispatcherRunner, leaderElectionService);
}
}
final class DispatcherRunnerLeaderElectionLifecycleManager implements DispatcherRunner {
private DispatcherRunnerLeaderElectionLifecycleManager(...) throws Exception {
/*
13.3 leaderElectionService.start(this);
the contender (leaderContender) held inside leaderElectionService is the DefaultDispatcherRunner
*/
leaderElectionService.start(dispatcherRunner);
}
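// 13.4 After winning the election and becoming leader (grantLeadership() and startNewDispatcherLeaderProcess() belong to DefaultDispatcherRunner, the registered contender)...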
public void grantLeadership(UUID leaderSessionID) {
runActionIfRunning(() -> startNewDispatcherLeaderProcess(leaderSessionID));
}
// 13.5 Call the start() method of DispatcherLeaderProcess
private void startNewDispatcherLeaderProcess(UUID leaderSessionID) {
// Stop the existing DispatcherLeaderProcess
stopDispatcherLeaderProcess();
// Create a new DispatcherLeaderProcess
final DispatcherLeaderProcess newDispatcherLeaderProcess = ...;
// Run newDispatcherLeaderProcess::start once the previous process has terminated
FutureUtils.assertNoException(
previousDispatcherLeaderProcessTerminationFuture.thenRun(
newDispatcherLeaderProcess::start
));
}
}
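The important detail in step 13.5 is ordering. The new leader process starts only after the previous one has fully terminated, which is expressed by chaining on a termination future. Below is a minimal sketch of that pattern. `LeaderProcessSketch` and `DispatcherRunnerSketch` are hypothetical simplifications, and they use string session IDs instead of UUIDs so that the output is readable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class LeaderProcessDemo {
    public static void main(String[] args) {
        DispatcherRunnerSketch runner = new DispatcherRunnerSketch();
        runner.grantLeadership("session-1");
        LeaderProcessSketch first = runner.current;
        runner.grantLeadership("session-2");   // new leadership while session-1 is still closing
        System.out.println(runner.log);         // [start session-1, close session-1]
        first.terminationFuture.complete(null); // session-1 finally terminates
        System.out.println(runner.log);         // [start session-1, close session-1, start session-2]
    }
}

// Hypothetical stand-in for DispatcherLeaderProcess.
class LeaderProcessSketch {
    final String sessionId;
    final List<String> log;
    final CompletableFuture<Void> terminationFuture = new CompletableFuture<>();

    LeaderProcessSketch(String sessionId, List<String> log) {
        this.sessionId = sessionId;
        this.log = log;
    }

    void start() { log.add("start " + sessionId); }

    CompletableFuture<Void> closeAsync() {
        log.add("close " + sessionId);
        return terminationFuture; // completes later, once all resources are released
    }
}

// Hypothetical simplification of DefaultDispatcherRunner#startNewDispatcherLeaderProcess().
class DispatcherRunnerSketch {
    final List<String> log = new ArrayList<>();
    LeaderProcessSketch current;

    void grantLeadership(String sessionId) {
        // Stop the existing process (if any) and keep its termination future.
        CompletableFuture<Void> previousTermination =
                current == null ? CompletableFuture.completedFuture(null) : current.closeAsync();
        LeaderProcessSketch next = new LeaderProcessSketch(sessionId, log);
        current = next;
        // Start the new process only after the previous one has fully terminated.
        previousTermination.thenRun(next::start);
    }
}
```

Chaining on the termination future instead of blocking keeps the leader-election callback non-blocking. It also guarantees that two Dispatchers never run side by side inside the same runner.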
public abstract class AbstractDispatcherLeaderProcess implements DispatcherLeaderProcess {
private void startInternal() {
log.info("Start {}.", getClass().getSimpleName());
// 13.6 The DispatcherLeaderProcess has started; update its state
state = State.RUNNING;
onStart();
}
}
public class SessionDispatcherLeaderProcess ... {
protected void onStart() {
// 13.7 Start services: start the JobGraphStore, the component that stores JobGraphs
startServices();
// 13.8 Recover the jobs, then create the Dispatcher
onGoingRecoveryOperation = recoverJobsAsync()
.thenAccept(this::createDispatcherIfRunning)
.handle(this::onErrorIfRunning);
}
}
class DefaultDispatcherGatewayServiceFactory implements ... {
public AbstractDispatcherLeaderProcess.DispatcherGatewayService create(...) {
// 13.9 Create the Dispatcher
dispatcher = dispatcherFactory.createDispatcher(...);
// 13.12 Start the Dispatcher: it sends a START message to its own actor, which signals a successful start and triggers onStart()
dispatcher.start();
}
}
public abstract class Dispatcher ... {
// 13.10 Execute onStart()
public void onStart() throws Exception {
// Start the Dispatcher services
startDispatcherServices();
// 13.11 Initialize the bootstrap, which resumes all interrupted jobs
dispatcherBootstrap.initialize(...);
}
// Job submission: when a client submits a job, the Dispatcher receives it and runs it
private CompletableFuture<Void> runJob(JobGraph jobGraph) {
// Submitting a job == starting a JobManagerRunner, which wraps a JobManager (JobMaster)
return jobManagerRunnerFuture
.thenApply(FunctionUtils.uncheckedFunction(this::startJobManagerRunner))
...
}
}
public class DefaultDispatcherBootstrap extends AbstractDispatcherBootstrap {
public void initialize(...) {
/*
13.11.1 recoveredJobs: the jobs to be recovered
Under the hood, AbstractDispatcherBootstrap calls dispatcher.runRecoveredJob(recoveredJob) for each of them
*/
launchRecoveredJobGraphs(dispatcher, recoveredJobs);
// 13.11.2 Clear the list once recovery has been triggered
recoveredJobs.clear();
}
}