从启动命令flink-daemon.sh可以看出TaskManger入口类为org.apache.flink.runtime.taskexecutor.TaskManagerRunner
TaskManagerRunner::main
TaskManagerRunner::runTaskManagerProcessSecurely
TaskManagerRunner::runTaskManager //构造TaskManagerRunner并调用start()方法
TaskManagerRunner::new //核心
在TaskManagerRunner构造函数中,可以看出与JobManger类似,也是先构造出一些公共服务:
highAvailabilityServices//用于获取JobManger的地址
rpcService //将TaskExecutor包装为AkkaActor提供RPC服务
heartbeatServices //心跳服务,与JobManger通信
metricRegistry //metric服务,提供metric注册和查询
blobCacheService //缓存Blob
这些服务在构造TaskExecutor时作为构造函数参数传入
构造TaskExecutor前会先构造TaskManagerServices辅助TaskExecutor实现其核心功能
TaskManagerRunner::createTaskExecutorService
TaskManagerRunner::startTaskManager // 构造MetricGroup和相关服务
TaskManagerServices.fromConfiguration//读取TaskManger的配置信息启动TaskManager相关服务
TaskExecutor::new //核心
启动TaskEexector后会与ResouceManager建立连接,将自身信息注册到RM后发送Slot报告给RM,具体调用链路如下:
TaskManagerRunner::start
TaskExecutorToServiceAdapter::start
TaskExecutor::start
TaskExecutor::onStart
TaskExecutor::startTaskExecutorServices //获取ResourceManager地址后与ResourceManager建立连接,发送Slot报告
ResourceManagerLeaderListener::notifyLeaderAddress
TaskExecutor::notifyOfNewResourceManagerLeader
TaskExecutor::reconnectToResourceManager
TaskExecutor::tryConnectToResourceManager
TaskExecutor::connectToResourceManager
TaskExecutorToResourceManagerConnection::start
RegisteredRpcConnection::start
RegisteredRpcConnection::createNewRegistration
TaskExecutorToResourceManagerConnection::generateRegistration
RetryingRegistration::startRegistration //与resourcemanager建立连接
RetryingRegistration::register
ResourceManagerRegistration::invokeRegistration //向ResourceManager注册TaskExecutorRegistration信息
ResourceManagerGateway.registerTaskExecutor
TaskExecutorToResourceManagerConnection::onRegistrationSuccess
ResourceManagerRegistrationListener::onRegistrationSuccess
TaskExecutor::establishResourceManagerConnection
ResourceManagerGateway.sendSlotReport //发送自身slot信息给ResourceManager
HeartbeatManagerImpl::monitorTarget//与RM建立心跳连接,当接到来自RM的心跳请求时,就会将SlotReport发送给RM作为心跳回应
TaskExecutor提供了以下两个核心方法:
//RM将Slot分配给JobMaster请求TM将具体Slot信息发送给JobMaster
CompletableFuture requestSlot(
SlotID slotId,
JobID jobId,
AllocationID allocationId,
ResourceProfile resourceProfile,
String targetAddress,
ResourceManagerId resourceManagerId,
@RpcTimeout Time timeout);
//执行JobMaster提交的物理Task
CompletableFuture submitTask(
TaskDeploymentDescriptor tdd, JobMasterId jobMasterId, @RpcTimeout Time timeout);
TaskManager中管理Slot的实现类TaskSlotTableImpl,该实例记录了Slot的分配信息。
在TaskExecutor构造函数中有两个HeartbeatManager,实现类都是HeartbeatManagerImpl,此类是接受心跳请求,发送心跳响应:
ResourceManagerHeartbeatManager //响应RM的心跳请求,心跳响应中带上SlotReport
JobManagerHeartbeatManager //响应JobMaster的心跳请求, 心跳响应中带上AccumulatorReport
调用HeartbeatManagerImpl.monitorTarget(ResourceID resourceID, HeartbeatTarget heartbeatTarget) 与目标对象建立心跳连接。
HeartbeatManager还有个实现类是HeartbeatManagerSenderImpl,用于主动向监控目标发送心跳请求,比如在ResourceManager中创建的就是HeartbeatManagerSenderImpl,TaskManager启动时向ResourceManager注册后,RM就会调用HeartbeatManagerSenderImpl.monitor监控TM, 并定时向TM的HeartbeatManagerImpl发送心跳请求。同样,在JobMaster中创建的也是HeartbeatManagerSenderImpl,JobMaster定时向执行当前Job的TM发送心跳请求,TM响应与该Job相关信息。
综上,TM启动后向RM注册,与TM通过心跳信息同步Slot分配状况,接受RM的Slot分配请求向JobMaster提供Slot后,就可以接受JobMaster 执行具体的物理Task了。