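After it starts, the TaskManager calls ResourceManagerGateway#registerTaskExecutor() to register its connection information with the ResourceManager. In essence, registration stores the TaskExecutorRegistration of this TaskExecutor in a Map inside the ResourceManager, keyed by ResourceID ("ResourceID : TaskExecutorRegistration").
Once the TaskExecutor has registered successfully, it is notified of the success through a listener. On receiving that notification, the TaskExecutor calls ResourceManagerGateway#sendSlotReport() to report its SlotReport to the ResourceManager.
Inside ResourceManagerGateway#sendSlotReport(), the ResourceManager calls SlotManager#registerTaskManager() to register the slots with the SlotManager. In essence, the registration of this TaskExecutor is put into the Map of all registered TaskManagers, every slot on this TaskExecutor is put into the Map of all slots registered with the SlotManager, and each slot that is not already allocated is also put into the Map of slots that are currently available (free, neither allocated nor in use).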
/**
 * A SlotReport holds a Collection that describes the status of every slot in the TaskManager.
 * This method iterates over that Collection and registers each slot.
 */
@Override
public void registerTaskManager(final TaskExecutorConnection taskExecutorConnection, SlotReport initialSlotReport) {
    // Make sure the SlotManager has been started and is running
    checkInit();
    LOG.debug("Registering TaskManager {} under {} at the SlotManager.", taskExecutorConnection.getResourceID(), taskExecutorConnection.getInstanceID());
    // We identify task managers by their instance id.
    // taskManagerRegistrations holds every registered TaskManager (key: InstanceID, value: TaskManagerRegistration).
    if (taskManagerRegistrations.containsKey(taskExecutorConnection.getInstanceID())) {
        // This TaskManager is already registered, so only update its slots from the SlotReport
        reportSlotStatus(taskExecutorConnection.getInstanceID(), initialSlotReport);
    } else {
        // First register the TaskManager: it is not in the Map yet, so treat it as a newly started instance
        ArrayList<SlotID> reportedSlots = new ArrayList<>();
        // SlotReport is iterable over its SlotStatus collection; collect the SlotID of every slot in this TaskManager
        for (SlotStatus slotStatus : initialSlotReport) {
            reportedSlots.add(slotStatus.getSlotID());
        }
        // Create the TaskManagerRegistration from the TaskExecutorConnection (registered with the
        // ResourceManager when the TaskExecutor started) and the list of all slots in this TaskManager
        TaskManagerRegistration taskManagerRegistration = new TaskManagerRegistration(
            taskExecutorConnection,
            reportedSlots);
        // Add the registration to the Map of registered TaskManagers
        taskManagerRegistrations.put(taskExecutorConnection.getInstanceID(), taskManagerRegistration);
        // Next register the new slots, one for each SlotStatus in the report
        for (SlotStatus slotStatus : initialSlotReport) {
            registerSlot(
                slotStatus.getSlotID(),
                slotStatus.getAllocationID(),
                slotStatus.getJobID(),
                slotStatus.getResourceProfile(),
                taskExecutorConnection);
        }
    }
    // At this point the slots of the TaskManager have been reported and registered
}
The SlotManager now knows about the slots reported by the TaskExecutor:
// All slots registered with the SlotManager
private final HashMap<SlotID, TaskManagerSlot> slots;
// Slots in this SlotManager that are currently available (free, neither allocated nor in use)
private final LinkedHashMap<SlotID, TaskManagerSlot> freeSlots;
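The bookkeeping described above can be sketched with plain collections. This is only a toy model, not Flink's implementation: SlotBookkeepingSketch, its nested SlotStatus and the String ids are hypothetical stand-ins for SlotManager, SlotStatus, InstanceID and SlotID, and a null allocationId is assumed to mean that the slot is unused.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the SlotManager bookkeeping. Every name here is hypothetical, not a Flink class. */
final class SlotBookkeepingSketch {

    /** Simplified slot status: allocationId is null while the slot is unused (assumption). */
    static final class SlotStatus {
        final String slotId;
        final String allocationId;

        SlotStatus(String slotId, String allocationId) {
            this.slotId = slotId;
            this.allocationId = allocationId;
        }
    }

    // Registered TaskManagers, keyed by instance id, with the ids of the slots they reported.
    final Map<String, List<String>> taskManagerRegistrations = new HashMap<>();
    // Every slot known to the manager.
    final Map<String, SlotStatus> slots = new HashMap<>();
    // Only the slots that can be handed out right now, in insertion order.
    final Map<String, SlotStatus> freeSlots = new LinkedHashMap<>();

    void registerTaskManager(String instanceId, List<SlotStatus> report) {
        if (!taskManagerRegistrations.containsKey(instanceId)) {
            // Unknown TaskManager: remember it together with the ids of its slots.
            List<String> reportedSlots = new ArrayList<>();
            for (SlotStatus status : report) {
                reportedSlots.add(status.slotId);
            }
            taskManagerRegistrations.put(instanceId, reportedSlots);
        }
        // New or already known, the latest report decides which slots count as free.
        for (SlotStatus status : report) {
            slots.put(status.slotId, status);
            if (status.allocationId == null) {
                freeSlots.put(status.slotId, status);
            } else {
                freeSlots.remove(status.slotId);
            }
        }
    }
}
```

Keeping the free slots in a separate insertion-ordered map means a later request only has to scan slots that can actually be handed out, instead of filtering the full slot map each time.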
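After the JobManager service starts, it actively requests the slots its job needs from the ResourceManager:
// Connect the SlotPool to the ResourceManager so that it can send its requests for slots
slotPool.connectToResourceManager(resourceManagerGateway);
Each such request is sent to the ResourceManager as a SlotRequest: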
/**
 * The SlotPool requests slots from the ResourceManager through a remote RPC call. The ResourceManager
 * forwards the SlotRequest to the SlotManager and, if the request is handled normally, returns an ACK.
 */
@Override
public CompletableFuture<Acknowledge> requestSlot(
        JobMasterId jobMasterId,
        SlotRequest slotRequest,
        final Time timeout) {
    // Some code omitted......
    // Forward the SlotRequest sent by the SlotPool to the SlotManager
    slotManager.registerSlotRequest(slotRequest);
    // Some code omitted......
}
When the SlotManager receives a SlotRequest, it wraps it in a PendingSlotRequest and stores that in a Map keyed by AllocationID ("AllocationID : PendingSlotRequest").
/**
 * The SlotPool inside the JobMaster sends the SlotRequest to the ResourceManager, which forwards it
 * to its internal SlotManager for processing.
 */
@Override
public boolean registerSlotRequest(SlotRequest slotRequest) throws ResourceManagerException {
    checkInit();
    if (checkDuplicateRequest(slotRequest.getAllocationId())) {
        LOG.debug("Ignoring a duplicate slot request with allocation id {}.", slotRequest.getAllocationId());
        return false;
    } else {
        // Wrap the SlotRequest in a PendingSlotRequest
        PendingSlotRequest pendingSlotRequest = new PendingSlotRequest(slotRequest);
        // Store it in the Map under the mapping "AllocationID : PendingSlotRequest"
        pendingSlotRequests.put(slotRequest.getAllocationId(), pendingSlotRequest);
        try {
            // Process the PendingSlotRequest
            internalRequestSlot(pendingSlotRequest);
        } catch (ResourceManagerException e) {
            // requesting the slot failed --> remove pending slot request
            pendingSlotRequests.remove(slotRequest.getAllocationId());
            throw new ResourceManagerException("Could not fulfill slot request " + slotRequest.getAllocationId() + '.', e);
        }
        return true;
    }
}
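The register-then-roll-back pattern in registerSlotRequest() can be tried in isolation. The following is a minimal sketch under stated assumptions: PendingRequestSketch is a hypothetical class, a request is reduced to an allocation id plus a number of CPU cores, and a request asking for more cores than the largest slot stands in for any failure of internalRequestSlot().

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of registering a slot request. Names are hypothetical, not Flink classes. */
final class PendingRequestSketch {

    // Pending requests keyed by allocation id; the value is the requested number of cores.
    private final Map<String, Integer> pendingSlotRequests = new HashMap<>();
    // Assumption for the sketch: processing fails when a request exceeds this size.
    private final int largestSlotCores;

    PendingRequestSketch(int largestSlotCores) {
        this.largestSlotCores = largestSlotCores;
    }

    /** Returns false for a duplicate, true once the request is registered and processed. */
    boolean registerSlotRequest(String allocationId, int requestedCores) {
        if (pendingSlotRequests.containsKey(allocationId)) {
            return false; // same allocation id seen before: ignore the duplicate
        }
        pendingSlotRequests.put(allocationId, requestedCores);
        try {
            process(requestedCores);
        } catch (IllegalStateException e) {
            // Processing failed, so the request must not stay in the pending map.
            pendingSlotRequests.remove(allocationId);
            throw new IllegalStateException("Could not fulfill slot request " + allocationId + '.', e);
        }
        return true;
    }

    // Stand-in for internalRequestSlot(): rejects requests that no slot could ever serve.
    private void process(int requestedCores) {
        if (requestedCores > largestSlotCores) {
            throw new IllegalStateException("No slot offers " + requestedCores + " cores");
        }
    }

    int pendingCount() {
        return pendingSlotRequests.size();
    }
}
```

Putting the request into the map before processing it, and removing it again on failure, keeps the map consistent even when processing throws halfway through.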
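The SlotManager then processes the PendingSlotRequest. It first looks for a match in its Map of currently available slots (free, neither allocated nor in use); if a slot matches, it is allocated right away.
If none matches, the SlotManager tries its Map of PendingTaskManagerSlots, that is, slots of TaskManagers that are still pending. With luck one of them fits and is assigned to the PendingSlotRequest as soon as it becomes available; otherwise the request fails with an exception.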
// Slots of TaskManagers that are still pending
private final HashMap<TaskManagerSlotId, PendingTaskManagerSlot> pendingSlots;
/**
 * Assigns a slot to a PendingSlotRequest (the SlotManager wraps every SlotRequest in one).
 * The resource profile of the request is first matched against the Map of currently available slots.
 * If a slot matches, it is allocated directly.
 * If not, the Map of PendingTaskManagerSlots is tried: a matching pending slot is assigned to the
 * request as soon as it becomes available. If that fails as well, an exception is thrown.
 */
private void internalRequestSlot(PendingSlotRequest pendingSlotRequest) throws ResourceManagerException {
    // Resource profile of the request, used to match it against the free slots of the SlotManager
    final ResourceProfile resourceProfile = pendingSlotRequest.getResourceProfile();
    // Look for a matching slot, mainly by searching freeSlots
    OptionalConsumer.of(findMatchingSlot(resourceProfile))
        // A slot matched: allocate it with allocateSlot()
        .ifPresent(taskManagerSlot -> allocateSlot(taskManagerSlot, pendingSlotRequest))
        // No slot matched:
        // 1. if a suitable PendingTaskManagerSlot exists, assign it to this request as soon as it becomes available;
        // 2. if no suitable slot can be found that way either, throw an exception
        .ifNotPresent(() -> fulfillPendingSlotRequestWithPendingTaskManagerSlot(pendingSlotRequest));
}
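The two-step lookup (free slots first, pending slots second, exception last) can be sketched as follows. This is a simplified illustration, not Flink's matching logic: SlotMatchingSketch is a hypothetical class, a resource profile is reduced to a number of CPU cores, and removing the matched entry from its map is an assumption of the sketch.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of the two-step slot lookup. Names are hypothetical, not Flink classes. */
final class SlotMatchingSketch {

    enum Outcome { ALLOCATED_FREE_SLOT, ASSIGNED_PENDING_SLOT }

    // Slot id -> cores offered; "cores" is a stand-in for a full resource profile.
    final Map<String, Integer> freeSlots = new LinkedHashMap<>();
    final Map<String, Integer> pendingSlots = new LinkedHashMap<>();

    Outcome requestSlot(int requiredCores) {
        // Step 1: a free slot can be allocated immediately.
        if (takeFirstMatch(freeSlots, requiredCores).isPresent()) {
            return Outcome.ALLOCATED_FREE_SLOT;
        }
        // Step 2: otherwise reserve a slot that is expected to show up later.
        if (takeFirstMatch(pendingSlots, requiredCores).isPresent()) {
            return Outcome.ASSIGNED_PENDING_SLOT;
        }
        // Step 3: nothing fits, so the request fails.
        throw new IllegalStateException("No free or pending slot offers " + requiredCores + " cores");
    }

    // Removes and returns the first slot whose capacity covers the request.
    private static Optional<String> takeFirstMatch(Map<String, Integer> candidates, int requiredCores) {
        Iterator<Map.Entry<String, Integer>> it = candidates.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            if (entry.getValue() >= requiredCores) {
                String slotId = entry.getKey();
                it.remove();
                return Optional.of(slotId);
            }
        }
        return Optional.empty();
    }
}
```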
Here is how a slot is allocated in the normal case:
// Registration information of every registered TaskManager
private final HashMap<InstanceID, TaskManagerRegistration> taskManagerRegistrations;
/**
 * A suitable slot has been matched for the SlotRequest; allocate it right away.
 */
private void allocateSlot(TaskManagerSlot taskManagerSlot, PendingSlotRequest pendingSlotRequest) {
    // Only a slot in state FREE may be assigned to the PendingSlotRequest; otherwise an IllegalStateException is thrown
    Preconditions.checkState(taskManagerSlot.getState() == TaskManagerSlot.State.FREE);
    // Connection information of the TaskExecutor that owns the slot
    TaskExecutorConnection taskExecutorConnection = taskManagerSlot.getTaskManagerConnection();
    // Gateway of that TaskExecutor
    TaskExecutorGateway gateway = taskExecutorConnection.getTaskExecutorGateway();
    final CompletableFuture<Acknowledge> completableFuture = new CompletableFuture<>();
    final AllocationID allocationId = pendingSlotRequest.getAllocationId();
    final SlotID slotId = taskManagerSlot.getSlotId();
    final InstanceID instanceID = taskManagerSlot.getInstanceId();
    // Assign the slot to the PendingSlotRequest; in essence this sets the state of the slot to PENDING
    taskManagerSlot.assignPendingSlotRequest(pendingSlotRequest);
    // Attach the future to the PendingSlotRequest so that the outcome is tracked asynchronously
    pendingSlotRequest.setRequestFuture(completableFuture);
    returnPendingTaskManagerSlotIfAssigned(pendingSlotRequest);
    // Look up the TaskManagerRegistration of the TaskManager that owns this slot
    TaskManagerRegistration taskManagerRegistration = taskManagerRegistrations.get(instanceID);
    if (taskManagerRegistration == null) {
        throw new IllegalStateException("Could not find a registered task manager for instance id " +
            instanceID + '.');
    }
    // Mark the TaskManager as used
    taskManagerRegistration.markUsed();
    // RPC call to the task manager: TaskExecutorGateway#requestSlot() asks the TaskManager to provide the
    // given slot to the given JobManager (in response to the SlotRequest the JobMaster sent through its SlotPool)
    CompletableFuture<Acknowledge> requestFuture = gateway.requestSlot(
        slotId,
        pendingSlotRequest.getJobId(),
        allocationId,
        pendingSlotRequest.getResourceProfile(),
        pendingSlotRequest.getTargetAddress(),
        resourceManagerId,
        taskManagerRequestTimeout);
    // Forward the result of the RPC (ACK or failure) to completableFuture
    requestFuture.whenComplete(
        (Acknowledge acknowledge, Throwable throwable) -> {
            if (acknowledge != null) {
                completableFuture.complete(acknowledge);
            } else {
                completableFuture.completeExceptionally(throwable);
            }
        });
    // Evaluate the result
    completableFuture.whenCompleteAsync(
        (Acknowledge acknowledge, Throwable throwable) -> {
            try {
                if (acknowledge != null) {
                    // An ACK means the TaskExecutor allocated the slot; update the slot state to ALLOCATED
                    updateSlot(slotId, allocationId, pendingSlotRequest.getJobId());
                } else {
                    // No ACK: find out why the allocation failed
                    if (throwable instanceof SlotOccupiedException) {
                        // The slot is already occupied: update the SlotManager with the allocation reported back
                        SlotOccupiedException exception = (SlotOccupiedException) throwable;
                        updateSlot(slotId, exception.getAllocationId(), exception.getJobId());
                    } else {
                        // Any other failure: detach this SlotRequest from the slot
                        removeSlotRequestFromSlot(slotId, allocationId);
                    }
                    if (!(throwable instanceof CancellationException)) {
                        // Unless the request was cancelled, the SlotManager tries to find another slot for it
                        handleFailedSlotRequest(slotId, allocationId, throwable);
                    } else {
                        LOG.debug("Slot allocation request {} has been cancelled.", allocationId, throwable);
                    }
                }
            } catch (Exception e) {
                LOG.error("Error while completing the slot allocation.", e);
            }
        },
        mainThreadExecutor);
}
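The branching on the RPC result is easier to follow when it is reduced to its state transitions. The sketch below is an assumption-laden illustration rather than the SlotManager code: AllocationResultSketch, its SlotState enum and its nested SlotOccupiedException are hypothetical, the ACK is a plain String, and "retry" is only recorded as a flag. It relies on the standard behaviour of java.util.concurrent.CompletableFuture#whenComplete, which passes either a result or a Throwable to the callback.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

/** Toy model of reacting to the slot RPC result. Names are hypothetical, not Flink classes. */
final class AllocationResultSketch {

    enum SlotState { PENDING, ALLOCATED, FREE }

    /** Hypothetical failure carrying the allocation that already occupies the slot. */
    static final class SlotOccupiedException extends Exception {
        final String allocationId;

        SlotOccupiedException(String allocationId) {
            super("Slot already occupied by " + allocationId);
            this.allocationId = allocationId;
        }
    }

    SlotState state = SlotState.PENDING; // the slot was reserved before the RPC was sent
    String owner;                        // allocation id currently holding the slot
    boolean retryScheduled;              // whether another slot should be looked for

    void onRpcResult(CompletableFuture<String> rpcResult, String allocationId) {
        rpcResult.whenComplete((ack, failure) -> {
            if (ack != null) {
                // Success: the slot now belongs to the requesting allocation.
                state = SlotState.ALLOCATED;
                owner = allocationId;
                return;
            }
            if (failure instanceof SlotOccupiedException) {
                // Someone else holds the slot: record the real owner.
                state = SlotState.ALLOCATED;
                owner = ((SlotOccupiedException) failure).allocationId;
            } else {
                // Any other failure: release the reservation.
                state = SlotState.FREE;
                owner = null;
            }
            // A cancelled request is dropped; every other failure triggers a retry.
            retryScheduled = !(failure instanceof CancellationException);
        });
    }
}
```

The two checks are independent on purpose: the first decides what happens to the slot, the second decides what happens to the request.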
The ResourceManager has now allocated a slot for the SlotRequest of the JobManager. The actual allocation is carried out by the TaskExecutor:
/**
 * The full path by which the ResourceManager asks the TaskExecutor to allocate a slot
 */
@Override
public CompletableFuture<Acknowledge> requestSlot(
        final SlotID slotId,
        final JobID jobId,
        final AllocationID allocationId,
        final ResourceProfile resourceProfile,
        final String targetAddress,
        final ResourceManagerId resourceManagerId,
        final Time timeout) {
    // TODO: Filter invalid requests from the resource manager by using the instance/registration Id
    log.info("Receive slot request {} for job {} from resource manager with leader id {}.",
        allocationId, jobId, resourceManagerId);
    try {
        // First filter out invalid requests: the TaskManager must be connected to this ResourceManager
        if (!isConnectedToResourceManager(resourceManagerId)) {
            final String message = String.format("TaskManager is not connected to the resource manager %s.", resourceManagerId);
            log.debug(message);
            throw new TaskManagerException(message);
        }
        // TaskSlotTable keeps several indexes over its slots, so it can quickly tell whether the slot
        // with a given index is free. The slot number identifies the slot within this TaskManager.
        if (taskSlotTable.isSlotFree(slotId.getSlotNumber())) {
            // The slot is free and can be allocated
            if (taskSlotTable.allocateSlot(slotId.getSlotNumber(), jobId, allocationId, resourceProfile, taskManagerConfiguration.getTimeout())) {
                log.info("Allocated slot for {}.", allocationId);
            } else {
                // allocateSlot() returned false: the slot could not be allocated, so throw
                log.info("Could not allocate slot for {}.", allocationId);
                throw new SlotAllocationException("Could not allocate slot.");
            }
        } else if (!taskSlotTable.isAllocated(slotId.getSlotNumber(), jobId, allocationId)) {
            // The slot has already been allocated to another job
            final String message = "The slot " + slotId + " has already been allocated for a different job.";
            log.info(message);
            final AllocationID allocationID = taskSlotTable.getCurrentAllocation(slotId.getSlotNumber());
            throw new SlotOccupiedException(message, allocationID, taskSlotTable.getOwningJob(allocationID));
        }
        // JobManagerTable keeps a Map "JobID : JobManagerConnection" (the connection the TaskExecutor uses
        // to talk to a JobManager); check whether a connection for this job already exists
        if (jobManagerTable.contains(jobId)) {
            // Offer the slot to the JobManager directly: it is wrapped in a SlotOffer and handed to the
            // JobMaster through an RPC method of JobMasterGateway
            offerSlotsToJobManager(jobId);
        } else {
            // The JobManagerTable has no entry for this JobID
            try {
                // Register the job with the JobLeaderService, which looks up the leading JobManager
                // of the job and tries to connect to it
                jobLeaderService.addJob(jobId, targetAddress);
            } catch (Exception e) {
                // Adding the job failed, so free the slot that was just allocated
                try {
                    taskSlotTable.freeSlot(allocationId);
                } catch (SlotNotFoundException slotNotFoundException) {
                    // slot no longer existent, this should actually never happen, because we've
                    // just allocated the slot. So let's fail hard in this case!
                    onFatalError(slotNotFoundException);
                }
                // release local state under the allocation id.
                localStateStoresManager.releaseLocalStateForAllocationId(allocationId);
                // sanity check
                if (!taskSlotTable.isSlotFree(slotId.getSlotNumber())) {
                    onFatalError(new Exception("Could not free slot " + slotId));
                }
                throw new SlotAllocationException("Could not add job to job leader service.", e);
            }
        }
    } catch (TaskManagerException taskManagerException) {
        return FutureUtils.completedExceptionally(taskManagerException);
    }
    return CompletableFuture.completedFuture(Acknowledge.get());
}
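TaskSlotTable is the container in which the TaskExecutor stores its slots. It maintains several indexes, so it can quickly tell whether the slot with a given index is FREE. The slots handed to a JobManager come from the TaskSlotTable.
JobManagerTable keeps a Map with the mapping "JobID : JobManagerConnection" that holds the JobManagerConnections. Looking up a JobID tells whether this TaskExecutor already has a working connection to the JobManager of that job. If it does, the allocated slot is offered to the JobManager directly; if not, the JobLeaderService is asked to find the leading JobManager and to try to connect to it.
One point deserves special attention: a slot allocated for a JobManager is wrapped in a SlotOffer and handed to the JobMaster through an RPC method of JobMasterGateway. Put simply, the TaskExecutor "sends slot offers" to the JobMaster.
Once the JobMaster has received a batch of SlotOffers, the JobManager can use these slots to schedule and run tasks.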
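The decisions the TaskExecutor takes for a slot request can be condensed into a few lines. This is a minimal sketch with hypothetical names: TaskExecutorSlotSketch uses two arrays in place of TaskSlotTable, a Set of job ids in place of JobManagerTable, and a second Set in place of the jobs watched by JobLeaderService. An IllegalStateException stands in for SlotOccupiedException.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the TaskExecutor side of a slot request. Names are hypothetical, not Flink classes. */
final class TaskExecutorSlotSketch {

    enum Result { OFFERED_TO_JOB_MANAGER, WAITING_FOR_JOB_LEADER }

    // Stand-in for TaskSlotTable: owner of each slot by slot number, null while the slot is free.
    private final String[] allocationBySlot;
    private final String[] jobBySlot;
    // Stand-in for JobManagerTable: jobs whose JobManager connection already exists.
    final Set<String> connectedJobs = new HashSet<>();
    // Stand-in for JobLeaderService: jobs whose leading JobManager is still being looked up.
    final Set<String> watchedJobs = new HashSet<>();

    TaskExecutorSlotSketch(int numberOfSlots) {
        allocationBySlot = new String[numberOfSlots];
        jobBySlot = new String[numberOfSlots];
    }

    Result requestSlot(int slotNumber, String jobId, String allocationId) {
        if (allocationBySlot[slotNumber] == null) {
            // Free slot: allocate it for this job and allocation.
            allocationBySlot[slotNumber] = allocationId;
            jobBySlot[slotNumber] = jobId;
        } else if (!allocationBySlot[slotNumber].equals(allocationId)
                || !jobBySlot[slotNumber].equals(jobId)) {
            // Held by a different job or allocation: reject the request.
            throw new IllegalStateException(
                "The slot " + slotNumber + " has already been allocated for a different job.");
        }
        // A repeated request for the same job and allocation falls through and is answered again.
        if (connectedJobs.contains(jobId)) {
            return Result.OFFERED_TO_JOB_MANAGER;
        }
        // No connection yet: start watching the job until its leader is known.
        watchedJobs.add(jobId);
        return Result.WAITING_FOR_JOB_LEADER;
    }

    boolean isSlotFree(int slotNumber) {
        return allocationBySlot[slotNumber] == null;
    }
}
```

Treating a repeated request for the same job and allocation as a success makes the call safe to retry, which matters for an RPC that can time out on the caller's side.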
/**
 * Following the instruction of the ResourceManager, the TaskExecutor wraps this batch of slots in SlotOffers
 * and offers them to the JobMaster. The SlotOffers are stored in the SlotPool (the pool that manages slots).
 * When tasks are scheduled and run, a valid slot is taken from the SlotPool and the scheduler submits the
 * Task instance to the TaskManager that owns the slot.
 */
@Override
public CompletableFuture<Collection<SlotOffer>> offerSlots(
        final ResourceID taskManagerId,
        final Collection<SlotOffer> slots,
        final Time timeout) {
    // Look up the TaskManagerLocation and TaskExecutorGateway of the TaskManager in the Map of registered TaskManagers
    Tuple2<TaskManagerLocation, TaskExecutorGateway> taskManager = registeredTaskManagers.get(taskManagerId);
    if (taskManager == null) {
        return FutureUtils.completedExceptionally(new Exception("Unknown TaskManager " + taskManagerId));
    }
    final TaskManagerLocation taskManagerLocation = taskManager.f0;
    final TaskExecutorGateway taskExecutorGateway = taskManager.f1;
    // RpcTaskManagerGateway implements TaskManagerGateway and is used to communicate with the TaskManager
    final RpcTaskManagerGateway rpcTaskManagerGateway = new RpcTaskManagerGateway(taskExecutorGateway, getFencingToken());
    return CompletableFuture.completedFuture(
        // Hand the SlotOffers over to the SlotPool
        slotPool.offerSlots(
            taskManagerLocation,
            rpcTaskManagerGateway,
            slots));
}
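The SlotOffers are stored in the SlotPool (the pool that manages slots). When tasks are scheduled and run, a valid slot is taken from the SlotPool and the scheduler submits the Task instance to the TaskManager that owns the slot.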
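The hand-over from offer to use can be sketched as a small pool. This is an illustration only: SlotPoolSketch and its nested SlotOffer are hypothetical, and the rule "accept each allocation id once, reject repeats" is an assumption made for the sketch, not a description of how Flink's SlotPool decides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Toy model of a pool that stores offered slots. Names are hypothetical, not Flink classes. */
final class SlotPoolSketch {

    /** Simplified offer: which allocation it answers and which slot on the TaskManager it is. */
    static final class SlotOffer {
        final String allocationId;
        final int slotIndex;

        SlotOffer(String allocationId, int slotIndex) {
            this.allocationId = allocationId;
            this.slotIndex = slotIndex;
        }
    }

    // Allocation ids that have already been accepted (assumed rule: each id is accepted once).
    private final Set<String> knownAllocations = new HashSet<>();
    // Accepted slots waiting to be used by the scheduler, oldest first.
    private final Deque<SlotOffer> availableSlots = new ArrayDeque<>();

    /** Stores the offers it accepts and returns exactly those to the caller. */
    Collection<SlotOffer> offerSlots(Collection<SlotOffer> offers) {
        List<SlotOffer> accepted = new ArrayList<>();
        for (SlotOffer offer : offers) {
            if (knownAllocations.add(offer.allocationId)) {
                availableSlots.add(offer);
                accepted.add(offer);
            }
        }
        return accepted;
    }

    /** Called when a task is scheduled: hands out the oldest stored slot, if any. */
    Optional<SlotOffer> takeSlot() {
        return Optional.ofNullable(availableSlots.poll());
    }
}
```

Returning the accepted offers lets the offering side see which slots were taken and which were declined, mirroring the Collection<SlotOffer> return type of offerSlots() above.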