Slot Compute Resource Management

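Table of Contents

  • 1. TaskManager reports Slot compute resources to the ResourceManager
  • 2. JobManager sends a SlotRequest to the ResourceManager
  • 3. TaskExecutor receives the Slot allocation request from the ResourceManager
  • 4. JobManager receives the SlotOffer from the TaskExecutor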

Inside the ResourceManager, the SlotManager service centrally manages the Slot compute resources of the entire cluster.

1. TaskManager reports Slot compute resources to the ResourceManager

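After a TaskManager starts, it calls ResourceManagerGateway#registerTaskExecutor() to register its connection information with the ResourceManager. In essence, registration stores the TaskExecutor's registration record, a TaskExecutorRegistration, in a Map inside the ResourceManager under the mapping "ResourceID: TaskExecutorRegistration".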

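Once the TaskExecutor has registered successfully with the ResourceManager, a listener notifies the TaskExecutor of the success. On receiving this notification, the TaskExecutor calls ResourceManagerGateway#sendSlotReport() to report its SlotReport to the ResourceManager.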

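Inside ResourceManagerGateway#sendSlotReport(), the ResourceManager calls SlotManager#registerTaskManager() to register the Slots with the SlotManager. In essence, this puts the TaskExecutor's registration record into the Map of all registered TaskManagers and puts every Slot on that TaskExecutor into the Map of all Slots registered with the SlotManager. Slots that are free (not yet allocated or in use) are also put into the Map of currently available Slots.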

/**
 * A SlotReport contains a Collection describing the status of every Slot in this TaskManager.
 * This method iterates over that Collection and registers each Slot.
 */
@Override
public void registerTaskManager(final TaskExecutorConnection taskExecutorConnection, SlotReport initialSlotReport) {
    // Initialization check: make sure the SlotManager has been started and is running
    checkInit();

    LOG.debug("Registering TaskManager {} under {} at the SlotManager.", taskExecutorConnection.getResourceID(), taskExecutorConnection.getInstanceID());

    // we identify task managers by their instance id
    // A Map holding all registered TaskManagers. Key: InstanceID, value: TaskManagerRegistration
    // Check whether this TaskManager is already present in the Map of registered TaskManagers
    if (taskManagerRegistrations.containsKey(taskExecutorConnection.getInstanceID())) {
        // The TaskManager is already registered, so only update its Slot status from the SlotReport
        reportSlotStatus(taskExecutorConnection.getInstanceID(), initialSlotReport);
    } else {
        // first register the TaskManager
        // The Map has no entry for this TaskManager, so treat it as a newly started instance and register it
        ArrayList<SlotID> reportedSlots = new ArrayList<>();

        // Collect the SlotIDs from initialSlotReport into the List.
        // SlotReport holds a Collection of SlotStatus entries describing the TaskManager's Slots,
        // and is iterable over that Collection.
        for (SlotStatus slotStatus : initialSlotReport) {
            // Add the ID of each Slot on the TaskManager being registered to the List
            reportedSlots.add(slotStatus.getSlotID());
        }

        // Create a new TaskManagerRegistration for this TaskManager
        TaskManagerRegistration taskManagerRegistration = new TaskManagerRegistration(
            // The TaskExecutorConnection that was registered with the ResourceManager when the TaskExecutor started
            taskExecutorConnection,
            // The List of all Slots on this TaskManager
            reportedSlots);

        // Add this TaskManager's registration to the Map of registered TaskManagers
        taskManagerRegistrations.put(taskExecutorConnection.getInstanceID(), taskManagerRegistration);

        // next register the new slots
        // Iterate over the Collection of Slot statuses in the report and register each Slot
        for (SlotStatus slotStatus : initialSlotReport) {
            // Register the Slot described by this SlotStatus
            registerSlot(
                slotStatus.getSlotID(),
                slotStatus.getAllocationID(),
                slotStatus.getJobID(),
                slotStatus.getResourceProfile(),
                taskExecutorConnection);
        }
    }
    // At this point the TaskManager's Slot compute resources have been reported and registered
}

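The SlotManager now keeps track of the Slots reported by the TaskExecutor:

// All Slot compute resources registered with the SlotManager
private final HashMap<SlotID, TaskManagerSlot> slots;

// Slot compute resources in this SlotManager that are currently available (free, not allocated or in use)
private final LinkedHashMap<SlotID, TaskManagerSlot> freeSlots;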
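
The bookkeeping described in this section can be illustrated with a minimal sketch. It is not Flink's SlotManager implementation: the class name MiniSlotManager and the String-based IDs are hypothetical, and the sketch assumes every reported Slot is free, whereas a real SlotReport can also contain Slots that are already allocated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of the SlotManager bookkeeping; not Flink's actual classes. */
public class MiniSlotManager {
    // instanceId -> slot ids reported by that TaskManager
    private final Map<String, List<String>> taskManagerRegistrations = new HashMap<>();
    // every slot known to the manager: slotId -> owning instanceId
    private final Map<String, String> slots = new HashMap<>();
    // the subset of slots that is currently free, kept in registration order
    private final Map<String, String> freeSlots = new LinkedHashMap<>();

    /** Registers a TaskManager and its slots; returns false if it was already registered. */
    public boolean registerTaskManager(String instanceId, List<String> reportedSlotIds) {
        if (taskManagerRegistrations.containsKey(instanceId)) {
            // The real code would update the slot status from the report here.
            return false;
        }
        taskManagerRegistrations.put(instanceId, new ArrayList<>(reportedSlotIds));
        for (String slotId : reportedSlotIds) {
            slots.put(slotId, instanceId);
            // Assumption of this sketch: every reported slot is free.
            freeSlots.put(slotId, instanceId);
        }
        return true;
    }

    public int numberRegisteredSlots() {
        return slots.size();
    }

    public int numberFreeSlots() {
        return freeSlots.size();
    }

    public static void main(String[] args) {
        MiniSlotManager manager = new MiniSlotManager();
        manager.registerTaskManager("tm-1", Arrays.asList("tm-1_0", "tm-1_1"));
        System.out.println(manager.numberRegisteredSlots() + " registered, "
            + manager.numberFreeSlots() + " free");
    }
}
```

Keeping the free Slots in a separate insertion-ordered map means a later matching step only has to scan candidates that can actually be handed out.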

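2. JobManager sends a SlotRequest to the ResourceManager

After the JobManager service starts, it proactively requests the Slots the job needs from the ResourceManager:

// Connect the SlotPool to the ResourceManager so that it can request Slot compute resources
slotPool.connectToResourceManager(resourceManagerGateway);

This request is sent to the ResourceManager in the form of a SlotRequest: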

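/**
 * The SlotPool requests Slot compute resources from the ResourceManager through a remote RPC call.
 * The ResourceManager forwards the received SlotRequest to the SlotManager and returns an ACK
 * if the request is handled successfully.
 */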
@Override
public CompletableFuture<Acknowledge> requestSlot(
    JobMasterId jobMasterId,
    SlotRequest slotRequest,
    final Time timeout) {

    // Part of the code is omitted...

    // Forward the SlotRequest sent by the SlotPool to the SlotManager
    slotManager.registerSlotRequest(slotRequest);

    // Part of the code is omitted...
}

When the SlotManager receives a SlotRequest, it converts it into a PendingSlotRequest and stores it in a Map under the mapping "AllocationID: PendingSlotRequest".

/**
 * The SlotPool inside the JobMaster sends a SlotRequest to the ResourceManager, which forwards it to its internal SlotManager.
 */
@Override
public boolean registerSlotRequest(SlotRequest slotRequest) throws ResourceManagerException {
    checkInit();

    if (checkDuplicateRequest(slotRequest.getAllocationId())) {
        LOG.debug("Ignoring a duplicate slot request with allocation id {}.", slotRequest.getAllocationId());

        return false;
    } else {
        // Convert the SlotRequest into a PendingSlotRequest
        PendingSlotRequest pendingSlotRequest = new PendingSlotRequest(slotRequest);

        // Store the PendingSlotRequest in the Map under the mapping "AllocationID: PendingSlotRequest"
        pendingSlotRequests.put(slotRequest.getAllocationId(), pendingSlotRequest);

        try {
            // Process the PendingSlotRequest
            internalRequestSlot(pendingSlotRequest);
        } catch (ResourceManagerException e) {
            // requesting the slot failed --> remove pending slot request
            pendingSlotRequests.remove(slotRequest.getAllocationId());

            throw new ResourceManagerException("Could not fulfill slot request " + slotRequest.getAllocationId() + '.', e);
        }

        return true;
    }
}
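
The register-then-roll-back pattern used by registerSlotRequest() can be sketched in isolation. The class PendingRequestRegistry below is hypothetical, IDs and resource profiles are plain Strings, and the stand-in for internalRequestSlot() simply fails for the made-up profile "unsatisfiable".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of duplicate detection and rollback for pending slot requests. */
public class PendingRequestRegistry {
    // allocationId -> requested resource profile
    private final Map<String, String> pendingSlotRequests = new HashMap<>();

    /** Stand-in for the matching step; fails for the made-up profile "unsatisfiable". */
    private void internalRequestSlot(String resourceProfile) {
        if ("unsatisfiable".equals(resourceProfile)) {
            throw new IllegalStateException("no matching slot");
        }
    }

    /** Returns false for a duplicate allocation id, true once the request has been accepted. */
    public boolean registerSlotRequest(String allocationId, String resourceProfile) {
        if (pendingSlotRequests.containsKey(allocationId)) {
            return false; // duplicate request, ignore it
        }
        pendingSlotRequests.put(allocationId, resourceProfile);
        try {
            internalRequestSlot(resourceProfile);
        } catch (IllegalStateException e) {
            // Processing failed, so the request must not stay in the pending map.
            pendingSlotRequests.remove(allocationId);
            throw e;
        }
        return true;
    }

    public int pendingCount() {
        return pendingSlotRequests.size();
    }
}
```

Removing the entry in the catch block keeps the map consistent: a request that could not be processed never lingers as pending.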

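The SlotManager then processes the PendingSlotRequest. It first looks for a match in its Map of currently available Slots (free, not allocated or in use); if a matching Slot exists, it is allocated directly.
If there is none, the SlotManager tries the Map of PendingTaskManagerSlots, which holds the Slots of TaskManagers that are still pending. If a suitable one is found, it is assigned to the PendingSlotRequest as soon as it becomes available; otherwise an exception is thrown.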

// Slots of TaskManagers that are still in the pending state
private final HashMap<TaskManagerSlotId, PendingTaskManagerSlot> pendingSlots;

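/**
 * Allocates a Slot for a PendingSlotRequest (the structure the SlotManager converts a SlotRequest into).
 * Based on the requested resource profile, it first looks for a match in the SlotManager's Map of
 * currently available Slots (free, not allocated or in use).
 * If a match is found, the Slot is allocated directly.
 * If not, it tries the Map of PendingTaskManagerSlots (Slots of TaskManagers that are still pending);
 * a matching one is assigned to the PendingSlotRequest as soon as it becomes available.
 * If that fails too, an exception is thrown.
 */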
private void internalRequestSlot(PendingSlotRequest pendingSlotRequest) throws ResourceManagerException {
    // Get the resource profile of the PendingSlotRequest (that is, of the SlotRequest) so it can be matched against the SlotManager's free Slots
    final ResourceProfile resourceProfile = pendingSlotRequest.getResourceProfile();

    // Match on the resource profile, searching mainly in the freeSlots collection
    OptionalConsumer.of(findMatchingSlot(resourceProfile))
        // A match was found: call allocateSlot() to allocate the Slot compute resource
        .ifPresent(taskManagerSlot -> allocateSlot(taskManagerSlot, pendingSlotRequest))
        // No match was found, so continue:
        // 1. If the Map of pending TaskManager Slots contains a suitable PendingTaskManagerSlot, assign it to this PendingSlotRequest as soon as it becomes available.
        // 2. If no suitable Slot can be found there either, throw an exception.
        .ifNotPresent(() -> fulfillPendingSlotRequestWithPendingTaskManagerSlot(pendingSlotRequest));
}
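
The two-stage lookup (free Slots first, pending Slots second, otherwise fail) can be sketched as follows. This is a simplified illustration, not Flink's matching logic: the class SlotMatcher is hypothetical, and a resource profile is reduced to a single int capacity that matches when the Slot's capacity is at least the requested one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the two-stage matching; capacities are plain ints, not Flink ResourceProfiles. */
public class SlotMatcher {
    private final Map<String, Integer> freeSlots = new LinkedHashMap<>();
    private final Map<String, Integer> pendingSlots = new LinkedHashMap<>();

    public void addFreeSlot(String slotId, int capacity) {
        freeSlots.put(slotId, capacity);
    }

    public void addPendingSlot(String slotId, int capacity) {
        pendingSlots.put(slotId, capacity);
    }

    /** Returns the first candidate whose capacity covers the requirement, if any. */
    private static Optional<String> findMatching(Map<String, Integer> candidates, int required) {
        return candidates.entrySet().stream()
            .filter(entry -> entry.getValue() >= required)
            .map(Map.Entry::getKey)
            .findFirst();
    }

    /** Tries the free slots first, then the pending slots, and fails if neither matches. */
    public String request(int required) {
        Optional<String> free = findMatching(freeSlots, required);
        if (free.isPresent()) {
            freeSlots.remove(free.get());
            return "ALLOCATED:" + free.get();
        }
        Optional<String> pending = findMatching(pendingSlots, required);
        if (pending.isPresent()) {
            pendingSlots.remove(pending.get());
            return "WAITING_FOR:" + pending.get();
        }
        throw new IllegalStateException("No free or pending slot satisfies the request");
    }
}
```

A request answered with a pending Slot is not usable yet; in the sketch it is only reserved, which mirrors the idea that the request is served once that Slot becomes available.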

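A successful Slot allocation proceeds as follows:

  • 1. Only a Slot in the FREE state is eligible for allocation.
  • 2. The Slot's state is set to PENDING.
  • 3. The registration of the TaskManager that owns the Slot is marked as used.
  • 4. The gateway of the owning TaskManager is obtained from the Slot, and an asynchronous RPC call asks the TaskExecutor to offer the specified Slot to the specified JobManager (this is the response to the SlotRequest the JobMaster originally issued through its SlotPool).
  • 5. If the asynchronous call returns an ACK, the TaskExecutor has allocated the Slot successfully, and the Slot's state is set to ALLOCATED.
  • 6. If the asynchronous call does not return an ACK, the cause of the failure is examined. Unless it is a CancellationException, the SlotManager tries to allocate a new Slot for the SlotRequest.

// Registration information of all registered TaskManagers
private final HashMap<InstanceID, TaskManagerRegistration> taskManagerRegistrations;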

/**
 * A suitable Slot has been matched for the SlotRequest; allocate it right away.
 */
private void allocateSlot(TaskManagerSlot taskManagerSlot, PendingSlotRequest pendingSlotRequest) {
    // The Slot must be in the FREE state to be allocated to the PendingSlotRequest; otherwise an exception is thrown.
    Preconditions.checkState(taskManagerSlot.getState() == TaskManagerSlot.State.FREE);

    // Get the TaskExecutor's connection information (TaskExecutorConnection) from the TaskManagerSlot
    TaskExecutorConnection taskExecutorConnection = taskManagerSlot.getTaskManagerConnection();
    // Get the TaskExecutor's gateway from the TaskExecutorConnection
    TaskExecutorGateway gateway = taskExecutorConnection.getTaskExecutorGateway();

    final CompletableFuture<Acknowledge> completableFuture = new CompletableFuture<>();
    final AllocationID allocationId = pendingSlotRequest.getAllocationId();
    final SlotID slotId = taskManagerSlot.getSlotId();
    final InstanceID instanceID = taskManagerSlot.getInstanceId();

    // Assign the PendingSlotRequest to this TaskManagerSlot; in essence this sets the Slot's state to PENDING
    taskManagerSlot.assignPendingSlotRequest(pendingSlotRequest);
    // Attach the CompletableFuture to the PendingSlotRequest so the pending RPC can be tracked
    pendingSlotRequest.setRequestFuture(completableFuture);

    returnPendingTaskManagerSlotIfAssigned(pendingSlotRequest);

    // Look up the TaskManagerRegistration of the TaskManager that owns this TaskManagerSlot in the Map of registered TaskManagers
    TaskManagerRegistration taskManagerRegistration = taskManagerRegistrations.get(instanceID);

    if (taskManagerRegistration == null) {
        throw new IllegalStateException("Could not find a registered task manager for instance id " +
                                        instanceID + '.');
    }

    // Mark this TaskManager's registration as used
    taskManagerRegistration.markUsed();

    // RPC call to the task manager
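    // Call TaskExecutorGateway#requestSlot() to allocate the Slot for the PendingSlotRequest.
    // The TaskManager receives this request from the SlotManager inside the ResourceManager: offer the specified Slot
    // to the specified JobManager (in response to the SlotRequest the JobMaster issued through its SlotPool)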
    CompletableFuture<Acknowledge> requestFuture = gateway.requestSlot(
        slotId,
        pendingSlotRequest.getJobId(),
        allocationId,
        pendingSlotRequest.getResourceProfile(),
        pendingSlotRequest.getTargetAddress(),
        resourceManagerId,
        taskManagerRequestTimeout);

    // When the TaskExecutor's RPC completes, forward its result (an ACK or a failure) to completableFuture
    requestFuture.whenComplete(
        (Acknowledge acknowledge, Throwable throwable) -> {
            if (acknowledge != null) {
                completableFuture.complete(acknowledge);
            } else {
                completableFuture.completeExceptionally(throwable);
            }
        });

    // Evaluate the outcome on the main thread
    completableFuture.whenCompleteAsync(
        (Acknowledge acknowledge, Throwable throwable) -> {
            try {
                if (acknowledge != null) {
                    // An ACK was returned: the TaskExecutor allocated the Slot successfully, so update the Slot's state to ALLOCATED
                    updateSlot(slotId, allocationId, pendingSlotRequest.getJobId());
                } else {
                    // No ACK was returned, so examine the cause of the failure
                    if (throwable instanceof SlotOccupiedException) {
                        // The Slot is already occupied: update the Slot state in the SlotManager with the allocation information carried by the exception
                        SlotOccupiedException exception = (SlotOccupiedException) throwable;
                        updateSlot(slotId, exception.getAllocationId(), exception.getJobId());
                    } else {
                        // For any other failure, remove the SlotRequest from the Slot, since this Slot cannot serve the request
                        removeSlotRequestFromSlot(slotId, allocationId);
                    }

                    if (!(throwable instanceof CancellationException)) {
                        // Unless the failure is a CancellationException, the SlotManager tries to allocate another Slot for the SlotRequest.
                        handleFailedSlotRequest(slotId, allocationId, throwable);
                    } else {
                        LOG.debug("Slot allocation request {} has been cancelled.", allocationId, throwable);
                    }
                }
            } catch (Exception e) {
                LOG.error("Error while completing the slot allocation.", e);
            }
        },
        mainThreadExecutor);
}
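
The state transitions listed before allocateSlot() (FREE to PENDING, then ALLOCATED on an ACK, or a retry on failure) can be reproduced with a plain CompletableFuture. The sketch below is a simplified illustration under stated assumptions: the class SlotAllocationSketch, its Slot type and the retryRequested flag are hypothetical, the RPC is simulated by a future the caller completes, and the SlotOccupiedException branch of the real method is left out.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of the FREE -> PENDING -> ALLOCATED transitions driven by an async result. */
public class SlotAllocationSketch {
    public enum State { FREE, PENDING, ALLOCATED }

    public static class Slot {
        public State state = State.FREE;
        // Set when a failed allocation should be retried with another slot.
        public boolean retryRequested = false;
    }

    /** Moves the slot to PENDING and reacts to the outcome of the simulated RPC future. */
    public static void allocate(Slot slot, CompletableFuture<String> rpcResult) {
        if (slot.state != State.FREE) {
            throw new IllegalStateException("Only FREE slots can be allocated");
        }
        slot.state = State.PENDING;
        rpcResult.whenComplete((ack, failure) -> {
            if (ack != null) {
                // The simulated TaskExecutor acknowledged the allocation.
                slot.state = State.ALLOCATED;
            } else {
                // The request failed: detach it from the slot ...
                slot.state = State.FREE;
                // ... and retry elsewhere unless the request was cancelled.
                if (!(failure instanceof CancellationException)) {
                    slot.retryRequested = true;
                }
            }
        });
    }

    public static void main(String[] args) {
        Slot slot = new Slot();
        CompletableFuture<String> rpc = new CompletableFuture<>();
        allocate(slot, rpc);
        System.out.println("before ACK: " + slot.state);
        rpc.complete("ACK");
        System.out.println("after ACK: " + slot.state);
    }
}
```

Because the callback is attached to the future itself, a cancelled request surfaces as a CancellationException without any wrapping, which is what the cancellation check relies on.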

3. TaskExecutor receives the Slot allocation request from the ResourceManager

The ResourceManager selects a Slot for the JobManager's SlotRequest, but the actual allocation is carried out by the TaskExecutor:

/**
 * Handles the ResourceManager's request to allocate a Slot on this TaskExecutor
 */
@Override
public CompletableFuture<Acknowledge> requestSlot(
    final SlotID slotId,
    final JobID jobId,
    final AllocationID allocationId,
    final ResourceProfile resourceProfile,
    final String targetAddress,
    final ResourceManagerId resourceManagerId,
    final Time timeout) {
    // TODO: Filter invalid requests from the resource manager by using the instance/registration Id

    log.info("Receive slot request {} for job {} from resource manager with leader id {}.",
             allocationId, jobId, resourceManagerId);

    try {
        // First filter out invalid requests: check that this TaskManager is connected to the requesting ResourceManager
        if (!isConnectedToResourceManager(resourceManagerId)) {
            final String message = String.format("TaskManager is not connected to the resource manager %s.", resourceManagerId);
            log.debug(message);
            throw new TaskManagerException(message);
        }

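        // TaskSlotTable stores the Slot instances and maintains several indexes, so it can quickly tell whether the Slot at a given index is free.
        // Check whether the specified Slot is free; the slot number identifies the Slot within this TaskManager
        if (taskSlotTable.isSlotFree(slotId.getSlotNumber())) {
            // The Slot at this index is free and can be allocated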
            if (taskSlotTable.allocateSlot(slotId.getSlotNumber(), jobId, allocationId, resourceProfile, taskManagerConfiguration.getTimeout())) {
                log.info("Allocated slot for {}.", allocationId);
            } else {
                // The Slot could not be allocated, so throw an exception
                log.info("Could not allocate slot for {}.", allocationId);
                throw new SlotAllocationException("Could not allocate slot.");
            }
        } else if (!taskSlotTable.isAllocated(slotId.getSlotNumber(), jobId, allocationId)) {
            // The Slot is already allocated to a different job
            final String message = "The slot " + slotId + " has already been allocated for a different job.";

            log.info(message);

            final AllocationID allocationID = taskSlotTable.getCurrentAllocation(slotId.getSlotNumber());
            throw new SlotOccupiedException(message, allocationID, taskSlotTable.getOwningJob(allocationID));
        }

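        // JobManagerTable keeps a Map of "JobID: JobManagerConnection" holding the JobManagerConnections
        // (the connections the TaskExecutor uses to communicate with JobManagers).
        // Check by jobId whether a connection to the job's JobManager already exists
        if (jobManagerTable.contains(jobId)) {
            // Offer the allocated Slot to the JobManager directly: wrap it in a SlotOffer and hand it to the JobMaster through the JobMasterGateway RPC method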
            offerSlotsToJobManager(jobId);
        } else {
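            // The JobManagerTable has no connection registered for this JobID
            try {
                // Register the job with the JobLeaderService (if anything goes wrong, the Slot must be freed).
                // The JobLeaderService finds the leader JobManager of the job and tries to connect to it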
                jobLeaderService.addJob(jobId, targetAddress);
            } catch (Exception e) {
                // free the allocated slot
                try {
                    // Adding the job to the JobLeaderService failed, so free the Slot that was just allocated
                    taskSlotTable.freeSlot(allocationId);
                } catch (SlotNotFoundException slotNotFoundException) {
                    // slot no longer existent, this should actually never happen, because we've
                    // just allocated the slot. So let's fail hard in this case!
                    onFatalError(slotNotFoundException);
                }

                // release local state under the allocation id.
                localStateStoresManager.releaseLocalStateForAllocationId(allocationId);

                // sanity check
                if (!taskSlotTable.isSlotFree(slotId.getSlotNumber())) {
                    onFatalError(new Exception("Could not free slot " + slotId));
                }

                throw new SlotAllocationException("Could not add job to job leader service.", e);
            }
        }
    } catch (TaskManagerException taskManagerException) {
        return FutureUtils.completedExceptionally(taskManagerException);
    }

    return CompletableFuture.completedFuture(Acknowledge.get());
}

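TaskSlotTable is the container the TaskExecutor uses to store its Slots. It maintains several indexes, so it can quickly determine whether the Slot at a given index is FREE. The Slots allocated to a JobManager come from the TaskSlotTable.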
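
The three outcomes of the Slot check in requestSlot() (free, already held by the same allocation, held by another allocation) can be sketched with an index-keyed map. The class MiniTaskSlotTable and its String results are hypothetical simplifications; Flink's TaskSlotTable also tracks jobs, resource profiles and timeouts.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an index-based slot table; not Flink's TaskSlotTable. */
public class MiniTaskSlotTable {
    // slot index -> allocation id currently holding the slot (an absent key means the slot is free)
    private final Map<Integer, String> allocations = new HashMap<>();
    private final int numberOfSlots;

    public MiniTaskSlotTable(int numberOfSlots) {
        this.numberOfSlots = numberOfSlots;
    }

    public boolean isSlotFree(int index) {
        checkIndex(index);
        return !allocations.containsKey(index);
    }

    /**
     * Returns "ALLOCATED" for a free slot, "ALREADY_ALLOCATED" when the same allocation asks again,
     * and throws when the slot is held by a different allocation.
     */
    public String requestSlot(int index, String allocationId) {
        if (isSlotFree(index)) {
            allocations.put(index, allocationId);
            return "ALLOCATED";
        }
        String current = allocations.get(index);
        if (!current.equals(allocationId)) {
            throw new IllegalStateException("Slot " + index + " is occupied by " + current);
        }
        return "ALREADY_ALLOCATED";
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= numberOfSlots) {
            throw new IllegalArgumentException("No slot with index " + index);
        }
    }
}
```

Treating a repeated request from the same allocation as success makes the call idempotent, so a retried RPC does not fail just because the first attempt already went through.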

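JobManagerTable keeps a Map with the mapping "JobID: JobManagerConnection" that stores the JobManagerConnections. Given a JobID, the TaskExecutor can tell whether it already has a connection to the corresponding JobManager. If it does, it offers the allocated Slot to that JobManager directly; if not, it asks the JobLeaderService to find the leader JobManager and tries to connect to it.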
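
The branch between "offer the Slot now" and "find the leader first" can be sketched as a small router. Everything here is a hypothetical simplification: JobConnectionRouter stands in for the combination of JobManagerTable and JobLeaderService, job IDs are Strings, and the two outcomes are returned as text instead of triggering RPCs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the decision made after a slot has been allocated for a job. */
public class JobConnectionRouter {
    // Jobs with an established JobManager connection (stands in for JobManagerTable)
    private final Set<String> connectedJobs = new HashSet<>();
    // Jobs whose leader is still being looked up (stands in for JobLeaderService)
    private final List<String> watchedJobs = new ArrayList<>();

    /** Called once a connection to the job's leader has been established. */
    public void jobManagerConnected(String jobId) {
        connectedJobs.add(jobId);
        watchedJobs.remove(jobId);
    }

    /** Decides what to do with a slot that was just allocated for the given job. */
    public String onSlotAllocated(String jobId) {
        if (connectedJobs.contains(jobId)) {
            return "OFFER_SLOTS";
        }
        if (!watchedJobs.contains(jobId)) {
            watchedJobs.add(jobId);
        }
        return "WAIT_FOR_LEADER";
    }

    public int watchedJobCount() {
        return watchedJobs.size();
    }
}
```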

Note in particular that a Slot allocated for a JobManager is wrapped in a SlotOffer and handed to the JobMaster through the JobMasterGateway RPC method. Put simply, the TaskExecutor sends the JobMaster a "Slot offer".

4. JobManager receives the SlotOffer from the TaskExecutor

Once the JobMaster has received this batch of SlotOffers, the JobManager can use these Slots to schedule and execute Tasks.

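/**
 * Following the ResourceManager's request, the TaskExecutor wraps this batch of Slots in SlotOffers and offers them to the JobMaster.
 * The SlotOffers are stored in the SlotPool (the pool that manages Slots). When Tasks are scheduled and executed, valid Slots are taken
 * from the SlotPool and the scheduler submits the Task instances to the TaskManagers that own those Slots.
 */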
@Override
public CompletableFuture<Collection<SlotOffer>> offerSlots(
    final ResourceID taskManagerId,
    final Collection<SlotOffer> slots,
    final Time timeout) {

    // Look up the TaskManagerLocation and TaskExecutorGateway of this TaskManager in the Map of registered TaskManagers
    Tuple2<TaskManagerLocation, TaskExecutorGateway> taskManager = registeredTaskManagers.get(taskManagerId);

    if (taskManager == null) {
        return FutureUtils.completedExceptionally(new Exception("Unknown TaskManager " + taskManagerId));
    }

    final TaskManagerLocation taskManagerLocation = taskManager.f0;
    final TaskExecutorGateway taskExecutorGateway = taskManager.f1;

    // Create an RpcTaskManagerGateway, an implementation of TaskManagerGateway that can communicate with the TaskManager
    final RpcTaskManagerGateway rpcTaskManagerGateway = new RpcTaskManagerGateway(taskExecutorGateway, getFencingToken());

    return CompletableFuture.completedFuture(
        // Hand the collection of SlotOffers over to the SlotPool
        slotPool.offerSlots(
            taskManagerLocation,
            rpcTaskManagerGateway,
            slots));
}

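The SlotOffers are stored in the SlotPool (the pool that manages Slots). When Tasks are scheduled and executed, they obtain valid Slots from the SlotPool, and the scheduler submits the Task instances to the TaskManagers that own those Slots.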
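
How a pool might accept a batch of offers and report back which ones it kept can be sketched as follows. This is an illustration under assumptions, not Flink's SlotPool: the class MiniSlotPool is hypothetical, an offer is reduced to its allocation ID, and the acceptance rule (accept a new allocation ID, or a repeated offer from the same TaskManager) is chosen for the sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a pool that stores offered slots and returns the accepted offers. */
public class MiniSlotPool {
    // allocationId -> id of the TaskManager that offered the slot
    private final Map<String, String> availableSlots = new HashMap<>();

    /** Stores the offers and returns the subset that was accepted. */
    public Collection<String> offerSlots(String taskManagerId, Collection<String> offeredAllocationIds) {
        List<String> accepted = new ArrayList<>();
        for (String allocationId : offeredAllocationIds) {
            String owner = availableSlots.putIfAbsent(allocationId, taskManagerId);
            // Accept a new offer, or a repeated offer from the same TaskManager (sketch assumption).
            if (owner == null || owner.equals(taskManagerId)) {
                accepted.add(allocationId);
            }
        }
        return accepted;
    }

    public int size() {
        return availableSlots.size();
    }
}
```

Returning the accepted subset lets the offering side learn which Slots are now in use by the job and which ones it can keep for other requests.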
