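2. If the cluster is deployed in Native (native integration) mode, the ResourceManager dynamically requests Containers from the external cluster resource manager, for example Hadoop YARN or Kubernetes, and starts TaskManagers in them.
Cluster types that support dynamic resource management start TaskManager nodes on demand, according to the resources each Job requests. Because capacity follows demand, this style of resource management avoids both wasted idle resources and manual scaling.
A ResourceManager that implements dynamic resource management must extend the ActiveResourceManager base class.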
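The on-demand behaviour described above can be sketched in a few lines. This is a minimal illustration, not Flink's ActiveResourceManager: `WorkerProvider`, `OnDemandResourceManagerSketch`, and `declareRequiredSlots` are hypothetical names, and the provider simply stands in for an external system such as YARN or Kubernetes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an external cluster manager (YARN, Kubernetes, ...).
interface WorkerProvider {
    // Starts one worker with the given number of slots and returns its id.
    String requestWorker(int slotsPerWorker);
}

// Simplified sketch: workers are started only when the slots already
// available cannot cover the declared demand.
class OnDemandResourceManagerSketch {
    private final WorkerProvider provider;
    private final int slotsPerWorker;
    private final List<String> workers = new ArrayList<>();

    OnDemandResourceManagerSketch(WorkerProvider provider, int slotsPerWorker) {
        if (slotsPerWorker <= 0) {
            throw new IllegalArgumentException("slotsPerWorker must be positive");
        }
        this.provider = provider;
        this.slotsPerWorker = slotsPerWorker;
    }

    int totalSlots() {
        return workers.size() * slotsPerWorker;
    }

    // Requests new workers until the demand is covered.
    // Returns how many workers were newly started.
    int declareRequiredSlots(int requiredSlots) {
        int started = 0;
        while (totalSlots() < requiredSlots) {
            workers.add(provider.requestWorker(slotsPerWorker));
            started++;
        }
        return started;
    }
}
```

A standalone cluster, by contrast, has a fixed set of TaskManagers, so a request that exceeds the registered slots can only wait or fail.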
ResourceManagerRuntimeServices bundles the two main services, SlotManager and JobLeaderIdService; the heartbeat services (HeartbeatServices) are handed to the ResourceManager alongside them.
resourceManager =
        resourceManagerFactory.createResourceManager(
                configuration,
                ResourceID.generate(),
                rpcService,
                highAvailabilityServices,
                heartbeatServices,
                fatalErrorHandler,
                new ClusterInformation(hostname, blobServer.getPort()),
                webMonitorEndpoint.getRestBaseUrl(),
                metricRegistry,
                hostname,
                ioExecutor);
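SlotMatchingStrategy matches Slot resources against the ResourceProfile given by the job. There are two main SlotMatchingStrategy implementations:
The first is LeastUtilizationSlotMatchingStrategy, which picks a matching Slot on the TaskExecutor with the lowest utilization. This keeps utilization on each TaskExecutor comparatively low and spreads the load across machines.
The second is AnyMatchingSlotMatchingStrategy, which simply returns the first Slot that matches.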
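The difference between the two strategies can be illustrated with a small sketch. It is not Flink's implementation: `FreeSlot`, its fields, and the use of CPU cores as the only resource dimension are simplifying assumptions made for this example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class SlotMatchingSketch {

    // Hypothetical, simplified view of a free slot.
    static final class FreeSlot {
        final String taskExecutorId;
        final int cpuCores;               // resources offered by the slot
        final double executorUtilization; // share of the executor's slots already in use

        FreeSlot(String taskExecutorId, int cpuCores, double executorUtilization) {
            this.taskExecutorId = taskExecutorId;
            this.cpuCores = cpuCores;
            this.executorUtilization = executorUtilization;
        }
    }

    // "Any matching": return the first slot that satisfies the request.
    static Optional<FreeSlot> anyMatching(List<FreeSlot> freeSlots, int requiredCores) {
        return freeSlots.stream()
                .filter(slot -> slot.cpuCores >= requiredCores)
                .findFirst();
    }

    // "Least utilization": among the matching slots, prefer the one whose
    // executor is currently the least utilized.
    static Optional<FreeSlot> leastUtilization(List<FreeSlot> freeSlots, int requiredCores) {
        return freeSlots.stream()
                .filter(slot -> slot.cpuCores >= requiredCores)
                .min(Comparator.comparingDouble((FreeSlot slot) -> slot.executorUtilization));
    }
}
```

The first variant is cheaper because it stops at the first hit; the second has to inspect every matching slot but distributes tasks more evenly.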
private static SlotManager createSlotManager(
        ResourceManagerRuntimeServicesConfiguration configuration,
        ScheduledExecutor scheduledExecutor,
        SlotManagerMetricGroup slotManagerMetricGroup) {
    final SlotManagerConfiguration slotManagerConfiguration =
            configuration.getSlotManagerConfiguration();
    if (configuration.isEnableFineGrainedResourceManagement()) {
        return new FineGrainedSlotManager(
                scheduledExecutor,
                slotManagerConfiguration,
                slotManagerMetricGroup,
                new DefaultResourceTracker(),
                new FineGrainedTaskManagerTracker(),
                new DefaultSlotStatusSyncer(
                        slotManagerConfiguration.getTaskManagerRequestTimeout()),
                new DefaultResourceAllocationStrategy(
                        SlotManagerUtils.generateTaskManagerTotalResourceProfile(
                                slotManagerConfiguration.getDefaultWorkerResourceSpec()),
                        slotManagerConfiguration.getNumSlotsPerWorker()),
                Time.milliseconds(REQUIREMENTS_CHECK_DELAY_MS));
    } else if (configuration.isDeclarativeResourceManagementEnabled()) {
        return new DeclarativeSlotManager(
                scheduledExecutor,
                slotManagerConfiguration,
                slotManagerMetricGroup,
                new DefaultResourceTracker(),
                new DefaultSlotTracker());
    } else {
        return new SlotManagerImpl(
                scheduledExecutor, slotManagerConfiguration, slotManagerMetricGroup);
    }
}
final JobLeaderIdService jobLeaderIdService =
        new DefaultJobLeaderIdService(
                highAvailabilityServices, scheduledExecutor, configuration.getJobTimeout());
return new StandaloneResourceManager(
        rpcService,
        resourceId,
        highAvailabilityServices,
        heartbeatServices,
        resourceManagerRuntimeServices.getSlotManager(),
        ResourceManagerPartitionTrackerImpl::new,
        resourceManagerRuntimeServices.getJobLeaderIdService(),
        clusterInformation,
        fatalErrorHandler,
        resourceManagerMetricGroup,
        standaloneClusterStartupPeriodTime,
        AkkaUtils.getTimeoutAsTime(configuration),
        ioExecutor);
Constructing the StandaloneResourceManager starts the RpcServer (the assignment sits in the constructor of its RpcEndpoint superclass):
this.rpcServer = rpcService.startServer(this);
resourceManager.start() -> ResourceManager#onStart -> ResourceManager#startResourceManagerServices
leaderElectionService =
        highAvailabilityServices.getResourceManagerLeaderElectionService();
resourceManagerDriver.initialize(this, new GatewayMainThreadExecutor(), ioExecutor);
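The method that starts the heartbeat services in ResourceManager starts two heartbeat managers: taskManagerHeartbeatManager and jobManagerHeartbeatManager.
These managers use two listeners, TaskManagerHeartbeatListener and JobManagerHeartbeatListener, to collect heartbeats from the TaskManagers and JobManagers, so that a component that stops responding is detected and the runtime components can keep communicating normally.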
startHeartbeatServices();
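The core idea behind a heartbeat manager, tracking the last heartbeat of each target and reporting the ones that go silent, can be sketched as follows. This is a simplified illustration under assumed names (`HeartbeatMonitorSketch`, `reportHeartbeat`, `removeTimedOut`), not Flink's HeartbeatManager API; time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of heartbeat tracking with a fixed timeout.
class HeartbeatMonitorSketch {
    private final long timeoutMillis;
    // Target id (e.g. a TaskManager or JobManager) -> time of its last heartbeat.
    private final Map<String, Long> lastHeartbeat = new TreeMap<>();

    HeartbeatMonitorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a heartbeat from the target arrives.
    void reportHeartbeat(String targetId, long nowMillis) {
        lastHeartbeat.put(targetId, nowMillis);
    }

    // Stops monitoring every target whose last heartbeat is older than the
    // timeout and returns their ids, so the caller can react to the loss.
    List<String> removeTimedOut(long nowMillis) {
        List<String> timedOut = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                timedOut.add(entry.getKey());
                it.remove();
            }
        }
        return timedOut;
    }
}
```

In this sketch the caller plays the role of the listener: it decides what a timeout means, for example disconnecting the TaskManager or JobManager in question.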
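Starting the SlotManager schedules a periodic TaskManager timeout check on the scheduledExecutor thread pool. The check is implemented by checkTaskManagerTimeouts() and guards against TaskManagers that stay unavailable or idle for a long time.
A second periodic check runs separately over the submitted SlotRequests, so that requests that have timed out are detected.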
slotManager.start(getFencingToken(), getMainThreadExecutor(), new ResourceActionsImpl());
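The periodic checks described above follow a common pattern: hand a check to a scheduled executor and let it run at a fixed delay. The sketch below shows that pattern with the JDK's `ScheduledExecutorService`; the class and method names are hypothetical, and the check itself is only a placeholder for something like a timeout scan.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class PeriodicTimeoutCheckSketch {

    // Runs `check` repeatedly with a fixed delay until it has run at least
    // `runs` times, then cancels the schedule. Returns false if the runs did
    // not complete within five seconds or the caller was interrupted.
    static boolean runChecks(Runnable check, int runs, long delayMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(runs);
        try {
            ScheduledFuture<?> schedule =
                    executor.scheduleWithFixedDelay(
                            () -> {
                                check.run(); // e.g. scan for timed-out workers or requests
                                remaining.countDown();
                            },
                            0L,
                            delayMillis,
                            TimeUnit.MILLISECONDS);
            boolean completed = remaining.await(5, TimeUnit.SECONDS);
            schedule.cancel(false); // stop further checks
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Cancelling the returned future is what allows the checks to be stopped again when the owning component is suspended or closed.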
jobLeaderIdService.start(new JobLeaderIdActionsImpl());