Troubleshooting YARN resource scheduling

YARN uses protobuf as the default message format for RPC calls; on the client side the entry point is ApplicationMasterProtocolPBClientImpl.

RMContainerRequestor: remoteRequestsTable is not updated??
AllocateResponse RMContainerRequestor:makeRemoteRequest(): here ask.size() = 1??

Starting from the allocate interface, find the server-side implementation, ApplicationMasterProtocolPBServiceImpl, then step into ApplicationMasterService, the server-side class that implements ApplicationMasterProtocol.

The scheduler is then asked to perform the allocation:
Allocation allocation = this.rScheduler.allocate(appAttemptId, ask, release, blacklistAdditions, blacklistRemovals);

// Core of the container allocation code

ContainersAndNMTokensAllocation SchedulerApplicationAttempt:pullNewlyAllocatedContainersAndNMTokens() --> returns every container in the newlyAllocatedContainers list
-->
CapacityScheduler.AsyncScheduleThread: a background thread that, once started, keeps calling schedule
CapacityScheduler.schedule()
CapacityScheduler.allocateContainersToNode()
CSAssignment ParentQueue.assignContainers()
CSAssignment ParentQueue.assignContainersToChildQueues()

// Try to schedule
CSAssignment assignment = LeafQueue:assignContainersOnNode()
LeafQueue:[assignNodeLocalContainers() | assignRackLocalContainers() | assignOffSwitchContainers() ]--> LeafQueue:assignContainer()
LeafQueue:assignContainer()
FiCaSchedulerApp.allocate()
RMContainerAllocator: allocateResponse.getAvailableResources() returned <vCores=-21>, which means this job has used 21 vCores
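
The locality fallback in LeafQueue:assignContainersOnNode() (node-local first, then rack-local, then off-switch) can be sketched roughly as below. This is a simplified model, not the Hadoop code: the class LocalityFallbackSketch, the Locality enum and pickLocality() are hypothetical names, and the real scheduler applies further checks (for example locality delay) that are left out here.

```java
import java.util.EnumMap;
import java.util.Map;

/** Simplified model of the locality fallback; NOT the Hadoop implementation. */
class LocalityFallbackSketch {

    /** Declared in preference order: most local first. */
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    /**
     * Returns the first locality level, in preference order, that still has
     * outstanding requests, or null when nothing is pending at any level.
     */
    static Locality pickLocality(Map<Locality, Integer> pendingRequests) {
        for (Locality level : Locality.values()) {
            if (pendingRequests.getOrDefault(level, 0) > 0) {
                return level;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical pending requests: nothing node-local is outstanding.
        Map<Locality, Integer> pending = new EnumMap<>(Locality.class);
        pending.put(Locality.RACK_LOCAL, 2);
        pending.put(Locality.OFF_SWITCH, 5);
        System.out.println(pickLocality(pending));
    }
}
```

With no node-local request pending, the sketch falls through to the rack-local level, which mirrors the order of the three assign*Containers() calls listed above.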

--> Heart of the scheduler…
Headroom is min(userLimit, queue-max-cap) - consumed
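
To see how a negative value such as <vCores=-21> can come out of this formula, here is a minimal scalar sketch. It assumes, purely for illustration, that both limits evaluated to 0 vCores while the job had already consumed 21; the class HeadroomSketch and the method headroom() are hypothetical, and the real scheduler works on Resource objects rather than plain ints.

```java
/** Scalar model of the headroom formula; NOT the Hadoop implementation. */
class HeadroomSketch {

    /** headroom = min(userLimit, queueMaxCap) - consumed, all in vCores. */
    static int headroom(int userLimit, int queueMaxCap, int consumed) {
        return Math.min(userLimit, queueMaxCap) - consumed;
    }

    public static void main(String[] args) {
        // Assumed inputs: limits of 0 vCores, 21 vCores already in use.
        System.out.println(headroom(0, 0, 21));
        // Assumed inputs: a 40-vCore user limit under a 100-vCore queue cap.
        System.out.println(headroom(40, 100, 21));
    }
}
```

Under the first set of assumed inputs the result is -21, so a negative headroom is consistent with the limit side of the formula being zero (or at least smaller than what the job already holds) at the moment the response was built.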

In FifoScheduler:assignContainers(FiCaSchedulerNode node), updateAppHeadRoom(attempt) is called after containers have been allocated.
  1. Parameter computation:
    User limit computation for wankun in queue default

userLimit=100
userLimitFactor=10.0
required:

activeUsers: 1 clusterCapacity:
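
As a rough aid for reading these values, the sketch below models a user limit computation on a single scalar resource. It is an assumption about the general shape of the calculation, not a copy of the Hadoop code: the exact formula differs between Hadoop versions, rounding to the minimum allocation is ignored, and the class UserLimitSketch, the method computeUserLimit() and the queueCapacity, used and required inputs are all hypothetical. Only userLimit=100, userLimitFactor=10.0 and activeUsers=1 come from the log above.

```java
/** Scalar model of a user limit computation; NOT the Hadoop implementation. */
class UserLimitSketch {

    /** Integer division rounded up; both arguments must be positive. */
    static int divideAndCeil(int a, int b) {
        return (a + b - 1) / b;
    }

    static int computeUserLimit(int queueCapacity, int used, int required,
                                int activeUsers, int userLimitPercent,
                                double userLimitFactor) {
        // While the queue is under its capacity, share out the capacity;
        // once it is over, share out what is used plus the new request.
        int currentCapacity = used < queueCapacity ? queueCapacity : used + required;
        // Each user gets the larger of an equal share and the configured percentage.
        int equalShare = divideAndCeil(currentCapacity, activeUsers);
        int percentShare = divideAndCeil(currentCapacity * userLimitPercent, 100);
        // A single user is capped at queueCapacity * userLimitFactor.
        int cap = (int) Math.floor(queueCapacity * userLimitFactor);
        return Math.min(Math.max(equalShare, percentShare), cap);
    }

    public static void main(String[] args) {
        // queueCapacity=40, used=21 and required=1 are made-up example inputs.
        System.out.println(computeUserLimit(40, 21, 1, 1, 100, 10.0));
    }
}
```

With a single active user and userLimit=100, both shares equal the current capacity in this model, so the user limit is bounded by the queue side of the calculation rather than by userLimitFactor.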
