YARN 框架源码分析

UHP博客文章地址:http://yuntai.1kapp.com/?p=652

原创文章,转载请注明出处:http://blog.csdn.net/wind5shy/article/details/8348839

ResourceManager

管理集群资源,创建时需要一个Store存储其信息。

Store

·          管理和存储RM状态接口,包含以下两个子接口,同时继承NodeStore和ApplicationsStore接口。

·          实现目前有MemStore和ZKStore两种,它们都实现了RMState和ApplicationStore。

ApplicationInfo

可以通过该接口获取应用的ApplicationMaster、MasterContainer、ApplicationSubmissionContext和所有Container。

RMState

可以通过该接口获取RM已存储的RMNode和ApplicationInfo信息

NodeStore

管理和存储RMNode的接口,主要是添加和删除RMNode。

RMNode

Nodemanagers information on available resources and other static information.

RM管理的节点(NM)接口,包括节点信息和节点资源信息。

NodeId

RMNode id,包括节点host和端口。

ApplicationsStore

管理和存储应用的接口,包括根据ApplicationId和ApplicationSubmissionContext来创建应用以及根据ApplicationId删除相应应用。

ApplicationId

应用id接口,包含整数id和RM的启动时间戳。

ApplicationStore

应用用来管理和存储Container的接口,包括添加和删除Container、存储MasterContainer、和ApplicationMaster联系以更新应用状态。

ClientToAMSecretManager

管理客户端令牌。

RMContainerTokenSecretManager

管理容器令牌。

ApplicationTokenSecretManager

管理应用令牌。

RMDelegationTokenSecretManager

@Private
@Unstable

AResourceManager specific delegation token secret manager. The secret manager isresponsible for generating and accepting the password for each token.

管理RM委托令牌,处理每一个令牌密码的创建和验证。

Dispatcher

EventDispatcher interface. It dispatches events to registered event handlers basedon event types.

根据事件类型将事件分发到相应的已注册的事件处理器上进行处理。

实现类AsyncDispatcher,主要包含:

private final BlockingQueue<Event> eventQueue;

存放事件的阻塞队列。

 

private ThreadeventHandlingThread;

处理事件的线程,不断地从eventQueue中取出事件并分发到相应EventHandler上进行处理。

 

protected final Map<Class<? extends Enum>,EventHandler>eventDispatchers;

事件类型--事件处理器的map,用于管理相关EventHandler。

 

主要对外接口是register和getEventHandler。前者将事件类型和相应的EventHandler对象注册到Dispatcher上(放入eventDispatchers),后者则是获得一个GenericEventHandler,通过调用这个GenericEventHandler的handle来处理相关事件。GenericEventHandler的handle实际上只是将事件放入eventQueue,具体的事件处理是由已注册的相应EventHandler来处理的。(个人感觉这种编码模式有坏味,应该直接在Dispatcher中添加一个handle方法来表示接受事件并进行相应处理的概念,GenericEventHandler的handle方法内容移到此方法中。)

RM初始化的时候会将所有相关EventHandler注册到Dispatcher上。

ResourceScheduler

@LimitedPrivate(value={"yarn"})
@Evolving

Thisinterface is the one implemented by the schedulers. It mainly extends YarnScheduler.

·          调度器接口,主要功能继承自YarnScheduler,同时继承了Recoverable,可以从指定RMState中恢复。

·          实现有FairScheduler和FifoScheduler。

ClientRMService

Theclient interface to the Resource Manager. This module handles all the rpcinterfaces to the resource manager from the client.

实现ClientRMProtocol,处理客户端相关请求,包括获取新应用、获取应用报告、提交应用,杀掉应用等。

ApplicationMasterService

实现AMRMProtocol,处理ApplicationMaster相关请求,包括注册/完成ApplicationMaster请求、ApplicationMaster对RM的分配请求、注册/注销ApplicationAttemptId等。

ApplicationMasterLauncher

用来处理AMLauncherEvent,根据事件的类型(LAUNCH,CLEANUP)创建相应AMLauncher线程并放入一个BlockingQueue中,使用一个专门的线程不断从这个queue中取出线程执行。

AdminService

实现RMAdminProtocol,提供administration相关功能,包括获取集群当前的应用队列、节点、用户、服务控制等信息。

ContainerAllocationExpirer

主要继承自AbstractLivelinessMonitor,监控Container是否过期。主要方法包括根据ContainerId对Container进行注册/注销、对过期的Container提交相应事件到相关EventHandler进行处理。

NMLivelinessMonitor

同样主要继承自AbstractLivelinessMonitor,监控RMNode是否存活。主要方法负责根据NodeId对RMNode进行注册/注销、对过期的RMNode提交相应事件到相关EventHandler进行处理。

NodesListManager

实现EventHandler,可以对节点状态事件(NODE_USABLE,NODE_UNUSABLE)进行处理(将不可用的节点放入不可用节点列表中,将恢复的节点从不可用节点中移除);同时,还可以对配置中的节点信息进行访问(包括slaves和excludeds,支持对这两个配置的动态修改)。

SchedulerEventDispatcher

实现EventHandler,处理SchedulerEvent(类型包括NODE_ADDED,NODE_REMOVED, NODE_UPDATE, APP_ADDED, APP_REMOVED, CONTAINER_EXPIRED)。包含一个BlockingQueue,handle时只负责将事件放入queue;同时还包含一个EventProcessor线程和一个ResourceScheduler,EventProcessor不断从queue中取出事件交给ResourceScheduler处理。

RMAppManager

实现EventHandler,通过处理RMAppManagerEvent(APP_SUBMIT, APP_COMPLETED)来对应用进行管理。

ApplicationACLsManager

应用访问控制管理器。

WebApp

RM的web页面管理。

RMContext

RM的上下文接口,可以以之获得以上的大部分RM成员。

 

ResourceTrackerService

实现ResourceTracker,主要处理NM注册和心跳请求并返回相应响应信息。

NodeManager

NodeManagerMetrics

NM统计数据类。

ApplicationACLsManager

应用访问控制管理器。

NodeHealthCheckerService

Theclass which provides functionality of checking the health of the node andreporting back to the service for which the health checker has been asked toreport.

提供检查本节点是否健康和健康报告的功能。

LocalDirsHandlerService

Theclass which provides functionality of checking the health of the localdirectories of a node. This specifically manages nodemanager-local-dirs andnodemanager-log-dirs by periodically checking their health.

提供节点本地文件夹是否健康的功能,具体就是检查NM的本地文件夹和日志文件夹。

CompositeServiceShutdownHook

JVMShutdown hook for CompositeService which will stop the give CompositeServicegracefully in case of JVM shutdown.

在JVM关闭时优雅地关闭指定服务,这里是关闭NM。

ApplicationMaster

AnApplicationMaster for executing shell commands on a set of launched containersusing the YARN framework.

Thisclass is meant to act as an example on how to write yarn-based applicationmasters.

TheApplicationMaster is started on a container by the ResourceManager's launcher. The first thing that theApplicationMaster needs to do is to connect and registeritself with theResourceManager. Theregistration sets up information within the ResourceManager regardingwhat host:port the ApplicationMaster is listening on to provide any form offunctionality to a client as well as a tracking url that a client can use tokeep track of status/job history if needed.

TheApplicationMaster needs to send aheartbeat to the ResourceManager at regularintervals to inform theResourceManager that it isup and alive. TheAMRMProtocol.allocate to theResourceManager from the ApplicationMaster acts as aheartbeat.

Forthe actual handling of the job, the ApplicationMaster has torequest theResourceManager viaAllocateRequest for therequired no. of containers usingResourceRequest with thenecessary resource specifications such as node location, computational(memory/disk/cpu) resource requirements. TheResourceManager respondswith anAllocateResponse thatinforms theApplicationMaster of the setof newly allocated containers, completed containers as well as current state ofavailable resources.

Foreach allocated container, the ApplicationMaster can thenset up the necessary launch context viaContainerLaunchContext to specifythe allocated container id, local resources required by the executable, theenvironment to be setup for the executable, commands to execute, etc. andsubmit aStartContainerRequest to theContainerManager to launchand execute the defined commands on the given allocated container.

TheApplicationMaster can monitor thelaunched container by either querying theResourceManager usingAMRMProtocol.allocate to getupdates on completed containers or via theContainerManager by queryingfor the status of the allocated container'sContainerId.

Afterthe job has been completed, the ApplicationMaster has to sendaFinishApplicationMasterRequest to theResourceManager to inform it that theApplicationMaster has been completed.

·          启动、连接、注册

AM由RM的launcher启动,运行在一个container上。首先AM需要连接RM并注册自身。注册时附带以下信息:AM所在节点host和AM监听的rpc端口(AM通过此端口响应客户端请求),追踪url(客户端通过此url追踪应用的状态和历史)。

·          心跳

AM需要通过定期的心跳向RM报告其状态,心跳具体是通过AMRMProtocol.allocate(ApplicationMasterService)进行的。

实际中的任务处理,AM通过AllocateRequest接口向RM请求所需的containers。每个AllocateRequest包含了一个ResourceRequest列表。每个ResourceRequest代表了一个应用对一个节点的所有资源请求,包括当前请求的一个Resource和之前已分配的containers。RM通过AllocateResponse通知AM新分配的containers、已完成的containers和可用的Resource(均包含在AMResponse)。

对于每个被分配的container,AM使用ContainerLaunchContext来设置必要的加载上下文,具体包括container id、运行所需的本地资源、运行的环境变量、将要运行的命令等,同时提交一个StartContainerRequest到ContainerManager以在分配的container上加载和执行定义的命令。

AM可以通过使用AMRMProtocol.allocate向RM查询已完成的containers的更新信息,或者根据ContainerId通过ContainerManager.getContainerStatus向ContainerManager查询已分配的containers的状态。

·          完成

当应用完成后,AM需要向RM发送以FinishApplicationMasterRequest以通知RM自己已经完成。

 

AllocateRequest

@Public
@Stable

Thecore request sent by the ApplicationMaster to the ResourceManager to obtain resources in the cluster.

AM向RM获取资源的核心请求。

Therequest includes:

·        ApplicationAttemptId beingmanaged by the ApplicationMaster

应用尝试id。

·        A response id to track duplicate responses.

 

·        Progress information.

意义不明,目前是AM中已完成的containers。

·        A list of ResourceRequest to informthe ResourceManager about theapplication's resource requirements.

资源请求列表。

·        A list of unused Container which arebeing returned.

AM释放的containers列表。

SeeAlso:

AMRMProtocol.allocate(AllocateRequest)

 

ResourceRequest

Application

主要包含用户、获得的containers、应用id和状态(NEW, INITING, RUNNING,FINISHING_CONTAINERS_WAIT,APPLICATION_RESOURCES_CLEANINGUP, FINISHED)等信息。

ContainerManager

@Public
@Stable

Theprotocol between an ApplicationMaster and a NodeManager to start/stop containers and to get status ofrunning containers.

Ifsecurity is enabled the NodeManager verifiesthat the ApplicationMaster has trulybeen allocated the container by theResourceManager and alsoverifies all interactions such as stopping the container or obtaining statusinformation for the container.

用于处理AM和NM直接启动/停止containers和获取运行的containers状态。

启用安全验证的话,NM将验证AM确定从RM处分配到了containers,同时也会验证其他所有的交互,如停止container和获取container状态。

Container

@Public
@Stable

Container represents anallocated resource in the cluster.

TheResourceManager is the sole authorityto allocate any Container toapplications. The allocatedContainer is alwayson a single node and has a uniqueContainerId. It has aspecific amount ofResource allocated.

代表了集群中一个已分配的资源。

只有RM可以给应用分配container。分配的container只会在一个节点上,并且有唯一的ContainerId。(暂时不确定最后一句的意义)

Itincludes details such as:

·        ContainerId for thecontainer, which is globally unique.

全局唯一。

·        NodeId of the nodeon which it is allocated.

container所在节点。

·        HTTP uri of the node.

节点uri。

·        Resource allocatedto the container.

分配给container的资源。

·        Priority at whichthe container was allocated.

container优先级。

·        ContainerState of thecontainer.

container状态,NEW,RUNNING,COMPLETE

·        ContainerToken of thecontainer, used to securely verify authenticity of the allocation.

container令牌,验证分配的安全性。

·        ContainerStatus of thecontainer.

container详细状态信息。

Typically,an ApplicationMaster receivesthe Container from theResourceManager during resource-negotiation and then talks totheNodManager to start/stopcontainers.

SeeAlso:

AMRMProtocol.allocate(org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest)

ContainerManager.startContainer(org.apache.hadoop.yarn.api.protocolrecords.StartContainerRequest)

ContainerManager.stopContainer(org.apache.hadoop.yarn.api.protocolrecords.StopContainerRequest)

具体包括:

·        ContainerId for thecontainer, which is globally unique.

·        NodeId of the nodeon which it is allocated.

·        HTTP uri of the node.

·        Resource allocatedto the container.

·        Priority at whichthe container was allocated.

·        ContainerState of thecontainer.

·        ContainerToken of thecontainer, used to securely verify authenticity of the allocation.

·        ContainerStatus of thecontainer.

 

Resource

代表集群中的计算资源,目前仅代表一块大小可以设置的内存。

YarnClient

客户端,主要功能包括提交/杀死任务,获取相关信息(任务报告、所有任务列表、所有节点报告、队列信息等)。

ClientRMProtocol

@Public
@Stable

Theprotocol between clients and the ResourceManager tosubmit/abort jobs and to get information on applications, cluster metrics, nodes,queues and ACLs.

客户端和RM交互的协议,YarnClient的主要功能都通过其实现。协议实现是ClientRMService。

 

原创文章,转载请注明出处:http://blog.csdn.net/wind5shy/article/details/8348839

你可能感兴趣的:(源码,hadoop,框架,0.23)