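YARN is an excellent distributed resource management and scheduling framework. If we want an application to run in a distributed fashion, letting YARN manage its resources is a safe choice. Many large computing frameworks can run on YARN today: MapReduce, which was built to run on YARN; Spark, the in-memory computing engine; and the newer Flink all support a YARN deployment mode. So how do we get a program we wrote ourselves to run on this resource management and scheduling framework?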
Related links:
Hadoop source code download
Hadoop YARN official site
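Notes:
1. Each yellow box represents a node, that is, one machine. These nodes run a few resident daemons: the ResourceManager and the NodeManagers, shown with a blue background in the figure.
2. The processes shown with red and purple backgrounds are started only because a YARN application was submitted; they are not resident processes.
3. Writing a custom YARN application mainly means implementing the Client, the AppMaster, and the code that runs inside the containers.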
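Basic flow:
a. The first thing to start is our Client program. It sends the ResourceManager a request to submit a new YARN application. The ResourceManager is a resident process and can be thought of as a service.
b. After the Client sends the request, the ResourceManager naturally replies, and the Client receives a response.
c. Using this response, the Client describes how to launch the AppMaster, then submits the application to the ResourceManager.
d. The ResourceManager then finds a suitable node with enough resources for the submitted AppMaster and launches it there.
e. Once the AppMaster is up, it first registers itself with the ResourceManager and reports its status.
f. From then on the AppMaster and the ResourceManager exchange requests and responses whose purpose is to launch the worker containers that do the actual distributed computation. Launching them works much like the Client's interaction with the ResourceManager to launch the AppMaster.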
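The handshake in steps a to e can be mimicked with a toy model that uses no Hadoop classes at all. This is only a sketch to make the order of calls concrete: `ToyResourceManager`, `ToyAppMaster`, and the event strings are invented for illustration and are not part of the YARN API.

```java
import java.util.ArrayList;
import java.util.List;

final class SubmissionFlowSketch {

    /** Toy stand-in for the ResourceManager; it only records what happens. */
    static final class ToyResourceManager {
        private int nextId = 1;
        final List<String> events = new ArrayList<>();

        // Steps a and b: the client asks for a new application and receives an id.
        int newApplication() {
            int appId = nextId++;
            events.add("new-application:" + appId);
            return appId;
        }

        // Steps c and d: the client submits the AppMaster description and the
        // ResourceManager launches the AppMaster on some node.
        void submit(int appId, String amCommand) {
            events.add("submitted:" + appId + ":" + amCommand);
            new ToyAppMaster(appId).start(this);
        }

        // Step e: called by the AppMaster once it is running.
        void register(int appId) {
            events.add("am-registered:" + appId);
        }
    }

    /** Toy stand-in for the AppMaster. */
    static final class ToyAppMaster {
        private final int appId;

        ToyAppMaster(int appId) {
            this.appId = appId;
        }

        void start(ToyResourceManager rm) {
            rm.register(appId);
        }
    }

    /** Plays the client's role and returns the recorded event log. */
    static List<String> run() {
        ToyResourceManager rm = new ToyResourceManager();
        int appId = rm.newApplication();
        rm.submit(appId, "java AppMaster");
        return rm.events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In a real cluster the AppMaster runs in its own process on another node, so the registration arrives over RPC rather than as a direct method call.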
Tip: the source for this part lives in the following directory:
hadoop-2.7.3-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell
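This program takes the arguments and script parameters you supply and runs the shell command you give it in a distributed fashion; the command can be a shell script file or an executable shell command. The module has only two main source files, Client.java and ApplicationMaster.java; the other two are helpers. The excerpts below are trimmed and annotated.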
public class Client{
/**
* Main run function for the client
* @return true if application completed successfully
* @throws IOException
* @throws YarnException
*/
public boolean run() throws IOException, YarnException {
// 1. yarnClient was already initialized in the constructor; start it now
yarnClient.start();
// 2. Fetch cluster-wide metrics
YarnClusterMetrics clusterMetrics = yarnClient.getYarnClusterMetrics();
// 3. Fetch reports for the nodes currently in the RUNNING state
List<NodeReport> clusterNodeReports = yarnClient.getNodeReports(
NodeState.RUNNING);
// 4. Create a new YARN application
YarnClientApplication app = yarnClient.createApplication();
// 5. Get the ResourceManager's response
GetNewApplicationResponse appResponse = app.getNewApplicationResponse();
// 6. Read the maximum memory and vcores that may be requested, and cap the AM's request at those limits
int maxMem = appResponse.getMaximumResourceCapability().getMemory();
if (amMemory > maxMem) {
amMemory = maxMem;
}
int maxVCores = appResponse.getMaximumResourceCapability().getVirtualCores();
if (amVCores > maxVCores) {
amVCores = maxVCores;
}
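// 7. Set up the application submission context, i.e. prepare the AppMaster's runtime environment: launch command, configuration files, arguments, classpath, and so on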
ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
ApplicationId appId = appContext.getApplicationId();
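// keepContainers controls whether running containers are kept (rather than killed and recreated) when the application attempt fails and is retried
appContext.setKeepContainersAcrossApplicationAttempts(keepContainers);
// 8. Set the application name
appContext.setApplicationName(appName);
// Copy the required resource files to HDFS so every node can download them, since the AppMaster and the containers may run on any node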
Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
FileSystem fs = FileSystem.get(new Configuration());
addToLocalResources(fs, appMasterJar, appMasterJarPath, appId.toString(),
localResources, null);
// HDFS location of the shell script that the distributed containers will execute
String hdfsShellScriptLocation = "";
long hdfsShellScriptLen = 0;
long hdfsShellScriptTimestamp = 0;
if (!shellScriptPath.isEmpty()) {
Path shellSrc = new Path(shellScriptPath);
String shellPathSuffix =
appName + "/" + appId.toString() + "/" + SCRIPT_PATH;
Path shellDst =
new Path(fs.getHomeDirectory(), shellPathSuffix);
fs.copyFromLocalFile(false, true, shellSrc, shellDst);
hdfsShellScriptLocation = shellDst.toUri().toString();
FileStatus shellFileStatus = fs.getFileStatus(shellDst);
hdfsShellScriptLen = shellFileStatus.getLen();
hdfsShellScriptTimestamp = shellFileStatus.getModificationTime();
}
// On a secure cluster, tokens have to be set up here
//amContainer.setContainerTokens(containerToken);
// Prepare the environment variables the AppMaster depends on
Map<String, String> env = new HashMap<String, String>();
env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION, hdfsShellScriptLocation);
env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTTIMESTAMP, Long.toString(hdfsShellScriptTimestamp));
env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTLEN, Long.toString(hdfsShellScriptLen));
if (domainId != null && domainId.length() > 0) {
env.put(DSConstants.DISTRIBUTEDSHELLTIMELINEDOMAIN, domainId);
}
// Build the classpath holding the jars the application depends on
StringBuilder classPathEnv = new StringBuilder(Environment.CLASSPATH.$$())
.append(ApplicationConstants.CLASS_PATH_SEPARATOR).append("./*");
for (String c : conf.getStrings(
YarnConfiguration.YARN_APPLICATION_CLASSPATH,
YarnConfiguration.DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH)) {
classPathEnv.append(ApplicationConstants.CLASS_PATH_SEPARATOR);
classPathEnv.append(c.trim());
}
classPathEnv.append(ApplicationConstants.CLASS_PATH_SEPARATOR).append(
"./log4j.properties");
// Add the runtime classpath needed for tests to work (MiniYARNCluster only)
if (conf.getBoolean(YarnConfiguration.IS_MINI_YARN_CLUSTER, false)) {
classPathEnv.append(':');
classPathEnv.append(System.getProperty("java.class.path"));
}
env.put("CLASSPATH", classPathEnv.toString());
// Build the AppMaster launch command from the arguments
Vector<CharSequence> vargs = new Vector<CharSequence>(30);
vargs.add(Environment.JAVA_HOME.$$() + "/bin/java");
vargs.add("-Xmx" + amMemory + "m");
vargs.add(appMasterMainClass);
vargs.add("--container_memory " + String.valueOf(containerMemory));
vargs.add("--container_vcores " + String.valueOf(containerVirtualCores));
vargs.add("--num_containers " + String.valueOf(numContainers));
if (null != nodeLabelExpression) {
appContext.setNodeLabelExpression(nodeLabelExpression);
}
vargs.add("--priority " + String.valueOf(shellCmdPriority));
for (Map.Entry<String, String> entry : shellEnv.entrySet()) {
vargs.add("--shell_env " + entry.getKey() + "=" + entry.getValue());
}
if (debugFlag) {
vargs.add("--debug");
}
vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/AppMaster.stdout");
vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/AppMaster.stderr");
// Join the pieces into the final AppMaster launch command
StringBuilder command = new StringBuilder();
for (CharSequence str : vargs) {
command.append(str).append(" ");
}
List<String> commands = new ArrayList<String>();
commands.add(command.toString());
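// The AppMaster is itself just a program running in a container
ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
localResources, env, commands, null, null, null);
// Set the resources requested for the AM container
Resource capability = Resource.newInstance(amMemory, amVCores);
appContext.setResource(capability);
appContext.setAMContainerSpec(amContainer);
// Set the application priority
Priority pri = Priority.newInstance(amPriority);
appContext.setPriority(pri);
// Submit the application. The older direct call, applicationsManager.submitApplication(appRequest), is no longer used.
yarnClient.submitApplication(appContext);
// run() must return a boolean; monitorApplication (defined elsewhere in Client.java) polls until the application finishes
return monitorApplication(appId);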
}
}
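The way the client turns a list of arguments into one launch command can be tried out without a cluster. The sketch below assumes that `Environment.JAVA_HOME.$$()` expands to the placeholder `{{JAVA_HOME}}` and that `ApplicationConstants.LOG_DIR_EXPANSION_VAR` is `<LOG_DIR>`, both of which the NodeManager substitutes at launch time; the class name `AmCommandSketch` and the reduced argument list are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

final class AmCommandSketch {
    // Assumed placeholder values; in the real client they come from the
    // YARN constants named in the text above.
    static final String JAVA_HOME = "{{JAVA_HOME}}";
    static final String LOG_DIR = "<LOG_DIR>";

    /** Builds a single shell command line for launching an AppMaster. */
    static String buildCommand(String mainClass, int amMemoryMb,
                               int containerMemoryMb, int numContainers) {
        List<String> vargs = new ArrayList<>();
        vargs.add(JAVA_HOME + "/bin/java");
        vargs.add("-Xmx" + amMemoryMb + "m");
        vargs.add(mainClass);
        vargs.add("--container_memory " + containerMemoryMb);
        vargs.add("--num_containers " + numContainers);
        // Redirect stdout and stderr into the container's log directory.
        vargs.add("1>" + LOG_DIR + "/AppMaster.stdout");
        vargs.add("2>" + LOG_DIR + "/AppMaster.stderr");
        return String.join(" ", vargs);
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("com.example.AppMaster", 1024, 512, 2));
    }
}
```

Because the whole command ends up as one string in the `commands` list, it is interpreted by a shell on the target node, which is why the redirections work.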
public class ApplicationMaster {
/**
* Main run function for the application master
*
* @throws YarnException
* @throws IOException
*/
@SuppressWarnings({ "unchecked" })
public void run() throws YarnException, IOException, InterruptedException {
// Get the current user's credentials
Credentials credentials =
UserGroupInformation.getCurrentUser().getCredentials();
DataOutputBuffer dob = new DataOutputBuffer();
credentials.writeTokenStorageToStream(dob);
// Now remove the AM->RM token so that containers cannot access it.
Iterator<Token<?>> iter = credentials.getAllTokens().iterator();
LOG.info("Executing with tokens:");
while (iter.hasNext()) {
Token<?> token = iter.next();
LOG.info(token);
if (token.getKind().equals(AMRMTokenIdentifier.KIND_NAME)) {
iter.remove();
}
}
allTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
// Create appSubmitterUgi and add original tokens to it
String appSubmitterUserName =
System.getenv(ApplicationConstants.Environment.USER.name());
appSubmitterUgi =
UserGroupInformation.createRemoteUser(appSubmitterUserName);
appSubmitterUgi.addCredentials(credentials);
// Callback handler for ResourceManager events; create the asynchronous AM-RM client
AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
amRMClient.init(conf);
amRMClient.start();
// Callback handler for NodeManager events
containerListener = createNMCallbackHandler();
nmClientAsync = new NMClientAsyncImpl(containerListener);
nmClientAsync.init(conf);
nmClientAsync.start();
startTimelineClient(conf);
if(timelineClient != null) {
publishApplicationAttemptEvent(timelineClient, appAttemptID.toString(),
DSEvent.DS_APP_ATTEMPT_START, domainId, appSubmitterUgi);
}
// Register self with ResourceManager
// This will start heartbeating to the RM
appMasterHostname = NetUtils.getHostname();
RegisterApplicationMasterResponse response = amRMClient
.registerApplicationMaster(appMasterHostname, appMasterRpcPort,
appMasterTrackingUrl);
// The following checks cap the requested container resources at the cluster maximum
int maxMem = response.getMaximumResourceCapability().getMemory();
int maxVCores = response.getMaximumResourceCapability().getVirtualCores();
if (containerMemory > maxMem) {
LOG.info("Container memory specified above max threshold of cluster."
+ " Using max value." + ", specified=" + containerMemory + ", max="
+ maxMem);
containerMemory = maxMem;
}
if (containerVirtualCores > maxVCores) {
LOG.info("Container virtual cores specified above max threshold of cluster."
+ " Using max value." + ", specified=" + containerVirtualCores + ", max="
+ maxVCores);
containerVirtualCores = maxVCores;
}
// Containers still running from a previous application attempt are counted as already allocated
List<Container> previousAMRunningContainers =
response.getContainersFromPreviousAttempts();
LOG.info(appAttemptID + " received " + previousAMRunningContainers.size()
+ " previous attempts' running containers on AM registration.");
numAllocatedContainers.addAndGet(previousAMRunningContainers.size());
int numTotalContainersToRequest =
numTotalContainers - previousAMRunningContainers.size();
// Ask the ResourceManager for the remaining containers
for (int i = 0; i < numTotalContainersToRequest; ++i) {
ContainerRequest containerAsk = setupContainerAskForRM();
amRMClient.addContainerRequest(containerAsk);
}
numRequestedContainers.set(numTotalContainers);
}
}
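The two bits of arithmetic in this excerpt, capping a request at the cluster maximum and subtracting the containers that survived a previous attempt, are easy to check in isolation. The sketch below uses hypothetical helper names; the `Math.max(0, ...)` guard is an added assumption and does not appear in the excerpt.

```java
final class ContainerAskSketch {

    /** Caps a requested amount (memory in MB or vcores) at the cluster maximum. */
    static int capAtMax(int requested, int max) {
        return Math.min(requested, max);
    }

    /**
     * Number of containers still to be requested after counting the ones
     * that are already running from a previous attempt.
     */
    static int remainingToRequest(int total, int previouslyRunning) {
        // Assumption: never return a negative count.
        return Math.max(0, total - previouslyRunning);
    }

    public static void main(String[] args) {
        System.out.println("memory to request: " + capAtMax(8192, 4096));
        System.out.println("containers to request: " + remainingToRequest(5, 2));
    }
}
```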
Tip: this code lives at spark-2.3.3\resource-managers\yarn\src\main\scala\org\apache\spark\deploy\yarn\Client.scala
It is part of the code that submits a Spark job to YARN; only a fragment is shown below.
private[spark] class Client(
val args: ClientArguments,
val sparkConf: SparkConf)
extends Logging {
/**
* Submit an application running our ApplicationMaster to the ResourceManager.
*
* The stable Yarn API provides a convenience method (YarnClient#createApplication) for
* creating applications and setting up the application submission context. This was not
* available in the alpha API.
*/
def submitApplication(): ApplicationId = {
var appId: ApplicationId = null
try {
launcherBackend.connect()
// Setup the credentials before doing anything else,
// so we have don't have issues at any point.
setupCredentials()
yarnClient.init(hadoopConf)
yarnClient.start()
logInfo("Requesting a new application from cluster with %d NodeManagers"
.format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
// Get a new application from our RM
val newApp = yarnClient.createApplication()
val newAppResponse = newApp.getNewApplicationResponse()
appId = newAppResponse.getApplicationId()
new CallerContext("CLIENT", sparkConf.get(APP_CALLER_CONTEXT),
Option(appId.toString)).setCurrentContext()
// Verify whether the cluster has enough resources for our AM
verifyClusterResources(newAppResponse)
// Set up the appropriate contexts to launch our AM
val containerContext = createContainerLaunchContext(newAppResponse)
val appContext = createApplicationSubmissionContext(newApp, containerContext)
// Finally, submit and monitor the application
logInfo(s"Submitting application $appId to ResourceManager")
yarnClient.submitApplication(appContext)
launcherBackend.setAppId(appId.toString)
reportLauncherState(SparkAppHandle.State.SUBMITTED)
appId
} catch {
case e: Throwable =>
if (appId != null) {
cleanupStagingDir(appId)
}
throw e
}
}
}
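The excerpt calls `verifyClusterResources` but does not show its body. One plausible shape for such a check, in contrast to the distributed shell client that silently lowers the request, is to fail fast when the request cannot fit. The Java sketch below illustrates that idea only; it is not Spark's implementation, and the method name, the overhead parameter, and the message text are assumptions.

```java
final class VerifyResourcesSketch {

    /**
     * Rejects a request whose memory plus overhead exceeds the largest
     * container the cluster is willing to allocate.
     */
    static void verify(String what, int memoryMb, int overheadMb, int maxMb) {
        int required = memoryMb + overheadMb;
        if (required > maxMb) {
            throw new IllegalArgumentException(what + " needs " + required
                + " MB but the cluster allows at most " + maxMb + " MB per container");
        }
    }

    public static void main(String[] args) {
        verify("AM", 1024, 384, 2048); // fits, so nothing happens
        try {
            verify("AM", 2048, 384, 2048);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing before submission gives the user an immediate error instead of an application that is accepted and then never scheduled.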
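Let's write a fairly simple YARN application. It still consists of the two basic parts, a Client and an AppMaster, but the AppMaster no longer requests additional containers to do the actual computation; it simply prints a "Hello Yarn" message from inside the AppMaster.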
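import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.protocolrecords.GetNewApplicationResponse;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class AppClient {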
private static Logger LOG = LoggerFactory.getLogger(AppClient.class);
private static String appMasterClass = "com.example.yarn.demo01.AppMaster";
private static final String appName = "yarn application demo";
public static void main(String[] args) {
AppClient client = new AppClient();
try {
client.run();
} catch (Exception e) {
LOG.error("client run exception , please check log file.", e);
}
}
// Run the job
public void run() throws IOException, YarnException {
Configuration hadoopConf = new Configuration();
// 1. Create a YarnClient to talk to the ResourceManager
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(hadoopConf);
// The YarnClient must be started before it can be used
yarnClient.start();
// This is the application we create on YARN
YarnClientApplication application = yarnClient.createApplication();
ApplicationSubmissionContext applicationSubmissionContext = application.getApplicationSubmissionContext();
GetNewApplicationResponse newApplicationResponse = application.getNewApplicationResponse();
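// Configure the application's submission context
/**
* What can be set on the submission context?
*{{{
* // Normally not set by hand; the id is generated from the cluster timestamp and a sequence number
* setApplicationId(ApplicationId applicationId);
* // Application name; this should be set
* setApplicationName(String applicationName);
* // Queue the application is submitted to; the default queue is "default"
* setQueue(String queue);
* // Application priority
* setPriority(Priority priority);
* // Launch context of the ApplicationMaster container, i.e. the application's master; the most important setting
* setAMContainerSpec(ContainerLaunchContext amContainer);
* // Unmanaged AM, false by default. By default the AM is launched in a container on a cluster node; when set to true, together with other settings, the AM can be started in an environment of your choosing, which is convenient for debugging
* setUnmanagedAM(boolean value);
* // Whether to cancel the tokens when the application completes
* setCancelTokensWhenComplete(boolean cancel);
* // Maximum number of application attempts
* setMaxAppAttempts(int maxAppAttempts);
* // Resources to request: CPU (vcores), memory, and so on
* setResource(Resource resource);
* // Application type
* setApplicationType(String applicationType);
* // Whether containers are kept across application attempts when the application is retried
* setKeepContainersAcrossApplicationAttempts(boolean keepContainers);
* // Tags for the application
* setApplicationTags(Set<String> tags);
* // Node label expression
* setNodeLabelExpression(String nodeLabelExpression);
* // Resource request for the AM container
* setAMContainerResourceRequest(ResourceRequest request);
* setAttemptFailuresValidityInterval(long attemptFailuresValidityInterval);
* setLogAggregationContext(LogAggregationContext logAggregationContext);
* setReservationID(ReservationId reservationID);
*}}}
*/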
// If not set, the name defaults to N/A
applicationSubmissionContext.setApplicationName(appName);
// Set the application priority; a higher number means a higher priority, and the default is -1
applicationSubmissionContext.setPriority(Priority.newInstance(10));
// TODO add local resources
Map<String, LocalResource> localResources = new HashMap<>(1 << 4);
FileSystem fs = FileSystem.get(hadoopConf);
String appMasterJarPath = "yarn-application-demo-1.0-SNAPSHOT.jar";
String appMasterJar = "D:\\Users\\Bigdata\\learning\\source\\yarn-application-demo\\target\\yarn-application-demo-1.0-SNAPSHOT.jar" ;
ApplicationId appId = applicationSubmissionContext.getApplicationId();
addToLocalResources(fs,appMasterJar,appMasterJarPath,appId.toString(),localResources,null);
// TODO add environment variables
Map<String, String> env = new HashMap<>(1 << 4);
// Build the classpath holding the jars the application depends on
StringBuilder classPathEnv = new StringBuilder(ApplicationConstants.Environment.CLASSPATH.$$())
.append(ApplicationConstants.CLASS_PATH_SEPARATOR).append("./*");
for (String c : hadoopConf.getStrings(
YarnConfiguration.YARN_APPLICATION_CLASSPATH,
YarnConfiguration.DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH)) {
classPathEnv.append(ApplicationConstants.CLASS_PATH_SEPARATOR);
classPathEnv.append(c.trim());
}
classPathEnv.append(ApplicationConstants.CLASS_PATH_SEPARATOR).append(
"./log4j.properties");
// add the runtime classpath needed for tests to work
if (hadoopConf.getBoolean(YarnConfiguration.IS_MINI_YARN_CLUSTER, false)) {
classPathEnv.append(':');
classPathEnv.append(System.getProperty("java.class.path"));
}
env.put("CLASSPATH", classPathEnv.toString());
// TODO add the command list
List<String> commands = new ArrayList<>(1 << 4);
// 1. The jars under the path have to be uploaded to HDFS so that the other nodes can download them from there
commands.add(ApplicationConstants.Environment.JAVA_HOME.$$() + "/bin/java"+ " -Xmx200m -Xms200m -Xmn20m " + appMasterClass);
ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
localResources, env, commands, null, null, null);
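// Set the launch context for the AM container
applicationSubmissionContext.setAMContainerSpec(amContainer);
// Unmanaged AM, false by default. By default the AM is launched in a container on a cluster node; when set to true, together with other settings, the AM can be started in an environment of your choosing, which is convenient for debugging
// applicationSubmissionContext.setUnmanagedAM(false);
// Whether to cancel the tokens when the application completes; the default is true
applicationSubmissionContext.setCancelTokensWhenComplete(true);
// Maximum number of application attempts after a failure
// applicationSubmissionContext.setMaxAppAttempts();
// Set the resources; normally these are parsed from user-supplied arguments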
int memory = 1024;
int vCores = 2;
applicationSubmissionContext.setResource(Resource.newInstance(memory, vCores));
// Set the application type
applicationSubmissionContext.setApplicationType("my-yarn-application");
// The default is false
applicationSubmissionContext.setKeepContainersAcrossApplicationAttempts(false);
// Set tags for the application
Set<String> tags = new HashSet<>(1 << 2);
tags.add("tag1");
tags.add("tag2");
applicationSubmissionContext.setApplicationTags(tags);
// Set the node label expression
// applicationSubmissionContext.setNodeLabelExpression();
// Set the resource request for the ApplicationMaster's container
// String hostName = "127.0.0.1";
// int numContainers = 1;
// ResourceRequest amRequest = ResourceRequest.newInstance(Priority.newInstance(10), hostName, Resource.newInstance(memory, vCores), numContainers);
// applicationSubmissionContext.setAMContainerResourceRequest(amRequest);
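// Validity interval (ms) for attempt failures: failures older than this window no longer count toward the maximum number of attempts
applicationSubmissionContext.setAttemptFailuresValidityInterval(30 * 1000L);
// Log aggregation context
// applicationSubmissionContext.setLogAggregationContext();
// TODO check the requested resources against the cluster maximum so the application cannot overload the system
// Finally, submit the configured application so it starts running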
yarnClient.submitApplication(applicationSubmissionContext);
}
private void addToLocalResources(FileSystem fs, String fileSrcPath,
String fileDstPath, String appId, Map<String, LocalResource> localResources,
String resources) throws IOException {
String suffix =
appName + "/" + appId + "/" + fileDstPath;
Path dst =
new Path(fs.getHomeDirectory(), suffix);
if (fileSrcPath == null) {
FSDataOutputStream ostream = null;
try {
ostream = FileSystem
.create(fs, dst, new FsPermission((short) 0710));
ostream.writeUTF(resources);
} finally {
IOUtils.closeQuietly(ostream);
}
} else {
fs.copyFromLocalFile(new Path(fileSrcPath), dst);
}
FileStatus scFileStatus = fs.getFileStatus(dst);
LocalResource scRsrc =
LocalResource.newInstance(
ConverterUtils.getYarnUrlFromURI(dst.toUri()),
LocalResourceType.FILE, LocalResourceVisibility.APPLICATION,
scFileStatus.getLen(), scFileStatus.getModificationTime());
localResources.put(fileDstPath, scRsrc);
}
}
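`addToLocalResources` stages the jar under the submitting user's home directory at `<home>/<appName>/<appId>/<fileDstPath>`, and registers it under the key `fileDstPath`, which is the name the file gets in the container's working directory. The path construction can be sketched with plain strings; the home directory and application id below are made-up example values, and `StagingPathSketch` is a hypothetical helper.

```java
final class StagingPathSketch {

    /** Joins the pieces the same way the demo client builds its HDFS destination. */
    static String stagingPath(String homeDir, String appName, String appId, String fileName) {
        // Drop a trailing slash so the result never contains "//".
        String base = homeDir.endsWith("/")
            ? homeDir.substring(0, homeDir.length() - 1)
            : homeDir;
        return base + "/" + appName + "/" + appId + "/" + fileName;
    }

    public static void main(String[] args) {
        // Example values only; a real application id is issued by the ResourceManager.
        System.out.println(stagingPath("/user/alice", "yarn application demo",
            "application_1700000000000_0001", "yarn-application-demo-1.0-SNAPSHOT.jar"));
    }
}
```

Including the application id in the path keeps the files of concurrent submissions from overwriting each other.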
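import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.NetUtils;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class AppMaster {
// Use AppMaster.class here so log lines are attributed to the right class
private static Logger LOG = LoggerFactory.getLogger(AppMaster.class);
private AMRMClientAsync<ContainerRequest> amRmClient ;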
private Configuration hadoopConf ;
private int appMasterRpcPort = -1 ;
private String appMasterTrackingUrl ;
/**
* Entry point: once the launch command starts the AppMaster, everything runs from main
*
* @param args command-line arguments
*/
public static void main(String[] args) {
try{
AppMaster master = new AppMaster();
master.run();
}catch (Exception e){
e.printStackTrace();
}
}
public void run(){
try{
// The AppMaster must register itself with the ResourceManager
AMRMClientAsync.CallbackHandler allocListener = new RMCallBackHandler();
amRmClient = AMRMClientAsync.createAMRMClientAsync(1000,allocListener);
hadoopConf = new Configuration();
amRmClient.init(hadoopConf);
amRmClient.start();
String hostName = NetUtils.getHostname();
amRmClient.registerApplicationMaster(hostName,appMasterRpcPort,appMasterTrackingUrl);
}catch (Exception e){
e.printStackTrace();
}
try {
System.out.println("hello yarn");
} catch (Exception e) {
e.printStackTrace();
}
try{
amRmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED,"Application finished successfully",appMasterTrackingUrl);
}catch (Exception e){
e.printStackTrace();
}
}
private class RMCallBackHandler implements AMRMClientAsync.CallbackHandler {
/**
* Called when containers have completed
* @param statuses statuses of the completed containers
*/
@Override
public void onContainersCompleted(List<ContainerStatus> statuses) {
}
/**
* Called when containers have been allocated
* @param containers the allocated containers
*/
@Override
public void onContainersAllocated(List<Container> containers) {
}
/**
* Called when the ResourceManager asks the AM to shut down
*/
@Override
public void onShutdownRequest() {
}
/**
* Called when nodes are updated
* @param updatedNodes the updated nodes
*/
@Override
public void onNodesUpdated(List<NodeReport> updatedNodes) {
}
/**
* Reports the application's progress
* @return the progress, from 0 to 1
*/
@Override
public float getProgress() {
return 0;
}
/**
* Called when an error occurs
* @param e the error
*/
@Override
public void onError(Throwable e) {
}
}
}
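The demo's callbacks are empty and `getProgress` always returns 0, which is fine because no worker containers are requested. In an AppMaster that does request containers, the handler usually counts completions and derives the progress from that count. The sketch below shows only that bookkeeping with plain Java; `ProgressSketch` and its method signatures are simplified stand-ins, not the `AMRMClientAsync.CallbackHandler` interface.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ProgressSketch {
    private final int totalContainers;
    // Callbacks may arrive on a different thread than the one reading progress.
    private final AtomicInteger completed = new AtomicInteger();

    ProgressSketch(int totalContainers) {
        this.totalContainers = totalContainers;
    }

    /** Stand-in for onContainersCompleted: record how many containers finished. */
    void onContainersCompleted(int count) {
        completed.addAndGet(count);
    }

    /** Fraction of containers that have finished, between 0 and 1. */
    float getProgress() {
        if (totalContainers == 0) {
            return 1.0f; // nothing to wait for
        }
        return (float) completed.get() / totalContainers;
    }

    /** True once every requested container has completed. */
    boolean isDone() {
        return completed.get() >= totalContainers;
    }

    public static void main(String[] args) {
        ProgressSketch progress = new ProgressSketch(4);
        progress.onContainersCompleted(1);
        System.out.println(progress.getProgress());
    }
}
```

An AppMaster built this way would typically wait until `isDone()` is true before calling `unregisterApplicationMaster`.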
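Once you are familiar with the basic workflow of a YARN application, its structure becomes easy to understand, and you can make good use of this resource management framework, which also offers other features such as monitoring the status of running applications.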