I spent a long time searching for related blog posts and was frustrated that there was no complete series on this topic. After roughly half a month of feeling around in the dark, I have put together notes on how to use the official Apache APIs to submit Flink jobs to YARN and to a standalone cluster, stop a job while triggering a checkpoint and producing the corresponding savepoint, resume the job from that savepoint, and so on, together with the problems I ran into along the way and how I solved them. I hope these notes are useful to you, and I would be glad to hear your suggestions.
1. Submitting a Flink job to YARN via the Java API
There are three submission modes in total; here we choose application mode (as for why, I recommend this post: Flink 部署模式,session 、pre job、aplication三种主要模式_xuye0606的博客-CSDN博客).
With the mode chosen, let's get straight to the code. Without further ado, here is the demo:
/**
 * Demo: submit a Flink job to YARN in application mode
 *
 * @return the YARN application id of the submitted job (null here for brevity)
 */
public String submitDemo() {
    System.setProperty("HADOOP_USER_NAME", "root");
    System.out.println("==================== Thread.currentThread().getContextClassLoader().getResource:" + Thread.currentThread().getContextClassLoader().getResource(""));
    String configurationDirectory = Thread.currentThread().getContextClassLoader().getResource("conf").getPath();
    System.out.println("====================configurationDirectory:" + configurationDirectory);
    // HDFS directory holding your own dependency jars
    String flinkLibs = "hdfs://nnCluster/data/flink/libs";
    // the Flink job jar you want to submit
    String userJarPath = "hdfs://nnCluster/data/flink/jars/flink-sql-submit-sdk.jar";
    // the jar needed to deploy Flink on YARN; remember to put it under the libs directory
    String flinkDistJar = "hdfs://nnCluster/data/flink/libs/flink-yarn_2.11-1.12.0.jar";
    YarnClient yarnClient = YarnClient.createYarnClient();
    String yarnHaEnabled = "true";
    YarnConfiguration yarnConfiguration = new YarnConfiguration();
    yarnConfiguration.set("yarn.resourcemanager.ha.enabled", yarnHaEnabled); // whether ResourceManager HA is enabled
    yarnConfiguration.set("yarn.resourcemanager.cluster-id", "yarn-cluster"); // cluster id; use the one configured on your servers
    if (Boolean.valueOf(yarnHaEnabled)) {
        // with HA enabled, the ResourceManager ids and addresses must also be configured
        yarnConfiguration.set("yarn.resourcemanager.address.rm1", "10.255.157.235:8050");
        yarnConfiguration.set("yarn.resourcemanager.address.rm2", "10.255.157.233:8050");
        yarnConfiguration.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    }
    // ---------- HDFS related configuration ----------
    String hdfsNameService = "nnCluster";
    yarnConfiguration.set("dfs.nameservices", hdfsNameService);
    yarnConfiguration.set("fs.defaultFS", hdfsNameService);
    // HDFS HA NameNode configuration
    String hdfsHaEnabled = "true"; // HA enabled by default
    if (Boolean.valueOf(hdfsHaEnabled)) {
        // nn1 and nn2 are both NameNodes
        yarnConfiguration.set("dfs.namenode.rpc-address." + hdfsNameService + ".nn1", "10.255.157.230:8020");
        yarnConfiguration.set("dfs.namenode.rpc-address." + hdfsNameService + ".nn2", "10.255.157.234:8020");
        yarnConfiguration.set("dfs.ha.namenodes." + hdfsNameService, "nn1,nn2");
    }
    yarnConfiguration.set("dfs.client.failover.proxy.provider." + hdfsNameService, "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
    yarnClient.init(yarnConfiguration);
    yarnClient.start();
    YarnClusterInformationRetriever clusterInformationRetriever = YarnClientYarnClusterInformationRetriever.create(yarnClient);
    // load the Flink configuration
    Configuration flinkConfiguration = GlobalConfiguration.loadConfiguration(configurationDirectory);
    flinkConfiguration.set(CheckpointingOptions.INCREMENTAL_CHECKPOINTS, true);
    flinkConfiguration.set(PipelineOptions.JARS, Collections.singletonList(userJarPath));
    YarnLogConfigUtil.setLogConfigFileInConfig(flinkConfiguration, configurationDirectory);
    flinkConfiguration.set(YarnConfigOptions.PROVIDED_LIB_DIRS, Collections.singletonList(flinkLibs));
    flinkConfiguration.set(YarnConfigOptions.FLINK_DIST_JAR, flinkDistJar);
    // use application mode
    flinkConfiguration.set(DeploymentOptions.TARGET, YarnDeploymentTarget.APPLICATION.getName());
    // YARN application name
    flinkConfiguration.set(YarnConfigOptions.APPLICATION_NAME, "your job name");
    // further options; many more can be set here
    flinkConfiguration.set(PipelineOptionsInternal.PIPELINE_FIXED_JOB_ID, "a 32-character Flink job id (letters and digits)");
    flinkConfiguration.set(JobManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1024", MEGA_BYTES));
    flinkConfiguration.set(TaskManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1024", MEGA_BYTES));
    flinkConfiguration.set(ConfigOptions.key("env.java.opts").stringType().noDefaultValue(), "-Dflink_job_name=ss_test");
    String hadoopconf = Thread.currentThread().getContextClassLoader().getResource("flink").getPath();
    flinkConfiguration.set(ConfigOptions.key("fs.hdfs.hadoopconf").stringType().noDefaultValue(), hadoopconf);
    flinkConfiguration.set(CheckpointingOptions.STATE_BACKEND, "filesystem");
    flinkConfiguration.set(CheckpointingOptions.MAX_RETAINED_CHECKPOINTS, 20);
    flinkConfiguration.set(CheckpointingOptions.CHECKPOINTS_DIRECTORY, "directory where the files produced by triggered checkpoints are stored");
    ClusterSpecification clusterSpecification = new ClusterSpecification.ClusterSpecificationBuilder().createClusterSpecification();
    // program arguments and main class of the user jar (submitVO is the caller's job-parameter object, serialized to JSON as the program argument)
    ApplicationConfiguration appConfig = new ApplicationConfiguration(new String[]{submitVO.toJSONString()}, "com.cestc.bigdataclient.sqlsubmit.SqlSubmit");
    YarnClusterDescriptor yarnClusterDescriptor = new YarnClusterDescriptor(flinkConfiguration, yarnConfiguration, yarnClient, clusterInformationRetriever, true);
    ClusterClientProvider<ApplicationId> clusterClientProvider;
    try {
        clusterClientProvider = yarnClusterDescriptor.deployApplicationCluster(clusterSpecification, appConfig);
    } catch (ClusterDeploymentException e) {
        e.printStackTrace();
        throw new RuntimeException(e);
    }
    ClusterClient<ApplicationId> clusterClient = clusterClientProvider.getClusterClient();
    ApplicationId applicationId = clusterClient.getClusterId();
    System.out.println(applicationId);
    return null;
}
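The demo above only prints the ApplicationId and returns. If you want to confirm that the application actually reached the RUNNING state before returning, one possible follow-up (this is my own addition, not part of the original demo; it reuses the yarnClient and applicationId created above) is to poll the YARN application report:
// Sketch (assumption, not in the original demo): poll YARN until the submitted
// application leaves the submission states, reusing yarnClient and applicationId
// from the code above. Checked exceptions (YarnException, IOException,
// InterruptedException) are left out for brevity.
ApplicationReport report = yarnClient.getApplicationReport(applicationId);
YarnApplicationState state = report.getYarnApplicationState();
while (state == YarnApplicationState.NEW
        || state == YarnApplicationState.NEW_SAVING
        || state == YarnApplicationState.SUBMITTED
        || state == YarnApplicationState.ACCEPTED) {
    Thread.sleep(1000L);
    report = yarnClient.getApplicationReport(applicationId);
    state = report.getYarnApplicationState();
}
System.out.println("application " + applicationId + " is " + state + ", tracking url: " + report.getTrackingUrl());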
Problems encountered while submitting the job, and how to solve them
1. Configuration files cannot be loaded flexibly
The "fs.hdfs.hadoopconf" configuration key is not defined in the org.apache.flink.configuration package, so you have to define and set it yourself:
flinkConfiguration.set(ConfigOptions.key("fs.hdfs.hadoopconf").stringType().noDefaultValue(), "directory containing core-site.xml and hdfs-site.xml");
2. The job has no permission to use the files uploaded to HDFS
hadoop fs -chown -R root:hdfs your-path
3. If parts of the source code have been customized, corresponding adjustments are needed for the running job
In my case I customized Flink's JDBC connector, so the corresponding jar had to be merged in.
4. The job is submitted successfully but fails at runtime
java.lang.NoSuchMethodException: org.apache.hadoop.yarn.api.records.Resource.setResourceInformation(java.lang.String, org.apache.hadoop.yarn.api.records.ResourceInformation)
This happens because the corresponding class in Flink's bundled flink-shaded-hadoop-2-uber-2.8.3-10.0.jar does not have the setResourceInformation method. Replace flink-shaded-hadoop-2-uber-2.8.3-10.0.jar with Hadoop's own hadoop-yarn-api-3.1.1.jar, which does provide that method.
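If you are not sure which Hadoop jar is actually being loaded at runtime, the following small diagnostic sketch (my own addition, not part of the original fix) checks via reflection whether setResourceInformation is present and prints which jar the Resource class came from:
// Diagnostic sketch (assumption): verify which jar provides
// org.apache.hadoop.yarn.api.records.Resource and whether it has
// setResourceInformation. Requires java.util.Arrays; ClassNotFoundException
// handling is omitted for brevity.
Class<?> resourceClass = Class.forName("org.apache.hadoop.yarn.api.records.Resource");
System.out.println("Resource loaded from: " + resourceClass.getProtectionDomain().getCodeSource().getLocation());
boolean hasMethod = Arrays.stream(resourceClass.getMethods())
        .anyMatch(m -> "setResourceInformation".equals(m.getName()));
System.out.println("setResourceInformation available: " + hasMethod);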
2. Stopping a Flink job on YARN via the Java API and writing the savepoint to a specified directory
There are a few small details about the stop operation worth recording; code first:
// the surrounding YARN/HDFS configuration is omitted here; see the submit example above
YarnClusterInformationRetriever clusterInformationRetriever = YarnClientYarnClusterInformationRetriever.create(yarnClient);
// build the Flink configuration
Configuration flinkConfiguration = new Configuration();
flinkConfiguration.set(YarnConfigOptions.APPLICATION_ID, "the YARN applicationId of the job you want to stop");
YarnClusterDescriptor yarnClusterDescriptor = new YarnClusterDescriptor(flinkConfiguration, yarnConfiguration, yarnClient, clusterInformationRetriever, true);
YarnClusterClientFactory clusterClientFactory = new YarnClusterClientFactory();
ApplicationId applicationId = clusterClientFactory.getClusterId(flinkConfiguration);
ClusterClientProvider<ApplicationId> clusterClientProvider = yarnClusterDescriptor.retrieve(applicationId);
ClusterClient<ApplicationId> clusterClient = clusterClientProvider.getClusterClient();
Collection<JobStatusMessage> jobStatusMessages = clusterClient.listJobs().get();
// note: this jobId is the Flink jobId inside the Flink cluster that YARN allocated, not the YARN application id
final JobID jobIds = JobID.fromHexString("the jobId of the job to stop");
CompletableFuture<String> completableFuture = clusterClient.stopWithSavepoint(jobIds, true, "the savepoint directory configured earlier");
String savepoint = completableFuture.get();
System.out.println(savepoint);
Small issues when stopping a Flink job on YARN
1. Before stopping a Flink job on YARN, make sure its checkpoint mechanism is enabled; otherwise the job cannot be resumed later.
2. Note that the JobID can be obtained by polling the cluster's job list and then stopping the match, but personally I recommend recording your own JobId and stopping that specific job; a sketch of the polling approach follows below.
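For completeness, here is a minimal sketch of that polling approach (my own addition), reusing the clusterClient and the listJobs() call from the stop code above; "your job name" is just a placeholder for whatever name the job was submitted with:
// Sketch (assumption): find the JobID by listing the jobs on the cluster
// instead of having recorded it at submit time.
Collection<JobStatusMessage> jobs = clusterClient.listJobs().get();
JobID targetJobId = null;
for (JobStatusMessage job : jobs) {
    System.out.println(job.getJobId() + " " + job.getJobName() + " " + job.getJobState());
    if ("your job name".equals(job.getJobName())) {
        targetJobId = job.getJobId();
    }
}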
3. Resuming a stopped job
Resuming a Flink job on YARN is different from resuming a Flink job running on a standalone cluster; all you need is the corresponding savepoint.
// everything else is the same as in the submit example
flinkConfiguration.set(CheckpointingOptions.CHECKPOINTS_DIRECTORY, "full path of your savepoint");
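Putting it together, a minimal resume sketch might look like this (my own summary, assuming the rest of the submit code from section 1; the SavepointConfigOptions.SAVEPOINT_PATH line is an option Flink provides for restoring a newly submitted job from a savepoint and is not something used in my original code):
// Resume sketch (assumption): everything not shown here is identical to the
// submit demo in section 1; exception handling is omitted as there.
Configuration flinkConfiguration = GlobalConfiguration.loadConfiguration(configurationDirectory);
// point the checkpoint directory at the savepoint produced when the job was stopped
flinkConfiguration.set(CheckpointingOptions.CHECKPOINTS_DIRECTORY, "full path of your savepoint");
// Flink's execution.savepoint.path option can also make the new submission restore from that savepoint
flinkConfiguration.set(SavepointConfigOptions.SAVEPOINT_PATH, "full path of your savepoint");
// then deploy a new application cluster exactly as in section 1
ClusterClientProvider<ApplicationId> clusterClientProvider = yarnClusterDescriptor.deployApplicationCluster(clusterSpecification, appConfig);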
Small issues when resuming a Flink job on YARN:
1. Resuming a Flink job on YARN essentially means resubmitting a new Flink job from the savepoint files produced by the checkpoint that was triggered when the job was stopped or when it failed.
2. For a job that has already been stopped on YARN, do not use YarnClusterDescriptor's retrieve method to obtain a connection to the job and then work on the JobGraph and call triggerSavepoint to resume it (that approach applies to standalone mode).