I spent a long time searching for related blog posts and was frustrated that there was no complete series on this topic. After roughly half a month of feeling around in the dark, I have put together notes on how to use the official Apache APIs to submit Flink jobs to YARN and to a standalone cluster, stop a job while triggering a checkpoint and producing the corresponding savepoint, resume the job from that savepoint, and so on, together with the problems I ran into along the way and how I solved them. I hope these notes are useful to you, and I would be glad to hear your suggestions.
1. Submitting a Flink job to YARN via the Java API
There are three submission modes in total; here we choose application mode (as for why, I recommend this post: Flink 部署模式,session 、pre job、aplication三种主要模式_xuye0606的博客-CSDN博客).
With the mode chosen, let's get straight to the code. Without further ado, here is the demo:
/**
 * Demo: submit a Flink job to YARN in application mode
 *
 * @return the YARN application id of the submitted job (null here for brevity)
 */
public String submitDemo() {
    System.setProperty("HADOOP_USER_NAME", "root");
    System.out.println("==================== Thread.currentThread().getContextClassLoader().getResource:" + Thread.currentThread().getContextClassLoader().getResource(""));
    String configurationDirectory = Thread.currentThread().getContextClassLoader().getResource("conf").getPath();
    System.out.println("====================configurationDirectory:" + configurationDirectory);
    // HDFS directory holding your own dependency jars
    String flinkLibs = "hdfs://nnCluster/data/flink/libs";
    // the Flink job jar you want to submit
    String userJarPath = "hdfs://nnCluster/data/flink/jars/flink-sql-submit-sdk.jar";
    // the jar needed to deploy Flink on YARN; remember to put it under the libs directory
    String flinkDistJar = "hdfs://nnCluster/data/flink/libs/flink-yarn_2.11-1.12.0.jar";
    YarnClient yarnClient = YarnClient.createYarnClient();
    String yarnHaEnabled = "true";
    YarnConfiguration yarnConfiguration = new YarnConfiguration();
    yarnConfiguration.set("yarn.resourcemanager.ha.enabled", yarnHaEnabled); // whether ResourceManager HA is enabled
    yarnConfiguration.set("yarn.resourcemanager.cluster-id", "yarn-cluster"); // cluster id; use the one configured on your servers
    if (Boolean.valueOf(yarnHaEnabled)) {
        // with HA enabled, the ResourceManager ids and addresses must also be configured
        yarnConfiguration.set("yarn.resourcemanager.address.rm1", "10.255.157.235:8050");
        yarnConfiguration.set("yarn.resourcemanager.address.rm2", "10.255.157.233:8050");
        yarnConfiguration.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    }
    // ---------- HDFS related configuration ----------
    String hdfsNameService = "nnCluster";
    yarnConfiguration.set("dfs.nameservices", hdfsNameService);
    yarnConfiguration.set("fs.defaultFS", hdfsNameService);
    // HDFS HA NameNode configuration
    String hdfsHaEnabled = "true"; // HA enabled by default
    if (Boolean.valueOf(hdfsHaEnabled)) {
        // nn1 and nn2 are both NameNodes
        yarnConfiguration.set("dfs.namenode.rpc-address." + hdfsNameService + ".nn1", "10.255.157.230:8020");
        yarnConfiguration.set("dfs.namenode.rpc-address." + hdfsNameService + ".nn2", "10.255.157.234:8020");
        yarnConfiguration.set("dfs.ha.namenodes." + hdfsNameService, "nn1,nn2");
    }
    yarnConfiguration.set("dfs.client.failover.proxy.provider." + hdfsNameService, "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
    yarnClient.init(yarnConfiguration);
    yarnClient.start();
    YarnClusterInformationRetriever clusterInformationRetriever = YarnClientYarnClusterInformationRetriever.create(yarnClient);
    // load the Flink configuration
    Configuration flinkConfiguration = GlobalConfiguration.loadConfiguration(configurationDirectory);
    flinkConfiguration.set(CheckpointingOptions.INCREMENTAL_CHECKPOINTS, true);
    flinkConfiguration.set(PipelineOptions.JARS, Collections.singletonList(userJarPath));
    YarnLogConfigUtil.setLogConfigFileInConfig(flinkConfiguration, configurationDirectory);
    flinkConfiguration.set(YarnConfigOptions.PROVIDED_LIB_DIRS, Collections.singletonList(flinkLibs));
    flinkConfiguration.set(YarnConfigOptions.FLINK_DIST_JAR, flinkDistJar);
    // use application mode
    flinkConfiguration.set(DeploymentOptions.TARGET, YarnDeploymentTarget.APPLICATION.getName());
    // YARN application name
    flinkConfiguration.set(YarnConfigOptions.APPLICATION_NAME, "your job name");
    // further options; many more can be set here
    flinkConfiguration.set(PipelineOptionsInternal.PIPELINE_FIXED_JOB_ID, "a 32-character Flink job id (letters and digits)");
    flinkConfiguration.set(JobManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1024", MEGA_BYTES));
    flinkConfiguration.set(TaskManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1024", MEGA_BYTES));
    flinkConfiguration.set(ConfigOptions.key("env.java.opts").stringType().noDefaultValue(), "-Dflink_job_name=ss_test");
    String hadoopconf = Thread.currentThread().getContextClassLoader().getResource("flink").getPath();
    flinkConfiguration.set(ConfigOptions.key("fs.hdfs.hadoopconf").stringType().noDefaultValue(), hadoopconf);
    flinkConfiguration.set(CheckpointingOptions.STATE_BACKEND, "filesystem");
    flinkConfiguration.set(CheckpointingOptions.MAX_RETAINED_CHECKPOINTS, 20);
    flinkConfiguration.set(CheckpointingOptions.CHECKPOINTS_DIRECTORY, "directory where the files produced by triggered checkpoints are stored");
    ClusterSpecification clusterSpecification = new ClusterSpecification.ClusterSpecificationBuilder().createClusterSpecification();
    // program arguments and main class of the user jar (submitVO is the caller's job-parameter object, serialized to JSON as the program argument)
    ApplicationConfiguration appConfig = new ApplicationConfiguration(new String[]{submitVO.toJSONString()}, "com.cestc.bigdataclient.sqlsubmit.SqlSubmit");
    YarnClusterDescriptor yarnClusterDescriptor = new YarnClusterDescriptor(flinkConfiguration, yarnConfiguration, yarnClient, clusterInformationRetriever, true);
    ClusterClientProvider<ApplicationId> clusterClientProvider;
    try {
        clusterClientProvider = yarnClusterDescriptor.deployApplicationCluster(clusterSpecification, appConfig);
    } catch (ClusterDeploymentException e) {
        e.printStackTrace();
        throw new RuntimeException(e);
    }
    ClusterClient<ApplicationId> clusterClient = clusterClientProvider.getClusterClient();
    ApplicationId applicationId = clusterClient.getClusterId();
    System.out.println(applicationId);
    return null;
}
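The demo above only prints the ApplicationId and returns. If you want to confirm that the application actually reached the RUNNING state before returning, one possible follow-up (this is my own addition, not part of the original demo; it reuses the yarnClient and applicationId created above) is to poll the YARN application report:
// Sketch (assumption, not in the original demo): poll YARN until the submitted
// application leaves the submission states, reusing yarnClient and applicationId
// from the code above. Checked exceptions (YarnException, IOException,
// InterruptedException) are left out for brevity.
ApplicationReport report = yarnClient.getApplicationReport(applicationId);
YarnApplicationState state = report.getYarnApplicationState();
while (state == YarnApplicationState.NEW
        || state == YarnApplicationState.NEW_SAVING
        || state == YarnApplicationState.SUBMITTED
        || state == YarnApplicationState.ACCEPTED) {
    Thread.sleep(1000L);
    report = yarnClient.getApplicationReport(applicationId);
    state = report.getYarnApplicationState();
}
System.out.println("application " + applicationId + " is " + state + ", tracking url: " + report.getTrackingUrl());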
Problems encountered while submitting the job, and how to solve them
1. Configuration files cannot be loaded flexibly
The "fs.hdfs.hadoopconf" configuration key is not defined in the org.apache.flink.configuration package, so you have to define and set it yourself:
flinkConfiguration.set(ConfigOptions.key("fs.hdfs.hadoopconf").stringType().noDefaultValue(), "directory containing core-site.xml and hdfs-site.xml");
2. The job has no permission to use the files uploaded to HDFS
hadoop fs -chown -R root:hdfs your-path
3. If parts of the source code have been customized, corresponding adjustments are needed for the running job
In my case I customized Flink's JDBC connector, so the corresponding jar had to be merged in.
4. The job is submitted successfully but fails at runtime
java.lang.NoSuchMethodException: org.apache.hadoop.yarn.api.records.Resource.setResourceInformation(java.lang.String, org.apache.hadoop.yarn.api.records.ResourceInformation)
This happens because the corresponding class in Flink's bundled flink-shaded-hadoop-2-uber-2.8.3-10.0.jar does not have the setResourceInformation method. Replace flink-shaded-hadoop-2-uber-2.8.3-10.0.jar with Hadoop's own hadoop-yarn-api-3.1.1.jar, which does provide that method.
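If you are not sure which Hadoop jar is actually being loaded at runtime, the following small diagnostic sketch (my own addition, not part of the original fix) checks via reflection whether setResourceInformation is present and prints which jar the Resource class came from:
// Diagnostic sketch (assumption): verify which jar provides
// org.apache.hadoop.yarn.api.records.Resource and whether it has
// setResourceInformation. Requires java.util.Arrays; ClassNotFoundException
// handling is omitted for brevity.
Class<?> resourceClass = Class.forName("org.apache.hadoop.yarn.api.records.Resource");
System.out.println("Resource loaded from: " + resourceClass.getProtectionDomain().getCodeSource().getLocation());
boolean hasMethod = Arrays.stream(resourceClass.getMethods())
        .anyMatch(m -> "setResourceInformation".equals(m.getName()));
System.out.println("setResourceInformation available: " + hasMethod);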
2. Stopping a Flink job on YARN via the Java API and writing the savepoint to a specified directory
There are a few small details about the stop operation worth recording; code first:
// the surrounding YARN/HDFS configuration is omitted here; see the submit example above
YarnClusterInformationRetriever clusterInformationRetriever = YarnClientYarnClusterInformationRetriever.create(yarnClient);
// build the Flink configuration
Configuration flinkConfiguration = new Configuration();
flinkConfiguration.set(YarnConfigOptions.APPLICATION_ID, "the YARN applicationId of the job you want to stop");
YarnClusterDescriptor yarnClusterDescriptor = new YarnClusterDescriptor(flinkConfiguration, yarnConfiguration, yarnClient, clusterInformationRetriever, true);
YarnClusterClientFactory clusterClientFactory = new YarnClusterClientFactory();
ApplicationId applicationId = clusterClientFactory.getClusterId(flinkConfiguration);
ClusterClientProvider<ApplicationId> clusterClientProvider = yarnClusterDescriptor.retrieve(applicationId);
ClusterClient<ApplicationId> clusterClient = clusterClientProvider.getClusterClient();
Collection<JobStatusMessage> jobStatusMessages = clusterClient.listJobs().get();
// note: this jobId is the Flink jobId inside the Flink cluster that YARN allocated, not the YARN application id
final JobID jobIds = JobID.fromHexString("the jobId of the job to stop");
CompletableFuture<String> completableFuture = clusterClient.stopWithSavepoint(jobIds, true, "the savepoint directory configured earlier");
String savepoint = completableFuture.get();
System.out.println(savepoint);
Small issues when stopping a Flink job on YARN
1. Before stopping a Flink job on YARN, make sure its checkpoint mechanism is enabled; otherwise the job cannot be resumed later.
2. Note that the JobID can be obtained by polling the cluster's job list and then stopping the match, but personally I recommend recording your own JobId and stopping that specific job; a sketch of the polling approach follows below.
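For completeness, here is a minimal sketch of that polling approach (my own addition), reusing the clusterClient and the listJobs() call from the stop code above; "your job name" is just a placeholder for whatever name the job was submitted with:
// Sketch (assumption): find the JobID by listing the jobs on the cluster
// instead of having recorded it at submit time.
Collection<JobStatusMessage> jobs = clusterClient.listJobs().get();
JobID targetJobId = null;
for (JobStatusMessage job : jobs) {
    System.out.println(job.getJobId() + " " + job.getJobName() + " " + job.getJobState());
    if ("your job name".equals(job.getJobName())) {
        targetJobId = job.getJobId();
    }
}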
3. Resuming a stopped job
Resuming a Flink job on YARN is different from resuming a Flink job running on a standalone cluster; all you need is the corresponding savepoint.
// everything else is the same as in the submit example
flinkConfiguration.set(CheckpointingOptions.CHECKPOINTS_DIRECTORY, "full path of your savepoint");
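Putting it together, a minimal resume sketch might look like this (my own summary, assuming the rest of the submit code from section 1; the SavepointConfigOptions.SAVEPOINT_PATH line is an option Flink provides for restoring a newly submitted job from a savepoint and is not something used in my original code):
// Resume sketch (assumption): everything not shown here is identical to the
// submit demo in section 1; exception handling is omitted as there.
Configuration flinkConfiguration = GlobalConfiguration.loadConfiguration(configurationDirectory);
// point the checkpoint directory at the savepoint produced when the job was stopped
flinkConfiguration.set(CheckpointingOptions.CHECKPOINTS_DIRECTORY, "full path of your savepoint");
// Flink's execution.savepoint.path option can also make the new submission restore from that savepoint
flinkConfiguration.set(SavepointConfigOptions.SAVEPOINT_PATH, "full path of your savepoint");
// then deploy a new application cluster exactly as in section 1
ClusterClientProvider<ApplicationId> clusterClientProvider = yarnClusterDescriptor.deployApplicationCluster(clusterSpecification, appConfig);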
Small issues when resuming a Flink job on YARN:
1. Resuming a Flink job on YARN essentially means resubmitting a new Flink job from the savepoint files produced by the checkpoint that was triggered when the job was stopped or when it failed.
2. For a job that has already been stopped on YARN, do not use YarnClusterDescriptor's retrieve method to obtain a connection to the job and then work on the JobGraph and call triggerSavepoint to resume it (that approach applies to standalone mode).