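Flink download page: https://flink.apache.org/downloads.html
Flink does not yet provide a build for Hadoop 2.9, so this guide installs the stable Hadoop 2.7 build, which is compatible.
Perform the following steps on every node in the cluster.
Extract and rename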
$ tar -zxvf flink-1.7.1-bin-hadoop27-scala_2.11.tgz -C /opt/core
$ mv /opt/core/flink-1.7.1 /opt/core/flink
Add the environment variable
vi /opt/conf/hsotname_env
#FLINK
export FLINK_HOME=/opt/core/flink
Reload the file so the setting takes effect
source /opt/conf/hsotname_env
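
Before moving on, it can help to confirm that the variable is visible in the current shell and points at a real installation. The following is a minimal sketch for a POSIX shell; `check_flink_home` is a hypothetical helper, not part of Flink, and it only checks that a `bin/flink` launcher exists under the given directory.

```shell
# Hypothetical helper (not part of Flink): sanity-check a FLINK_HOME value.
# It only verifies that the directory contains the bin/flink launcher.
check_flink_home() {
    dir="$1"
    if [ -z "$dir" ]; then
        echo "FLINK_HOME is not set" >&2
        return 1
    fi
    if [ ! -f "$dir/bin/flink" ]; then
        echo "bin/flink not found under $dir" >&2
        return 1
    fi
    echo "FLINK_HOME looks valid: $dir"
}
```

After sourcing the environment file, call it as `check_flink_home "$FLINK_HOME"`. A non-zero exit status suggests the export line is missing, still commented out, or points at the wrong directory.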
Flink configuration file: masters
Set this to the hostname of the cluster's master node (the value after the colon is the web UI port).
hadoop001:8082
Flink configuration file: slaves
List the hostname of each node in the cluster, one per line.
hadoop001
hadoop002
hadoop003
Flink configuration file: flink-conf.yaml
vi conf/flink-conf.yaml
Basic settings
Parameter | Value | Description |
---|---|---|
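jobmanager.rpc.address | hadoop001 | Node that runs the JobManager |
jobmanager.rpc.port | 6123 | JobManager RPC port (default 6123) |
jobmanager.heap.size | 2048m | Heap memory available to the JobManager |
taskmanager.heap.size | 4096m | Heap memory available to each TaskManager; size it to fit the cluster |
taskmanager.numberOfTaskSlots | 3 | Number of task slots per TaskManager (keep it at 5 or fewer) |
parallelism.default | 6 | Default parallelism for submitted applications (the total number of CPUs the application uses) |
rest.port | 8082 | Web UI port; Flink's default 8081 conflicts with Spark's port 8081, so it is changed to 8082 |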
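
The slot and parallelism values are related, and a quick calculation shows whether they fit together. The sketch below assumes one TaskManager per host listed in the slaves file and Flink's default slot sharing, where a job needs at least as many slots as its parallelism. The function names `total_slots` and `parallelism_fits` are hypothetical helpers written for illustration.

```shell
# Total slots in a standalone cluster = TaskManagers * slots per TaskManager.
total_slots() {
    echo $(( $1 * $2 ))
}

# Succeeds when the requested parallelism fits into the available slots.
# Arguments: parallelism, number of TaskManagers, slots per TaskManager.
parallelism_fits() {
    [ "$1" -le "$(total_slots "$2" "$3")" ]
}
```

With the three hosts in the slaves file and 3 slots each, `total_slots 3 3` prints 9, so `parallelism_fits 6 3 3` succeeds: the default parallelism of 6 leaves three slots spare.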
History server settings
Parameter | Value | Description |
---|---|---|
jobmanager.archive.fs.dir | hdfs://hsotname001/var/log/hadoop-flink | HDFS HA is enabled, so the URI uses the HDFS nameservice hsotname001 |
historyserver.web.address | hadoop001 | Address of the history server web UI (map this hostname in the local hosts file) |
historyserver.web.port | 18082 | Port of the history server web UI |
historyserver.archive.fs.dir | hdfs://hsotname001/var/log/hadoop-flink | Must match the value of jobmanager.archive.fs.dir |
historyserver.archive.fs.refresh-interval | 10000 | Interval, in milliseconds, at which the history server rescans the archive directory |
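
Because the two archive directories must be identical, a small check can catch a mismatch before the history server is started. This is a sketch only: `conf_value` and `archive_dirs_match` are hypothetical helpers, and the parsing assumes simple `key: value` lines at the start of a line, as in the default flink-conf.yaml.

```shell
# Print the value of key $1 from a flink-conf.yaml-style file $2.
# Simplification: expects "key: value" at the start of a line; the dots in
# the key act as regex wildcards, which is good enough for this check.
conf_value() {
    sed -n "s/^$1:[[:space:]]*//p" "$2" | tail -n 1
}

# Succeeds only if both archive directories are set and identical.
archive_dirs_match() {
    jm_dir=$(conf_value jobmanager.archive.fs.dir "$1")
    hs_dir=$(conf_value historyserver.archive.fs.dir "$1")
    [ -n "$jm_dir" ] && [ "$jm_dir" = "$hs_dir" ]
}
```

For example, `archive_dirs_match "$FLINK_HOME/conf/flink-conf.yaml"` returns a non-zero status if either key is missing or the two values differ.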
Add jar dependencies
cd $FLINK_HOME/lib
Add the following jars (bundled download: https://download.csdn.net/download/lb812913059/10932952):
flink-hadoop-compatibility_2.11-1.7.1.jar
javax.ws.rs-api-2.0.1.jar
jersey-common-2.27.jar
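jersey-core-1.9.jar
Without these jars in the lib directory, running Flink on YARN fails with an exception like the one below.
The error reports that a jersey class cannot be found; if you see it, check that the jars above were added correctly. The sample trace was captured from a Spark-on-YARN submission, but the missing class is the same.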
18/08/25 17:29:28 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/08/25 17:29:28 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
...
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
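
To rule out this error up front, the lib directory can be checked for the four jars listed above. The sketch below is a hypothetical helper named `missing_jars`, not a Flink tool; it assumes the exact file names from the list and simply prints any that are absent.

```shell
# Print each required jar that is missing from the given lib directory.
# The jar names are the four listed above; prints nothing when all exist.
missing_jars() {
    lib_dir="$1"
    for jar in \
        flink-hadoop-compatibility_2.11-1.7.1.jar \
        javax.ws.rs-api-2.0.1.jar \
        jersey-common-2.27.jar \
        jersey-core-1.9.jar
    do
        [ -f "$lib_dir/$jar" ] || echo "$jar"
    done
}
```

Run it as `missing_jars "$FLINK_HOME/lib"` on each node; empty output means all four jars are in place.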
Start Flink Cluster
[wuhuan@hadoop001~]$ sh $FLINK_HOME/bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host master.
Starting taskexecutor daemon on host slave.
Starting taskexecutor daemon on host slave1.
[wuhuan@hadoop001~]$ jps
4153 StandaloneSessionClusterEntrypoint
3863 TaskManagerRunner
4207 Jps
[wuhuan@hadoop002~]$ jps
6109 TaskManagerRunner
7421 Jps
Add a JobManager or TaskManager instance to the cluster
[wuhuan@hadoop001~]$ $FLINK_HOME/bin/jobmanager.sh start
[wuhuan@hadoop001~]$ $FLINK_HOME/bin/taskmanager.sh start
Note: the start-local.sh script was removed starting with Flink 1.6.
Flink commands: https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/cluster_setup.html
Flink Web UI
Flink web UI: http://hadoop001:8082
Flink history server web UI: http://hadoop001:18082
After a Flink program finishes, its job history is listed under Completed Jobs.
Job jars can also be uploaded through the web page.
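Submitting Flink jobs
Flink likewise supports two submission modes. If nothing is specified, the job is submitted in client (attached) mode.
To submit in cluster (detached) mode, pass -d or --detached on the command line:
-d,--detached If present, runs the job in detached mode
Client-mode submission:
$FLINK_HOME/bin/flink run -c com.daxin.batch.App flinkwordcount.jar
The client host runs an extra CliFrontend process, which is the driver process.
Cluster-mode submission:
$FLINK_HOME/bin/flink run -d -c com.daxin.batch.App flinkwordcount.jar
The client exits as soon as the job is submitted and no longer prints job progress.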
Flink on yarn
First make sure that HDFS and YARN are running correctly in cluster mode, then test whether Flink on YARN works:
cd $FLINK_HOME
Submit a Flink job to YARN
$ ./bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar
Flink on YARN commands: https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/yarn_setup.html