Integrating Flink 1.14.4 with Ambari 2.7.5

For the full procedure of integrating Flink into Ambari, see the CSDN post "Ambari 2.7.5安装Flink1.13.2" (Installing Flink 1.13.2 on Ambari 2.7.5) by 不饿同学.

Below are the problems I ran into along the way:

1. Error during installation: Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-710.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-710.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']

Solution:

cd /var/lib/ambari-server/resources/scripts

python configs.py -u admin -p admin -n zsCluster -l hadoop01 -t 8080 -a set -c cluster-env -k ignore_groupsusers_create -v true

2022-03-25 09:38:14,125 INFO ### Performing "set":
2022-03-25 09:38:14,125 INFO ### new property - "ignore_groupsusers_create":"true"
2022-03-25 09:38:14,156 INFO ### on (Site:cluster-env, Tag:d88402a7-13d7-4e4d-8d8d-e8007e1319e8)
2022-03-25 09:38:14,170 INFO ### PUTting json into: doSet_version1648172294170011.json
2022-03-25 09:38:14,403 INFO ### NEW Site:cluster-env, Tag:version1648172294170011

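Replace zsCluster with your own cluster name and hadoop01 with the hostname of the machine running ambari-server; admin/admin are the Ambari login user and password, and 8080 is the Ambari Server port. Setting ignore_groupsusers_create to true tells Ambari not to create service users and groups itself.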
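If this has to be repeated on several clusters, the same call can be wrapped in a small script. This is a minimal sketch: the variable names, the DRY_RUN switch and the helper functions are hypothetical additions of mine, the defaults just mirror the example values above, and it assumes that configs.py also accepts `-a get` to print a config type back for checking.

```bash
#!/usr/bin/env bash
# Sketch: set cluster-env/ignore_groupsusers_create through Ambari's configs.py
# and read cluster-env back. Run on the ambari-server host.
# All variable names and the DRY_RUN switch are illustrative, not Ambari's.
set -euo pipefail

AMBARI_USER="${AMBARI_USER:-admin}"
AMBARI_PASS="${AMBARI_PASS:-admin}"
CLUSTER="${CLUSTER:-zsCluster}"         # replace with your cluster name
AMBARI_HOST="${AMBARI_HOST:-hadoop01}"  # host running ambari-server
AMBARI_PORT="${AMBARI_PORT:-8080}"
CONFIGS_PY=/var/lib/ambari-server/resources/scripts/configs.py

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# configs.py with the connection flags from the example above.
ambari_configs() {
  run python "$CONFIGS_PY" -u "$AMBARI_USER" -p "$AMBARI_PASS" \
    -n "$CLUSTER" -l "$AMBARI_HOST" -t "$AMBARI_PORT" "$@"
}

set_ignore_groupsusers_create() {
  ambari_configs -a set -c cluster-env -k ignore_groupsusers_create -v true
  # Assumption: "-a get" prints the current cluster-env properties.
  ambari_configs -a get -c cluster-env
}
```

After sourcing the script, `DRY_RUN=1 set_ignore_groupsusers_create` only prints the two commands; calling `set_ignore_groupsusers_create` without DRY_RUN runs them.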

2. Error during installation: KeyError: 'getpwnam(): name not found: flink',...resource_management.core.exceptions.Fail: User 'flink' doesn't exist

Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 51, in _ensure_metadata
    _user_entity = pwd.getpwnam(user)
KeyError: 'getpwnam(): name not found: flink'

The above exception was the cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.1/services/FLINK/package/scripts/flink.py", line 172, in <module>
    Master().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.1/services/FLINK/package/scripts/flink.py", line 30, in install
    group=params.flink_group
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 125, in __new__
    cls(names_list.pop(0), env, provider, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 199, in action_create
    recursion_follow_links=self.resource.recursion_follow_links, safemode_folders=self.resource.safemode_folders)
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 53, in _ensure_metadata
    raise Fail("User '{0}' doesn't exist".format(user))
resource_management.core.exceptions.Fail: User 'flink' doesn't exist

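Solution: this is probably a side effect of the fix for problem 1: with ignore_groupsusers_create set to true, Ambari no longer creates service users, so the flink user has to be created by hand. Run the following on every server where Flink will be installed (useradd -g requires the flink group to exist, so it is created first if missing):

 getent group flink || groupadd flink
 useradd -d /home/flink -g flink flink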
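Because useradd fails if the user already exists, a re-runnable version is handy when the same command is pushed to many hosts. A minimal sketch: ensure_user_and_group and the DRY_RUN preview switch are hypothetical helpers, not part of Ambari, and `-m` is added so the home directory is created even on systems where useradd does not do that by default.

```bash
#!/usr/bin/env bash
# Sketch: create a service group and user only when they are missing, so the
# same command can be re-run on every Flink host. Needs root unless DRY_RUN=1.
# ensure_user_and_group and DRY_RUN are illustrative helpers, not part of Ambari.
set -euo pipefail

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

ensure_user_and_group() {
  local name="$1"
  # getent exits non-zero when the entry is not found.
  if ! getent group "$name" >/dev/null; then
    run groupadd "$name"
  fi
  if ! getent passwd "$name" >/dev/null; then
    # -m creates the home directory, -d sets its path, -g sets the primary group.
    run useradd -m -d "/home/$name" -g "$name" "$name"
  fi
}
```

Run `ensure_user_and_group flink` as root on each Flink host; `DRY_RUN=1 ensure_user_and_group flink` only prints what would be executed, and prints nothing if both already exist.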

3. Error at startup: JobManager memory configuration failed: Sum of configured JVM Metaspace (256.000mb (268435456 bytes)) and JVM Overhead (192.000mb (201326592 bytes)) exceed configured Total Process Memory (256.000mb (268435456 bytes))

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.configuration.IllegalConfigurationException: JobManager memory configuration failed: Sum of configured JVM Metaspace (256.000mb (268435456 bytes)) and JVM Overhead (192.000mb (201326592 bytes)) exceed configured Total Process Memory (256.000mb (268435456 bytes)).
	at org.apache.flink.runtime.jobmanager.JobManagerProcessUtils.processSpecFromConfigWithNewOptionToInterpretLegacyHeap(JobManagerProcessUtils.java:78)
	at org.apache.flink.client.deployment.AbstractContainerizedClusterClientFactory.getClusterSpecification(AbstractContainerizedClusterClientFactory.java:43)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:602)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$4(FlinkYarnSessionCli.java:860)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:860)
Caused by: org.apache.flink.configuration.IllegalConfigurationException: Sum of configured JVM Metaspace (256.000mb (268435456 bytes)) and JVM Overhead (192.000mb (201326592 bytes)) exceed configured Total Process Memory (256.000mb (268435456 bytes)).
	at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.deriveJvmMetaspaceAndOverheadWithTotalProcessMemory(ProcessMemoryUtils.java:157)
	at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.deriveProcessSpecWithTotalProcessMemory(ProcessMemoryUtils.java:114)
	at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(ProcessMemoryUtils.java:84)
	at org.apache.flink.runtime.jobmanager.JobManagerProcessUtils.processSpecFromConfig(JobManagerProcessUtils.java:83)
	at org.apache.flink.runtime.jobmanager.JobManagerProcessUtils.processSpecFromConfigWithNewOptionToInterpretLegacyHeap(JobManagerProcessUtils.java:73)
	... 8 more
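The numbers in this error can be reproduced by hand. Flink derives JVM Overhead as a fraction of Total Process Memory, clamps it to a configured minimum and maximum, and rejects the configuration if JVM Metaspace plus JVM Overhead exceeds Total Process Memory. The sketch below shows why a 256 MB process size can never pass: the overhead alone is clamped up to 192 MB. check_process_size is a hypothetical helper, the defaults are my reading of Flink 1.14's JobManager defaults, and whole-MB integer arithmetic simplifies Flink's byte-level calculation.

```bash
#!/usr/bin/env bash
# Sketch (not Flink's code): reproduce the JobManager check that failed above.
# The defaults below are assumed to be Flink 1.14's JobManager defaults.
# Integer MB arithmetic is a simplification of Flink's byte-level math.
set -euo pipefail

METASPACE_MB=256          # jobmanager.memory.jvm-metaspace.size
OVERHEAD_FRACTION_PCT=10  # jobmanager.memory.jvm-overhead.fraction (0.1)
OVERHEAD_MIN_MB=192       # jobmanager.memory.jvm-overhead.min
OVERHEAD_MAX_MB=1024      # jobmanager.memory.jvm-overhead.max

# Usage: check_process_size <total process memory in MB>
# Prints the derived sizes and returns 1 if the sizes would be rejected.
check_process_size() {
  local process_mb="$1"
  # JVM Overhead is a fraction of Total Process Memory, clamped to [min, max].
  local overhead_mb=$(( process_mb * OVERHEAD_FRACTION_PCT / 100 ))
  if (( overhead_mb < OVERHEAD_MIN_MB )); then overhead_mb=$OVERHEAD_MIN_MB; fi
  if (( overhead_mb > OVERHEAD_MAX_MB )); then overhead_mb=$OVERHEAD_MAX_MB; fi
  local sum_mb=$(( METASPACE_MB + overhead_mb ))
  echo "metaspace=${METASPACE_MB}mb overhead=${overhead_mb}mb sum=${sum_mb}mb process=${process_mb}mb"
  if (( sum_mb > process_mb )); then
    echo "rejected: JVM Metaspace + JVM Overhead exceed Total Process Memory" >&2
    return 1
  fi
  return 0
}
```

`check_process_size 256` reports overhead=192mb and sum=448mb, matching the log. A process size of at least 448 MB passes this particular check, although Flink still needs room for heap and off-heap memory on top of it.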

Troubleshooting: I worked through the official Flink documentation.

Step 1: search the documentation for the error message.

[Figure 1]

Step 2: click "Configuration".

[Figure 2]

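The documentation says that in most cases you only need to set taskmanager.memory.process.size or taskmanager.memory.flink.size (depending on the setup), and possibly adjust the ratio of JVM heap to managed memory via taskmanager.memory.managed.fraction. That was puzzling: if setting these options is supposed to be enough, why wouldn't the cluster start? So I went looking for them in my Flink configuration, but neither was there. There were only these two: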

[Figure 3]

[Figure 4]

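According to the official documentation, both of these options have been deprecated since Flink 1.11, and I was running Flink 1.14.4, the latest release at the time.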

So I removed the two options and tried starting again, which produced the following error:

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.configuration.IllegalConfigurationException: JobManager memory configuration failed: Either required fine-grained memory (jobmanager.memory.heap.size), or Total Flink Memory size (Key: 'jobmanager.memory.flink.size' , default: null (fallback keys: [])), or Total Process Memory size (Key: 'jobmanager.memory.process.size' , default: null (fallback keys: [])) need to be configured explicitly.
	at org.apache.flink.runtime.jobmanager.JobManagerProcessUtils.processSpecFromConfigWithNewOptionToInterpretLegacyHeap(JobManagerProcessUtils.java:78)
	at org.apache.flink.client.deployment.AbstractContainerizedClusterClientFactory.getClusterSpecification(AbstractContainerizedClusterClientFactory.java:43)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:602)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$4(FlinkYarnSessionCli.java:860)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:860)
Caused by: org.apache.flink.configuration.IllegalConfigurationException: Either required fine-grained memory (jobmanager.memory.heap.size), or Total Flink Memory size (Key: 'jobmanager.memory.flink.size' , default: null (fallback keys: [])), or Total Process Memory size (Key: 'jobmanager.memory.process.size' , default: null (fallback keys: [])) need to be configured explicitly.
	at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.failBecauseRequiredOptionsNotConfigured(ProcessMemoryUtils.java:129)
	at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(ProcessMemoryUtils.java:86)
	at org.apache.flink.runtime.jobmanager.JobManagerProcessUtils.processSpecFromConfig(JobManagerProcessUtils.java:83)
	at org.apache.flink.runtime.jobmanager.JobManagerProcessUtils.processSpecFromConfigWithNewOptionToInterpretLegacyHeap(JobManagerProcessUtils.java:73)
	... 8 more

Key error message: JobManager memory configuration failed: Either required fine-grained memory (jobmanager.memory.heap.size), or Total Flink Memory size (Key: 'jobmanager.memory.flink.size' , default: null (fallback keys: [])), or Total Process Memory size (Key: 'jobmanager.memory.process.size' , default: null (fallback keys: [])) need to be configured explicitly.

In other words, the JobManager memory configuration failed because none of the fine-grained JVM heap size (jobmanager.memory.heap.size), the Total Flink Memory size (jobmanager.memory.flink.size), or the Total Process Memory size (jobmanager.memory.process.size) was configured explicitly.

So the cause is clear: at least one of jobmanager.memory.heap.size, jobmanager.memory.flink.size, or jobmanager.memory.process.size must be set. But what does each of them mean?

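While looking these three options up in the official documentation, I happened to find this:

[Figure 5]

This confirmed that I was heading in the right direction. Here is what the three options mean: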

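jobmanager.memory.flink.size: Total Flink Memory size for the JobManager. This includes all the memory that the JobManager consumes, except for JVM Metaspace and JVM Overhead. It consists of JVM heap memory and off-heap memory.
jobmanager.memory.heap.size: JVM heap memory size for the JobManager. The minimum recommended JVM heap size is 128.000mb.
jobmanager.memory.process.size: Total Process Memory size for the JobManager. This includes all the memory that the JobManager JVM process consumes, consisting of Total Flink Memory, JVM Metaspace, and JVM Overhead. In containerized setups, this should be set to the container memory.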
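These three descriptions fit together arithmetically: heap plus off-heap gives Total Flink Memory, and Total Flink Memory plus JVM Metaspace plus JVM Overhead gives Total Process Memory. The sketch below expands a given jobmanager.memory.flink.size into the other components. jm_breakdown is a hypothetical helper; the defaults (off-heap 128 MB, metaspace 256 MB, overhead fraction 0.1 clamped to 192 MB through 1 GB) are my reading of Flink 1.14's JobManager defaults, and the calculation approximates Flink's rather than reproducing its code.

```bash
#!/usr/bin/env bash
# Sketch: what jobmanager.memory.flink.size expands to, assuming Flink 1.14's
# documented JobManager defaults. Results are truncated to whole MB.
set -euo pipefail

MB=$((1024 * 1024))
OFF_HEAP=$((128 * MB))       # jobmanager.memory.off-heap.size
METASPACE=$((256 * MB))      # jobmanager.memory.jvm-metaspace.size
OVERHEAD_PCT=10              # jobmanager.memory.jvm-overhead.fraction (0.1)
OVERHEAD_MIN=$((192 * MB))   # jobmanager.memory.jvm-overhead.min
OVERHEAD_MAX=$((1024 * MB))  # jobmanager.memory.jvm-overhead.max

# Usage: jm_breakdown <jobmanager.memory.flink.size in MB>
jm_breakdown() {
  local flink=$(( $1 * MB ))
  # Total Flink Memory = JVM heap + off-heap, so the heap is the remainder.
  local heap=$(( flink - OFF_HEAP ))
  if (( heap < 0 )); then echo "flink size smaller than off-heap" >&2; return 1; fi
  # Overhead is a fraction f of Total Process Memory, which is not known yet:
  # overhead = (flink + metaspace) * f / (1 - f), then clamped to [min, max].
  local overhead=$(( (flink + METASPACE) * OVERHEAD_PCT / (100 - OVERHEAD_PCT) ))
  if (( overhead < OVERHEAD_MIN )); then overhead=$OVERHEAD_MIN; fi
  if (( overhead > OVERHEAD_MAX )); then overhead=$OVERHEAD_MAX; fi
  local process=$(( flink + METASPACE + overhead ))
  echo "heap=$((heap / MB))mb off-heap=$((OFF_HEAP / MB))mb metaspace=$((METASPACE / MB))mb overhead=$((overhead / MB))mb process=$((process / MB))mb"
}
```

With 1024 MB this reports a 896 MB heap and a Total Process Memory of 1472 MB, which is roughly the container size YARN would be asked for under these defaults.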

That was still a bit confusing; it makes more sense together with this diagram:

[Figure 6]

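The diagram shows how they nest: jobmanager.memory.process.size is the outermost block, jobmanager.memory.flink.size sits inside it (everything except JVM Metaspace and JVM Overhead), and the JVM heap is one part of the Flink memory. Once jobmanager.memory.flink.size is set, Flink derives the finer-grained components by itself, so I set jobmanager.memory.flink.size: 1024m. I started it again, and yet again got an error: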

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.configuration.IllegalConfigurationException: TaskManager memory configuration failed: Either required fine-grained memory (taskmanager.memory.task.heap.size and taskmanager.memory.managed.size), or Total Flink Memory size (Key: 'taskmanager.memory.flink.size' , default: null (fallback keys: [])), or Total Process Memory size (Key: 'taskmanager.memory.process.size' , default: null (fallback keys: [])) need to be configured explicitly.
	at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:163)
	at org.apache.flink.client.deployment.AbstractContainerizedClusterClientFactory.getClusterSpecification(AbstractContainerizedClusterClientFactory.java:49)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:602)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$4(FlinkYarnSessionCli.java:860)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:860)
Caused by: org.apache.flink.configuration.IllegalConfigurationException: Either required fine-grained memory (taskmanager.memory.task.heap.size and taskmanager.memory.managed.size), or Total Flink Memory size (Key: 'taskmanager.memory.flink.size' , default: null (fallback keys: [])), or Total Process Memory size (Key: 'taskmanager.memory.process.size' , default: null (fallback keys: [])) need to be configured explicitly.
	at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.failBecauseRequiredOptionsNotConfigured(ProcessMemoryUtils.java:129)
	at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(ProcessMemoryUtils.java:86)
	at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:160)
	... 8 more

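This time it was the TaskManager's turn. The error is analogous to the JobManager one: either both taskmanager.memory.task.heap.size and taskmanager.memory.managed.size, or taskmanager.memory.flink.size, or taskmanager.memory.process.size must be configured. So I did the same thing and set taskmanager.memory.flink.size: 1024m.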
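To see what the TaskManager's 1024m turns into, and how taskmanager.memory.managed.fraction (from the documentation quote earlier) shifts memory between task heap and managed memory, here is a sketch under the same caveats. tm_breakdown is a hypothetical helper; the defaults are assumed to be Flink 1.14's TaskManager defaults (framework heap and framework off-heap 128 MB each, task off-heap 0, network fraction 0.1 clamped to 64 MB through 1 GB, managed fraction 0.4); results are truncated to whole MB.

```bash
#!/usr/bin/env bash
# Sketch: how Total Flink Memory (taskmanager.memory.flink.size) is split when
# only that option is set, assuming Flink 1.14's TaskManager defaults.
set -euo pipefail

MB=$((1024 * 1024))
FRAMEWORK_HEAP=$((128 * MB))     # taskmanager.memory.framework.heap.size
FRAMEWORK_OFFHEAP=$((128 * MB))  # taskmanager.memory.framework.off-heap.size
TASK_OFFHEAP=0                   # taskmanager.memory.task.off-heap.size
NETWORK_PCT=10                   # taskmanager.memory.network.fraction (0.1)
NETWORK_MIN=$((64 * MB))         # taskmanager.memory.network.min
NETWORK_MAX=$((1024 * MB))       # taskmanager.memory.network.max

# Usage: tm_breakdown <flink size in MB> [managed fraction in percent, default 40]
tm_breakdown() {
  local flink=$(( $1 * MB ))
  local managed_pct="${2:-40}"   # taskmanager.memory.managed.fraction (0.4)
  local managed=$(( flink * managed_pct / 100 ))
  local network=$(( flink * NETWORK_PCT / 100 ))
  if (( network < NETWORK_MIN )); then network=$NETWORK_MIN; fi
  if (( network > NETWORK_MAX )); then network=$NETWORK_MAX; fi
  # Task heap receives whatever is left of Total Flink Memory.
  local task_heap=$(( flink - FRAMEWORK_HEAP - FRAMEWORK_OFFHEAP - TASK_OFFHEAP - managed - network ))
  if (( task_heap < 0 )); then
    echo "flink size too small for the configured components" >&2
    return 1
  fi
  echo "task.heap=$((task_heap / MB))mb managed=$((managed / MB))mb network=$((network / MB))mb"
}
```

`tm_breakdown 1024` gives about 256 MB of task heap, 409 MB of managed memory and 102 MB of network memory; raising the managed fraction with `tm_breakdown 1024 50` moves memory from the task heap (down to 153 MB) into managed memory (512 MB).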

This time it finally started. Done!
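Both startup failures come down to the same rule, so it can be checked before starting the cluster. The sketch below encodes the rule as stated in the two error messages. has_key and check_memory_config are hypothetical helpers; they assume a flat `key: value` flink-conf.yaml, and should be pointed at the flink-conf.yaml that ends up on a Flink host (the path depends on your installation).

```bash
#!/usr/bin/env bash
# Sketch: pre-flight check mirroring the two memory errors above.
# JobManager: heap.size OR flink.size OR process.size must be set.
# TaskManager: (task.heap.size AND managed.size) OR flink.size OR process.size.
# Assumes a flat "key: value" flink-conf.yaml; commented-out keys are ignored.
set -euo pipefail

# has_key <file> <key>: succeeds if a line defines exactly that key.
has_key() {
  awk -F: -v key="$2" '
    { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
    k == key { found = 1 }
    END { exit !found }' "$1"
}

# check_memory_config <file>: prints what is missing, returns 1 on failure.
check_memory_config() {
  local conf="$1" ok=0
  if ! has_key "$conf" jobmanager.memory.heap.size &&
     ! has_key "$conf" jobmanager.memory.flink.size &&
     ! has_key "$conf" jobmanager.memory.process.size; then
    echo "JobManager: set jobmanager.memory.flink.size, process.size or heap.size"
    ok=1
  fi
  if ! { has_key "$conf" taskmanager.memory.task.heap.size &&
         has_key "$conf" taskmanager.memory.managed.size; } &&
     ! has_key "$conf" taskmanager.memory.flink.size &&
     ! has_key "$conf" taskmanager.memory.process.size; then
    echo "TaskManager: set taskmanager.memory.flink.size, process.size, or both task.heap.size and managed.size"
    ok=1
  fi
  return "$ok"
}
```

For example, `check_memory_config /path/to/flink-conf.yaml` prints nothing and succeeds once both jobmanager.memory.flink.size and taskmanager.memory.flink.size are present.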

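Other fixes for this startup failure can be found online, but they either didn't work for me or were cumbersome. Teaching someone to fish beats handing them a fish: going to the official documentation is the right way to solve problems like this.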
