Why some NodeManager nodes in the cluster fail to start

After enabling Kerberos and SSL on the cluster, some NodeManagers (NM) would no longer start.
The Cloudera Manager (CM) startup log shows:
++ printf '! -name %s ' cloudera-config.sh hue.sh impala.sh sqoop.sh supervisor.conf config.zip proc.json '*.log' yarn.keytab '*jceks'
+ find /run/cloudera-scm-agent/process/1393-yarn-NODEMANAGER -type f '!' -path '/run/cloudera-scm-agent/process/1393-yarn-NODEMANAGER/logs/*' '!' -name cloudera-config.sh '!' -name hue.sh '!' -name impala.sh '!' -name sqoop.sh '!' -name supervisor.conf '!' -name config.zip '!' -name proc.json '!' -name '*.log' '!' -name yarn.keytab '!' -name '*jceks' -exec perl -pi -e 's#{{CMF_CONF_DIR}}#/run/cloudera-scm-agent/process/1393-yarn-NODEMANAGER#g' '{}' ';'
Can't open /run/cloudera-scm-agent/process/1393-yarn-NODEMANAGER/container-executor.cfg: Permission denied.
The NM startup log shows:
INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:269)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:562)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:199)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:267)
        ... 3 more
Caused by: ExitCodeException exitCode=24: Invalid conf file provided : /etc/hadoop/conf.cloudera.yarn/container-executor.cfg

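The cause is wrong ownership or permissions on the container-executor binary, /opt/cloudera/parcels/CDH-5.13.2-1.cdh5.13.2.p0.3/lib/hadoop-yarn/bin/container-executor, which the LinuxContainerExecutor in the stack trace above invokes.
The correct ownership and permissions for the file are: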
---Sr-s--- 1 root yarn 53712 Feb  2  2018 /opt/cloudera/parcels/CDH-5.13.2-1.cdh5.13.2.p0.3/lib/hadoop-yarn/bin/container-executor
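
The listing above corresponds to octal mode 6050: setuid and setgid set, read and execute for the group, nothing for the owner or others. As a rough illustration of how that octal value maps to the `ls -l` string, here is a small sketch in POSIX sh; `triplet` and `mode_to_symbolic` are hypothetical helper names, not part of Hadoop or Cloudera Manager.

```sh
# Render one rwx triplet.
# $1 = octal digit (0-7), $2 = non-zero if the special bit for this triplet
# (setuid, setgid or sticky) is set, $3/$4 = letter shown when the execute
# bit is set / not set together with the special bit.
triplet() {
    digit=$1 special=$2 with_x=$3 without_x=$4
    if [ $(( digit & 4 )) -ne 0 ]; then r=r; else r=-; fi
    if [ $(( digit & 2 )) -ne 0 ]; then w=w; else w=-; fi
    if [ $(( digit & 1 )) -ne 0 ]; then x=x; else x=-; fi
    if [ "$special" -ne 0 ]; then
        if [ "$x" = x ]; then x=$with_x; else x=$without_x; fi
    fi
    printf '%s%s%s' "$r" "$w" "$x"
}

# Print the nine permission characters that `ls -l` shows for an octal mode
# (without the leading file-type character).
mode_to_symbolic() {
    mode=$(( 0$1 ))    # the leading 0 makes the shell read the digits as octal
    triplet $(( (mode >> 6) & 7 )) $(( mode & 04000 )) s S   # owner, setuid
    triplet $(( (mode >> 3) & 7 )) $(( mode & 02000 )) s S   # group, setgid
    triplet $((  mode       & 7 )) $(( mode & 01000 )) t T   # others, sticky
    printf '\n'
}

mode_to_symbolic 6050
```

The capital `S` in the owner triplet means setuid is set while the owner's execute bit is not; the lowercase `s` in the group triplet means setgid is set together with group execute.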
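Fix the ownership and permissions on the affected nodes (run chown first, since chown can clear the setuid/setgid bits):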
chown -R root:yarn /opt/cloudera/parcels/CDH-5.13.2-1.cdh5.13.2.p0.3/lib/hadoop-yarn/bin/container-executor
chmod 6050 /opt/cloudera/parcels/CDH-5.13.2-1.cdh5.13.2.p0.3/lib/hadoop-yarn/bin/container-executor
After a restart, the NM started normally.
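
Since only some nodes were affected, it can help to check the binary on every NodeManager host before restarting. The following is a minimal sketch, assuming GNU `stat` (for the `-c` format option) is available; `check_perms` is a hypothetical helper name, not a Cloudera tool.

```sh
# check_perms FILE MODE OWNER GROUP
# Returns 0 when FILE has exactly the given octal mode, owner and group,
# 1 on a mismatch, and 2 when the file cannot be stat'ed.
check_perms() {
    file=$1 want="$2 $3 $4"
    # GNU stat: %a = octal mode, %U = owner name, %G = group name
    have=$(stat -c '%a %U %G' "$file") || return 2
    if [ "$have" = "$want" ]; then
        echo "OK: $file ($have)"
    else
        echo "MISMATCH: $file is '$have', expected '$want'" >&2
        return 1
    fi
}

# Example call for the path from this post:
# check_perms /opt/cloudera/parcels/CDH-5.13.2-1.cdh5.13.2.p0.3/lib/hadoop-yarn/bin/container-executor 6050 root yarn
```

A non-zero exit status on a host indicates that the chown and chmod commands above still need to be applied there.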
