Ambari 2.7.3 / HDP 3.0.1: Enabling CPU Scheduling and Isolation in YARN

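Problem:

After the YARN component is installed, its CPU scheduling and isolation feature (CPU Scheduling and Isolation) is disabled by default. The setting is located at: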
Ambari Web UI --> YARN --> CONFIGS --> SETTINGS --> CPU

Enabling:

  1. Turn on the CPU Scheduling and Isolation toggle
  2. Restart the affected components
  3. When YARN starts, the NodeManager fails with the following error:

Unexpected: Cannot create yarn cgroup Subsystem:cpu Mount points:/proc/mounts User:yarn Path:/sys/fs/cgroup/cpu/yarn

2020-01-01 15:49:35,073 INFO  recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:checkVersion(1662)) - Loaded NM state version info 1.2
2020-01-01 15:49:35,254 INFO  resources.ResourceHandlerModule (ResourceHandlerModule.java:initNetworkResourceHandler(182)) - Using traffic control bandwidth handler
2020-01-01 15:49:35,266 INFO  resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:initializePreMountedCGroupController(410)) - Initializing mounted controller cpu at /sys/fs/cgroup/cpu/yarn
2020-01-01 15:49:35,266 INFO  resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:initializePreMountedCGroupController(420)) - Yarn control group does not exist. Creating /sys/fs/cgroup/cpu/yarn
2020-01-01 15:49:35,267 ERROR nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:init(323)) - Failed to bootstrap configured resource subsystems! 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Unexpected: Cannot create yarn cgroup Subsystem:cpu Mount points:/proc/mounts User:yarn Path:/sys/fs/cgroup/cpu/yarn 
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializePreMountedCGroupController(CGroupsHandlerImpl.java:425)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:377)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:98)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:87)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:320)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
2020-01-01 15:49:35,269 INFO  service.AbstractService (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state INITED
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:393)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
Caused by: java.io.IOException: Failed to bootstrap configured resource subsystems!
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:324)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
	... 3 more
2020-01-01 15:49:35,273 ERROR nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(936)) - Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:393)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
Caused by: java.io.IOException: Failed to bootstrap configured resource subsystems!
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:324)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
	... 3 more
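
When this failure has to be located in NodeManager logs from several nodes, it can help to pull the subsystem, user, and path out of the exception message. The following is only a minimal sketch: the function name parse_cgroup_error and the regular expression are illustrative assumptions based on the single message format shown in the log above, not part of Hadoop or Ambari.

```python
import re

# Pattern for the ResourceHandlerException message shown in the log above.
# Assumption: the field order and separators match that one sample message.
CGROUP_ERROR = re.compile(
    r"Cannot create yarn cgroup "
    r"Subsystem:(?P<subsystem>\S+) "
    r"Mount points:(?P<mounts>\S+) "
    r"User:(?P<user>\S+) "
    r"Path:(?P<path>\S+)"
)


def parse_cgroup_error(line):
    """Return the fields of a 'Cannot create yarn cgroup' message, or None.

    Hypothetical helper for illustration only.
    """
    match = CGROUP_ERROR.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "Unexpected: Cannot create yarn cgroup Subsystem:cpu "
        "Mount points:/proc/mounts User:yarn Path:/sys/fs/cgroup/cpu/yarn"
    )
    # Prints a dict with the keys subsystem, mounts, user and path
    print(parse_cgroup_error(sample))
```

The path field is the directory the NodeManager tried to create, and the user field is the account it runs as, which together point at the directory that has to exist and be writable before startup.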

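Solution:

The log shows that the NodeManager, running as user yarn, could not create /sys/fs/cgroup/cpu/yarn itself, so the yarn cgroup directories are created in advance and handed over to that user.

  1. Run the following Python script as root on every cluster node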
# -*- coding: utf-8 -*-

import subprocess

# cgroup controllers that get a yarn-owned directory
CONTROLLERS = ("cpu", "memory", "blkio", "net_cls", "devices")


def open_cpu_scheduling_and_isolation():
    """Prepare the cgroup directories before enabling CPU scheduling and isolation.

    Must be run as root on every node of the cluster.
    """
    for controller in CONTROLLERS:
        path = "/sys/fs/cgroup/{}/yarn".format(controller)
        try:
            # check_call raises on a non-zero exit status, unlike os.system,
            # so failures are actually reported
            subprocess.check_call(["mkdir", "-p", path])
            subprocess.check_call(["chown", "-R", "yarn:yarn", path])
        except (OSError, subprocess.CalledProcessError) as exc:
            print("Error preparing {}: {}".format(path, exc))


if __name__ == '__main__':
    open_cpu_scheduling_and_isolation()

  2. Restart the YARN component; the NodeManager now starts successfully
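
Before restarting, it may be worth confirming on each node that the directories exist and belong to the expected user. The sketch below is one possible check, not part of the original procedure: the function name check_yarn_cgroups and its parameters are hypothetical, and the controller list simply mirrors the directories created by the script above.

```python
import os
import pwd

# Same controllers as the setup script; adjust if fewer are needed.
CONTROLLERS = ("cpu", "memory", "blkio", "net_cls", "devices")


def check_yarn_cgroups(owner="yarn", root="/sys/fs/cgroup", controllers=CONTROLLERS):
    """Map each controller to 'ok', 'missing', or 'owned by <user>'.

    Hypothetical helper; 'root' is a parameter so the check can be
    pointed at a different directory tree.
    """
    report = {}
    for controller in controllers:
        path = os.path.join(root, controller, "yarn")
        if not os.path.isdir(path):
            report[controller] = "missing"
            continue
        uid = os.stat(path).st_uid
        try:
            actual = pwd.getpwuid(uid).pw_name
        except KeyError:
            actual = str(uid)  # the uid has no passwd entry
        report[controller] = "ok" if actual == owner else "owned by " + actual
    return report


if __name__ == "__main__":
    for controller, status in sorted(check_yarn_cgroups().items()):
        print("{:8s} {}".format(controller, status))
```

Any controller reported as missing or owned by another user would be expected to trigger the same "Cannot create yarn cgroup" failure for that subsystem if it is enabled.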
