Unhealthy Nodes Reduce Cluster Computing Capacity

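One day the cluster started reporting Unhealthy Nodes, which reduced its overall computing capacity. On inspection, many of the affected node's disks had reached the 90% utilization threshold. YARN controls this behavior with the following settings: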


yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage (default: 90)
    The maximum percentage of disk space utilization allowed, after which a disk is marked as bad. Values can range from 0.0 to 100.0. If the value is greater than or equal to 100, the NodeManager will check for a full disk. This applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.

yarn.nodemanager.disk-health-checker.min-healthy-disks (default: 0.25)
    The minimum fraction of disks that must be healthy for the NodeManager to launch new containers. This applies to both yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs: if fewer healthy local-dirs (or log-dirs) are available, new containers will not be launched on this node.

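In other words, as soon as fewer than 25% of a node's disks remain below 90% utilization, the NodeManager stops launching new containers on that node so that intermediate results and logs do not run out of space, and the node is reported as Unhealthy.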
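To make the two thresholds concrete, here is a minimal Python sketch of the decision described above. It is a simplified model, not Hadoop's implementation: the Disk class, the function names, and the directory paths are all hypothetical. It also assumes that a disk counts as bad once its utilization goes strictly above the cutoff, and that the healthy fraction is checked separately for local-dirs and log-dirs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    """Hypothetical stand-in for one configured directory and its disk usage."""
    path: str
    used_percent: float  # disk space utilization, 0.0 to 100.0


def is_good(disk: Disk, max_util: float = 90.0) -> bool:
    """Per-disk rule modeled on max-disk-utilization-per-disk-percentage."""
    if max_util >= 100.0:
        # With a cutoff of 100 or more, only a full disk is marked bad.
        return disk.used_percent < 100.0
    # Assumption: the disk is bad once utilization goes above the cutoff.
    return disk.used_percent <= max_util


def healthy_fraction(disks: List[Disk], max_util: float = 90.0) -> float:
    """Fraction of the given directories that are still considered good."""
    if not disks:
        raise ValueError("at least one directory must be configured")
    good = sum(1 for d in disks if is_good(d, max_util))
    return good / len(disks)


def node_is_healthy(local_dirs: List[Disk], log_dirs: List[Disk],
                    max_util: float = 90.0, min_healthy: float = 0.25) -> bool:
    """Node-level rule modeled on min-healthy-disks.

    The fraction is checked separately for local-dirs and log-dirs;
    if either falls below min_healthy, no new containers are launched.
    """
    return (healthy_fraction(local_dirs, max_util) >= min_healthy
            and healthy_fraction(log_dirs, max_util) >= min_healthy)


if __name__ == "__main__":
    # Hypothetical node: 8 local dirs, 7 of them above 90% utilization.
    local = [Disk(f"/data{i}/yarn/local", 93.0) for i in range(1, 8)]
    local.append(Disk("/data8/yarn/local", 60.0))
    logs = [Disk("/var/log/hadoop-yarn", 50.0)]
    print(healthy_fraction(local))       # 0.125
    print(node_is_healthy(local, logs))  # False
```

With only 1 of 8 local dirs below 90%, the healthy fraction is 0.125, which is under 0.25, so the node is treated as unhealthy. Freeing space on one more disk brings the fraction to exactly 0.25, which passes the check.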


