The ResourceManager went down. The log of the active ResourceManager contained the following:
java.lang.OutOfMemoryError: Java heap space
The cause of the failure: the RM ran out of heap space.
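To confirm the diagnosis on a given RM node, you can search the RM logs for this error. Below is a minimal sketch. It assumes the log directory /home/hadoop/hadoop/logs and the yarn-<user>-resourcemanager-<host>.log naming that appears in the ps output later in this post. The function name find_rm_oom is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list RM log lines that report a Java heap OOM.
# Assumes logs follow the yarn-<user>-resourcemanager-<host>.log* naming.
find_rm_oom() {
  local log_dir="${1:-/home/hadoop/hadoop/logs}"
  # -F: fixed-string match, -H: print file name, -n: print line number;
  # 2>/dev/null hides "no such file" noise when nothing matches the glob
  grep -FHn 'java.lang.OutOfMemoryError: Java heap space' \
    "$log_dir"/yarn-*-resourcemanager-*.log* 2>/dev/null
}
```

Run `find_rm_oom` on each RM host. Any output line shows the file and line number where the OOM was logged.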
The RM on my-hdp-01 was still running with the default maximum heap size of 1000 MB (-Xmx1000m):
[hadoop@my-hdp-01 hadoop]$ ps aux | grep -i resourcemanager | grep -v grep | grep --color Xmx
hadoop 9075 0.0 0.4 2973936 596152 ? Sl Oct07 1:02 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-7.b10.el7.x86_64/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/home/hadoop/hadoop/logs -Dyarn.log.dir=/home/hadoop/hadoop/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-my-hdp-01.log -Dyarn.log.file=yarn-hadoop-resourcemanager-my-hdp-01.log -Dyarn.home.dir= -Dyarn.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/home/hadoop/hadoop/logs -Dyarn.log.dir=/home/hadoop/hadoop/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-my-hdp-01.log -Dyarn.log.file=yarn-hadoop-resourcemanager-my-hdp-01.log -Dyarn.home.dir=/home/hadoop/hadoop -Dhadoop.home.dir=/home/hadoop/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop/lib/native -classpath /home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/*:/home/hadoop/hadoop/share/hadoop/common/*:/home/hadoop/hadoop/share/hadoop/hdfs:/home/hadoop/hadoop/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop/share/hadoop/hdfs/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop/share/hadoop/mapreduce/*:/home/hadoop/hadoop/share/hadoop/tools/lib/*::/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
The RM on my-hdp-02 was also running with the default 1000 MB:
[hadoop@my-hdp-01 hadoop]$ ssh my-hdp-02 ps aux | grep -i resourcemanager | grep -v grep | grep --color Xmx
hadoop 4919 7.8 0.9 3349280 1248308 ? Sl Oct07 120:31 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-7.b10.el7.x86_64/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/home/hadoop/hadoop/logs -Dyarn.log.dir=/home/hadoop/hadoop/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-my-hdp-02.log -Dyarn.log.file=yarn-hadoop-resourcemanager-my-hdp-02.log -Dyarn.home.dir= -Dyarn.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/home/hadoop/hadoop/logs -Dyarn.log.dir=/home/hadoop/hadoop/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-my-hdp-02.log -Dyarn.log.file=yarn-hadoop-resourcemanager-my-hdp-02.log -Dyarn.home.dir=/home/hadoop/hadoop -Dhadoop.home.dir=/home/hadoop/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop/lib/native -classpath /home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/*:/home/hadoop/hadoop/share/hadoop/common/*:/home/hadoop/hadoop/share/hadoop/hdfs:/home/hadoop/hadoop/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop/share/hadoop/hdfs/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop/share/hadoop/mapreduce/*:/home/hadoop/hadoop/share/hadoop/tools/lib/*::/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
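Reading -Xmx off a very long ps line by eye is error-prone. The following is a minimal sketch of a filter that keeps only the ResourceManager line and prints its last -Xmx flag. The name rm_xmx is hypothetical. The sketch assumes one RM per host. It prints the last flag because the HotSpot JVM generally honours the last -Xmx when the flag is repeated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ps aux`-style lines on stdin, keep the
# ResourceManager process, and print the last -Xmx flag on its command line.
rm_xmx() {
  grep -F 'org.apache.hadoop.yarn.server.resourcemanager.ResourceManager' \
    | grep -oE -- '-Xmx[0-9]+[kKmMgG]?' \
    | tail -n 1
}
```

To use it, run `ps aux | rm_xmx` locally or `ssh my-hdp-02 ps aux | rm_xmx` for the other node. The grep process's own command line also contains the class name, but it carries no -Xmx flag, so it does not affect the result.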
The RM heap size is configured in yarn-env.sh in the Hadoop configuration directory:
[hadoop@my-hdp-01 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@my-hdp-01 hadoop]$ ll | grep --color yarn-env
-rw-rw-r-- 1 hadoop hadoop 2191 Jun 2 2017 yarn-env.cmd
-rw-rw-r-- 1 hadoop hadoop 4567 Jun 2 2017 yarn-env.sh
So the ResourceManager's maximum heap size has to be raised. Before editing yarn-env.sh, back it up with cp:
[hadoop@my-hdp-01 hadoop]$ cp yarn-env.sh yarn-env.sh_20181008
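The backup above uses a date suffix. Below is a minimal sketch that produces the same FILE_YYYYMMDD naming without typing the date by hand. The name backup_with_date is hypothetical. The refusal to overwrite an existing backup is a design choice of this sketch, not something the original does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy FILE to FILE_YYYYMMDD (the naming used above),
# preserving mode and timestamps, and refuse to overwrite an existing backup.
backup_with_date() {
  local src="$1"
  local dst="${src}_$(date +%Y%m%d)"
  if [ -e "$dst" ]; then
    echo "backup already exists: $dst" >&2
    return 1
  fi
  cp -p -- "$src" "$dst" && echo "$dst"
}
```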
[hadoop@my-hdp-01 hadoop]$ vim yarn-env.sh
# Resource Manager specific parameters
# Specify the max Heapsize for the ResourceManager using a numerical value
# in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set
# the value to 1000.
# This value will be overridden by an Xmx setting specified in either YARN_OPTS
# and/or YARN_RESOURCEMANAGER_OPTS.
# If not specified, the default value will be picked from either YARN_HEAPMAX
# or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two.
#export YARN_RESOURCEMANAGER_HEAPSIZE=1000
# Specify the max Heapsize for the timeline server using a numerical value
# in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set
# the value to 1000.
# This value will be overridden by an Xmx setting specified in either YARN_OPTS
# and/or YARN_TIMELINESERVER_OPTS.
# If not specified, the default value will be picked from either YARN_HEAPMAX
# or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two.
#export YARN_TIMELINESERVER_HEAPSIZE=1000
# Specify the JVM options to be used when starting the ResourceManager.
# These options will be appended to the options specified as YARN_OPTS
# and therefore may override any similar flags set in YARN_OPTS
#export YARN_RESOURCEMANAGER_OPTS=
# Node Manager specific parameters
# Specify the max Heapsize for the NodeManager using a numerical value
# in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set
# the value to 1000.
# This value will be overridden by an Xmx setting specified in either YARN_OPTS
# and/or YARN_NODEMANAGER_OPTS.
# If not specified, the default value will be picked from either YARN_HEAPMAX
# or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two.
#export YARN_NODEMANAGER_HEAPSIZE=1000
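The comments above describe an order of precedence. The function below is a simplified model of that order, not the real bin/yarn script. It rests on three assumptions. First, the default is JAVA_HEAP_MAX=-Xmx1000m, as the ps output above shows. Second, YARN_HEAPSIZE can override that default. Third, the java command line is assembled as `-Dproc_resourcemanager $JAVA_HEAP_MAX $YARN_OPTS ...`, which matches the order in the ps output. The model ignores YARN_HEAPMAX. The name rm_effective_xmx is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model (assumption, not the real bin/yarn) of which -Xmx the RM JVM ends up with.
rm_effective_xmx() {
  local heap_max="-Xmx1000m"                             # default, as seen in the ps output
  if [ -n "${YARN_HEAPSIZE:-}" ]; then                   # YARN-wide daemon heap, in MB
    heap_max="-Xmx${YARN_HEAPSIZE}m"
  fi
  if [ -n "${YARN_RESOURCEMANAGER_HEAPSIZE:-}" ]; then   # RM-specific heap, in MB
    heap_max="-Xmx${YARN_RESOURCEMANAGER_HEAPSIZE}m"
  fi
  # YARN_RESOURCEMANAGER_OPTS is appended to YARN_OPTS, and both come after
  # $heap_max on the java command line, so an -Xmx there appears later and wins.
  printf '%s %s %s\n' "$heap_max" "${YARN_OPTS:-}" "${YARN_RESOURCEMANAGER_OPTS:-}" \
    | tr -s ' ' '\n' \
    | grep -E '^-Xmx[0-9]+[kKmMgG]?$' \
    | tail -n 1
}
```

For example, `YARN_RESOURCEMANAGER_HEAPSIZE=2048 rm_effective_xmx` prints -Xmx2048m. Adding `-Xmx4g` to YARN_RESOURCEMANAGER_OPTS would override it.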
After the edit, the YARN_RESOURCEMANAGER_HEAPSIZE section of yarn-env.sh reads:
# Resource Manager specific parameters
# Specify the max Heapsize for the ResourceManager using a numerical value
# in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set
# the value to 1000.
# This value will be overridden by an Xmx setting specified in either YARN_OPTS
# and/or YARN_RESOURCEMANAGER_OPTS.
# If not specified, the default value will be picked from either YARN_HEAPMAX
# or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two.
#export YARN_RESOURCEMANAGER_HEAPSIZE=1000
export YARN_RESOURCEMANAGER_HEAPSIZE=2048
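The same change can be made without opening vim. Below is a minimal sketch. The name set_rm_heapsize is hypothetical, and `sed -i` assumes GNU sed, as on the EL7 hosts here. The author placed the export line directly below the commented default. This sketch replaces an existing active line, or else appends one at the end of the file. That should be equivalent as long as nothing later in the file resets the variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set (or replace) an uncommented
# `export YARN_RESOURCEMANAGER_HEAPSIZE=<MB>` line in a yarn-env.sh file.
set_rm_heapsize() {
  local file="$1" mb="$2"
  # The value is in MB and must be a plain integer (e.g. 2048, not 2g)
  case "$mb" in ''|*[!0-9]*) echo "heap size must be an integer number of MB" >&2; return 1 ;; esac
  if grep -q '^export YARN_RESOURCEMANAGER_HEAPSIZE=' "$file"; then
    # An active setting exists: rewrite it in place (GNU sed -i)
    sed -i "s/^export YARN_RESOURCEMANAGER_HEAPSIZE=.*/export YARN_RESOURCEMANAGER_HEAPSIZE=${mb}/" "$file"
  else
    # Only the commented default exists: append an active setting
    printf 'export YARN_RESOURCEMANAGER_HEAPSIZE=%s\n' "$mb" >> "$file"
  fi
}
```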
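Next, sync yarn-env.sh to all other nodes in the cluster, after first backing up each node's existing copy. kiss-all is a batch-execution tool I wrote myself, and sync-to-others is a batch rsync tool. Readers who need them can message me privately.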
[hadoop@my-hdp-01 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@my-hdp-01 hadoop]$ kiss-all cp /home/hadoop/hadoop/etc/hadoop/yarn-env.sh /home/hadoop/hadoop/etc/hadoop/yarn-env.sh_20181008
[hadoop@my-hdp-01 hadoop]$ sync-to-others yarn-env.sh
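kiss-all and sync-to-others are private tools, so the following is only a rough stand-in built from plain ssh and rsync. It is not the author's implementation. The sketch assumes passwordless ssh and the same absolute path on every host. The name sync_conf and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for kiss-all + sync-to-others.
# Usage: sync_conf FILE HOST [HOST...]; set DRY_RUN=1 to print commands instead of running them.
sync_conf() {
  local file="$1"; shift
  local abs suffix run host
  abs="$(cd "$(dirname "$file")" && pwd)/$(basename "$file")"
  suffix="$(date +%Y%m%d)"
  if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; else run=; fi
  for host in "$@"; do
    # Back up the remote copy first, mirroring the kiss-all cp step above
    $run ssh "$host" "cp -p '$abs' '${abs}_${suffix}'"
    # Then push the edited file to the same path on the remote host
    $run rsync -a "$abs" "$host:$abs"
  done
}
```

Run it from the configuration directory, e.g. `DRY_RUN=1 sync_conf yarn-env.sh my-hdp-02`. Check the printed commands, then rerun without DRY_RUN.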
Restart the RM that is currently active. First check the HA state of each RM:
[hadoop@my-hdp-01 sbin]$ pwd
/home/hadoop/hadoop/sbin
[hadoop@my-hdp-01 sbin]$ ll yarn-daemon.sh
-rwxrwxr-x 1 hadoop hadoop 4295 Jun 2 2017 yarn-daemon.sh
[hadoop@my-hdp-01 sbin]$ yarn rmadmin -getServiceState rm1
standby
[hadoop@my-hdp-01 sbin]$ yarn rmadmin -getServiceState rm2
active
[hadoop@my-hdp-01 sbin]$ ssh my-hdp-02 "/home/hadoop/hadoop/sbin/yarn-daemon.sh stop resourcemanager && /home/hadoop/hadoop/sbin/yarn-daemon.sh start resourcemanager"
stopping resourcemanager
resourcemanager did not stop gracefully after 5 seconds: killing with kill -9
starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-my-hdp-02.out
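Before restarting the second RM, it is worth confirming that the restarted one has rejoined the cluster, so that one RM is always up. Below is a minimal polling sketch. It assumes the `yarn` client is on PATH with the HA configuration loaded. The name wait_for_rm_state and the 5-second interval are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll `yarn rmadmin -getServiceState RM_ID` until it prints
# EXPECTED (active|standby) or TIMEOUT seconds pass. Returns 0 on success, 1 on timeout.
wait_for_rm_state() {
  local rm_id="$1" expected="$2" timeout="${3:-120}" waited=0 state
  while [ "$waited" -lt "$timeout" ]; do
    state="$(yarn rmadmin -getServiceState "$rm_id" 2>/dev/null)"
    if [ "$state" = "$expected" ]; then
      echo "$rm_id is $state after ${waited}s"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "timed out waiting for $rm_id to become $expected (last: ${state:-none})" >&2
  return 1
}
```

For example, after restarting rm2 you might run `wait_for_rm_state rm2 standby` before touching rm1. Whether rm2 comes back as standby, with rm1 taking over, depends on automatic failover being enabled, which is an assumption here.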
Verify the heap size of the restarted RM on my-hdp-02: the new setting has taken effect!
[hadoop@my-hdp-01 sbin]$ ssh my-hdp-02 ps aux | grep -i resourcemanager | grep -v grep | grep --color Xmx
hadoop 21326 200 0.9 4359224 1204520 ? Sl 20:02 0:42 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-7.b10.el7.x86_64/bin/java -Dproc_resourcemanager -Xmx2048m -Dhadoop.log.dir=/home/hadoop/hadoop/logs -Dyarn.log.dir=/home/hadoop/hadoop/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-my-hdp-02.log -Dyarn.log.file=yarn-hadoop-resourcemanager-my-hdp-02.log -Dyarn.home.dir= -Dyarn.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/home/hadoop/hadoop/logs -Dyarn.log.dir=/home/hadoop/hadoop/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-my-hdp-02.log -Dyarn.log.file=yarn-hadoop-resourcemanager-my-hdp-02.log -Dyarn.home.dir=/home/hadoop/hadoop -Dhadoop.home.dir=/home/hadoop/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop/lib/native -classpath /home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/*:/home/hadoop/hadoop/share/hadoop/common/*:/home/hadoop/hadoop/share/hadoop/hdfs:/home/hadoop/hadoop/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop/share/hadoop/hdfs/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop/share/hadoop/mapreduce/*:/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
Restart the RM that was originally in standby (the local one on my-hdp-01):
[hadoop@my-hdp-01 sbin]$ yarn-daemon.sh stop resourcemanager && yarn-daemon.sh start resourcemanager
stopping resourcemanager
starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-my-hdp.out
Verify the heap size of the restarted RM on my-hdp-01: the setting has taken effect here as well.
[hadoop@my-hdp-01 sbin]$ ps aux | grep -i resourcemanager | grep -v grep | grep --color Xmx
hadoop 16158 336 0.4 4086456 545152 pts/32 Sl 20:03 0:16 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-7.b10.el7.x86_64/bin/java -Dproc_resourcemanager -Xmx2048m -Dhadoop.log.dir=/home/hadoop/hadoop/logs -Dyarn.log.dir=/home/hadoop/hadoop/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-my-hdp.log -Dyarn.log.file=yarn-hadoop-resourcemanager-my-hdp.log -Dyarn.home.dir= -Dyarn.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/home/hadoop/hadoop/logs -Dyarn.log.dir=/home/hadoop/hadoop/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-my-hdp.log -Dyarn.log.file=yarn-hadoop-resourcemanager-my-hdp.log -Dyarn.home.dir=/home/hadoop/hadoop -Dhadoop.home.dir=/home/hadoop/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop/lib/native -classpath /home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/*:/home/hadoop/hadoop/share/hadoop/common/*:/home/hadoop/hadoop/share/hadoop/hdfs:/home/hadoop/hadoop/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop/share/hadoop/hdfs/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop/share/hadoop/mapreduce/*:/home/hadoop/hadoop/share/hadoop/tools/lib/*::/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
[hadoop@my-hdp-01 sbin]$ yarn rmadmin -getServiceState rm1
standby
[hadoop@my-hdp-01 sbin]$ yarn rmadmin -getServiceState rm2
active
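The two getServiceState calls above can be combined into a single check. Below is a minimal sketch. It assumes the `yarn` client is on PATH with the HA configuration loaded. The name check_rm_ha is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: every given RM id answers, exactly one is active,
# and the rest are standby. Prints each state; returns non-zero otherwise.
check_rm_ha() {
  local active=0 id state
  for id in "$@"; do
    state="$(yarn rmadmin -getServiceState "$id" 2>/dev/null)"
    echo "$id: ${state:-unreachable}"
    case "$state" in
      active)  active=$((active + 1)) ;;
      standby) ;;
      *)       return 1 ;;   # unreachable or unexpected state
    esac
  done
  [ "$active" -eq 1 ]
}
```

Usage: `check_rm_ha rm1 rm2`.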
Finally, check the RM processes on both nodes once more:
[hadoop@my-hdp-01 ~]$ date
Mon Oct 8 20:12:18 CST 2018
[hadoop@my-hdp-01 sbin]$ ps aux | grep -v grep | grep --color -i resourcemanager
hadoop 16158 12.9 0.4 4088504 580180 pts/32 Sl 20:03 0:17 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-7.b10.el7.x86_64/bin/java -Dproc_resourcemanager -Xmx2048m -Dhadoop.log.dir=/home/hadoop/hadoop/logs -Dyarn.log.dir=/home/hadoop/hadoop/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-my-hdp.log -Dyarn.log.file=yarn-hadoop-resourcemanager-my-hdp.log -Dyarn.home.dir= -Dyarn.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/home/hadoop/hadoop/logs -Dyarn.log.dir=/home/hadoop/hadoop/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-my-hdp.log -Dyarn.log.file=yarn-hadoop-resourcemanager-my-hdp.log -Dyarn.home.dir=/home/hadoop/hadoop -Dhadoop.home.dir=/home/hadoop/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop/lib/native -classpath /home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/*:/home/hadoop/hadoop/share/hadoop/common/*:/home/hadoop/hadoop/share/hadoop/hdfs:/home/hadoop/hadoop/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop/share/hadoop/hdfs/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop/share/hadoop/mapreduce/*:/home/hadoop/hadoop/share/hadoop/tools/lib/*::/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
[hadoop@my-hdp-01 sbin]$ ssh my-hdp-02 ps aux | grep -v grep | grep --color -i resourcemanager
hadoop 21326 30.6 0.9 4381524 1256880 ? Sl 20:02 1:00 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-7.b10.el7.x86_64/bin/java -Dproc_resourcemanager -Xmx2048m -Dhadoop.log.dir=/home/hadoop/hadoop/logs -Dyarn.log.dir=/home/hadoop/hadoop/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-my-hdp-02.log -Dyarn.log.file=yarn-hadoop-resourcemanager-my-hdp-02.log -Dyarn.home.dir= -Dyarn.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/home/hadoop/hadoop/logs -Dyarn.log.dir=/home/hadoop/hadoop/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-my-hdp-02.log -Dyarn.log.file=yarn-hadoop-resourcemanager-my-hdp-02.log -Dyarn.home.dir=/home/hadoop/hadoop -Dhadoop.home.dir=/home/hadoop/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop/lib/native -classpath /home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/*:/home/hadoop/hadoop/share/hadoop/common/*:/home/hadoop/hadoop/share/hadoop/hdfs:/home/hadoop/hadoop/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop/share/hadoop/hdfs/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop/share/hadoop/mapreduce/*:/home/hadoop/hadoop/share/hadoop/yarn/*:/home/hadoop/hadoop/share/hadoop/yarn/lib/*:/home/hadoop/hadoop/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager