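The ResourceManager (RM) tracks cluster resources and schedules application jobs (for example, MapReduce jobs). Before Hadoop 2.4, the ResourceManager was a single point of failure in a YARN cluster. The high-availability feature removes that single point of failure by adding an Active/Standby pair of ResourceManagers.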
[Figure: architecture diagram]
The environment is as follows:
[Figure: cluster environment]
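1. Edit yarn-site.xml
Add the following properties (listed as name=value for brevity; in yarn-site.xml each one is written as a <property> element with <name> and <value> children):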

yarn.resourcemanager.ha.enabled=true
yarn.resourcemanager.cluster-id=cluster1
yarn.resourcemanager.ha.rm-ids=rm1,rm2
yarn.resourcemanager.hostname.rm1=hdp01
yarn.resourcemanager.hostname.rm2=hdp04
yarn.resourcemanager.zk-address=hdp01:2181,hdp02:2181,hdp03:2181,hdp04:2181
yarn.resourcemanager.ha.automatic-failover.enabled=true
yarn.resourcemanager.ha.automatic-failover.embedded=true
yarn.resourcemanager.ha.automatic-failover.zk-base-path=/yarn-leader-election
yarn.resourcemanager.recovery.enabled=true
yarn.resourcemanager.address.rm1=hdp01:8032
yarn.resourcemanager.address.rm2=hdp04:8032
yarn.resourcemanager.scheduler.address.rm1=hdp01:8030
yarn.resourcemanager.scheduler.address.rm2=hdp04:8030
yarn.resourcemanager.resource-tracker.address.rm1=hdp01:8031
yarn.resourcemanager.resource-tracker.address.rm2=hdp04:8031
yarn.resourcemanager.admin.address.rm1=hdp01:8033
yarn.resourcemanager.admin.address.rm2=hdp04:8033
yarn.resourcemanager.webapp.address.rm1=hdp01:8088
yarn.resourcemanager.webapp.address.rm2=hdp04:8088
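
Because yarn-site.xml is an XML file, the name=value pairs above have to be wrapped in <property> elements before they go inside the file's <configuration> element. The following is a minimal sketch of that conversion; the helper name props_to_xml is hypothetical (not part of Hadoop), and it assumes the values contain no characters that need XML escaping, which holds for the properties listed here.

```shell
#!/bin/sh
# props_to_xml is a hypothetical helper, not a Hadoop tool.
# It reads "name=value" lines on stdin and prints one Hadoop
# <property> element per line. No XML escaping is performed.
props_to_xml() {
  while IFS='=' read -r name value; do
    # Skip blank lines.
    [ -z "$name" ] && continue
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
      "$name" "$value"
  done
}

# Example: convert two of the HA properties from the list above.
props_to_xml <<'EOF'
yarn.resourcemanager.ha.enabled=true
yarn.resourcemanager.ha.rm-ids=rm1,rm2
EOF
```

The printed elements are meant to be pasted between <configuration> and </configuration> in yarn-site.xml.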

2. Copy yarn-site.xml to the other nodes

[hadoop@hdp01 hadoop]$ for i in {2..4};do scp yarn-site.xml hdp0$i:/u01/hadoop/etc/hadoop;done

3. Start the ResourceManager service

[hadoop@hdp01 hadoop]$ start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /u01/hadoop/logs/yarn-hadoop-resourcemanager-hdp01.out
hdp03.thinkjoy.tt: starting nodemanager, logging to /u01/hadoop/logs/yarn-hadoop-nodemanager-hdp03.out
hdp02.thinkjoy.tt: starting nodemanager, logging to /u01/hadoop/logs/yarn-hadoop-nodemanager-hdp02.out
hdp04.thinkjoy.tt: starting nodemanager, logging to /u01/hadoop/logs/yarn-hadoop-nodemanager-hdp04.out
[hadoop@hdp01 hadoop]$ jps
4592 RunJar
1136 ResourceManager
2690 NameNode
1827 QuorumPeerMain
4087 HMaster
5031 SqoopJettyServer
3528 JobHistoryServer
3001 SecondaryNameNode
1433 Jps
4686 RunJar

4. Start the ResourceManager on the standby node

[hadoop@hdp04 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /u01/hadoop/logs/yarn-hadoop-resourcemanager-hdp04.out
[hadoop@hdp04 ~]$ jps
2066 DataNode
1592 QuorumPeerMain
2971 RunJar
2604 HRegionServer
17437 ResourceManager
3071 RunJar
17487 Jps

5. Verify in ZooKeeper

[zk: localhost:2181(CONNECTED) 4] ls /
[hivesrv2, zookeeper, yarn-leader-election, oozie, hive_zookeeper_namespace, services, hbase]
[zk: localhost:2181(CONNECTED) 5] ls /yarn-leader-election
[cluster1]
[zk: localhost:2181(CONNECTED) 6] ls /yarn-leader-election/cluster1
[ActiveBreadCrumb, ActiveStandbyElectorLock]
[zk: localhost:2181(CONNECTED) 7] 

6. Verify from the command line

[hadoop@hdp01 hadoop]$ yarn rmadmin -getServiceState rm1
active
[hadoop@hdp01 hadoop]$ yarn rmadmin -getServiceState rm2
standby
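
The per-RM check above can be wrapped in a small loop when a script needs to know which ResourceManager is currently active. This is only a sketch: the function name find_active_rm is hypothetical, and it relies on nothing beyond the yarn rmadmin -getServiceState command shown in this step, treating any RM that cannot be queried as not active.

```shell
#!/bin/sh
# find_active_rm is a hypothetical helper, not a Hadoop command.
# It prints the ID of the first ResourceManager that reports "active"
# and returns 1 if none of the given IDs does.
find_active_rm() {
  for rm_id in "$@"; do
    # getServiceState prints "active" or "standby"; errors from an
    # unreachable RM are discarded, so that RM simply does not match.
    rm_state="$(yarn rmadmin -getServiceState "$rm_id" 2>/dev/null)"
    if [ "$rm_state" = "active" ]; then
      echo "$rm_id"
      return 0
    fi
  done
  return 1
}

# Usage (IDs taken from yarn.resourcemanager.ha.rm-ids):
#   find_active_rm rm1 rm2
```

Running the usage line again after stopping the active ResourceManager gives a quick check that the standby has taken over.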
