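Hadoop cannot fail over automatically on its own. It relies on ZooKeeper to elect the active NameNode and the active ResourceManager. Setting up Hadoop HA is among the most involved deployments in the Hadoop ecosystem, so this section walks through building a Hadoop HA environment with ZooKeeper in detail.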
Environment (hostname and IP address):
bigdata131 192.168.126.131
bigdata132 192.168.126.132
bigdata133 192.168.126.133
bigdata134 192.168.126.134
Installation packages:
zookeeper-3.4.10.tar.gz (extraction code: nvv2)
hadoop-2.7.3.tar.gz (extraction code: r8xo)
1. How Hadoop HA works with ZooKeeper
As the figure above shows, a minimal Hadoop HA cluster built on a ZooKeeper ensemble needs at least four machines:
ZooKeeper ensemble:
bigdata131
bigdata132
bigdata133
Hadoop cluster:
bigdata131 NameNode1 ResourceManager1 JournalNode ZKFC
bigdata132 NameNode2 ResourceManager2 JournalNode ZKFC
bigdata133 DataNode1 NodeManager
bigdata134 DataNode2 NodeManager
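These machine counts follow from majority quorums. A ZooKeeper ensemble and a set of JournalNodes both keep working only while a strict majority of their members is alive. The bash sketch below shows that arithmetic. `quorum_size` and `tolerated_failures` are hypothetical helper names, not Hadoop or ZooKeeper commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating majority-quorum arithmetic.

# Smallest number of members that forms a strict majority of n.
quorum_size() {
  local n=$1
  echo $(( n / 2 + 1 ))
}

# How many members may fail while a majority is still alive.
tolerated_failures() {
  local n=$1
  echo $(( (n - 1) / 2 ))
}

# 2 = JournalNodes in this layout, 3 = ZooKeeper servers (bigdata131-133).
for n in 2 3; do
  echo "members=$n quorum=$(quorum_size "$n") tolerated=$(tolerated_failures "$n")"
done
# prints:
# members=2 quorum=2 tolerated=0
# members=3 quorum=2 tolerated=1
```

Three ZooKeeper servers can lose one member. With only the two JournalNodes used in this layout, losing either one leaves no majority, and the active NameNode can no longer write edits. When fault tolerance matters, three JournalNodes (or another odd number) is the usual choice.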
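2. Set up the ZooKeeper ensemble
See the article "ZooKeeper from Beginner to Expert 9: Setting Up ZooKeeper in Cluster Mode".
3. Set up the Hadoop HA cluster
Run the following steps on bigdata131 unless a step says otherwise:
3.1 Upload the Hadoop package
Use WinSCP to upload the Hadoop package to /root/tools/ on bigdata131 (create the directory beforehand):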
# ls /root/tools/
hadoop-2.7.3.tar.gz
3.2 Extract the Hadoop package
Go to /root/tools/ and extract the package into /root/trainings/ (also created beforehand):
# cd /root/tools/
# tar -zxvf hadoop-2.7.3.tar.gz -C /root/trainings/
3.3 Configure the Hadoop environment variables (on all four hosts)
# cd /root/trainings/hadoop-2.7.3/
# pwd
/root/trainings/hadoop-2.7.3
# vim /root/.bash_profile
Append the following to the end of the file:
HADOOP_HOME=/root/trainings/hadoop-2.7.3
export HADOOP_HOME
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH
Press Esc, type :wq to save and quit, then run source so the changes take effect in the current shell:
# source /root/.bash_profile
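Running `hadoop version` is the quickest check that the profile took effect. If it fails, the hypothetical function below reports which part of the setup is missing: HADOOP_HOME itself, or one of its bin/sbin directories on PATH. It is a sketch and does not touch any Hadoop files.

```shell
#!/usr/bin/env bash
# Hypothetical check that the .bash_profile changes are active in this shell.
# Returns 0 when HADOOP_HOME is set and both bin and sbin are on PATH.
hadoop_env_ok() {
  [ -n "${HADOOP_HOME:-}" ] || { echo "HADOOP_HOME is not set" >&2; return 1; }
  local dir
  for dir in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
    # Wrap PATH in colons so the first and last entries match too.
    case ":$PATH:" in
      *":$dir:"*) ;;
      *) echo "missing from PATH: $dir" >&2; return 1 ;;
    esac
  done
  return 0
}

# Usage, after source /root/.bash_profile:
# hadoop_env_ok && echo "environment OK"
```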
3.4 Configure Hadoop for HA
Go to the Hadoop configuration directory:
# cd /root/trainings/hadoop-2.7.3/etc/hadoop/
(1) Configure hadoop-env.sh:
# echo $JAVA_HOME
/root/trainings/jdk1.8.0_144
# vim hadoop-env.sh
In the file, comment out the default JAVA_HOME line and set the JDK path explicitly:
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/root/trainings/jdk1.8.0_144
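The explicit path matters because the start scripts launch daemons over SSH in non-interactive shells, which do not read .bash_profile, so ${JAVA_HOME} would be empty there. The same edit can be made without vim, which helps when scripting it for several hosts. The function below is a hypothetical sketch and assumes GNU sed's in-place `-i` option.

```shell
#!/usr/bin/env bash
# Hypothetical non-interactive version of the hadoop-env.sh edit above:
# comment out every active "export JAVA_HOME=..." line, then append an explicit one.
set_java_home() {
  local env_file=$1 jdk_dir=$2
  # GNU sed: edit the file in place.
  sed -i 's|^export JAVA_HOME=|#export JAVA_HOME=|' "$env_file"
  echo "export JAVA_HOME=$jdk_dir" >> "$env_file"
}

# Usage with the paths from this tutorial:
# set_java_home /root/trainings/hadoop-2.7.3/etc/hadoop/hadoop-env.sh /root/trainings/jdk1.8.0_144
```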
(2) Configure hdfs-site.xml (this defines the nameservice and the NameNodes that belong to it):
# vim hdfs-site.xml
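<configuration>
  <!-- Logical name of the nameservice; fs.defaultFS in core-site.xml points at it -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- The two NameNodes that make up ns1 -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>bigdata131:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>bigdata131:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>bigdata132:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>bigdata132:50070</value>
  </property>
  <!-- JournalNodes holding the shared edit log: hosts separated by ';', no ';' before /ns1 -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://bigdata131:8485;bigdata132:8485/ns1</value>
  </property>
  <!-- Local directory where each JournalNode stores the edits -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/root/trainings/hadoop-2.7.3/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- One fencing method per line; shell(/bin/true) lets failover proceed if sshfence fails, e.g. when the old active host is down -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
sshfence
shell(/bin/true)
    </value>
  </property>
  <!-- sshfence needs passwordless SSH between the NameNodes with this key -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>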
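The dfs.namenode.shared.edits.dir value is easy to mistype: the JournalNode addresses are joined with ';', and the journal ID follows the last host with no separator before it. A hypothetical bash helper that builds the URI from a host list (8485 is the default JournalNode RPC port):

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a qjournal:// URI from a journal ID, a port,
# and JournalNode hosts, joined with ';' and with no trailing ';'.
qjournal_uri() {
  local journal_id=$1 port=$2
  shift 2
  local hosts="" h
  for h in "$@"; do
    # ${hosts:+;} adds a separator only after the first host.
    hosts+="${hosts:+;}${h}:${port}"
  done
  echo "qjournal://${hosts}/${journal_id}"
}

# Usage: qjournal_uri ns1 8485 bigdata131 bigdata132
```

Adding a third JournalNode then only means passing one more host, for example `qjournal_uri ns1 8485 bigdata131 bigdata132 bigdata133`.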
(3) Configure core-site.xml:
# mkdir /root/trainings/hadoop-2.7.3/tmp
# vim core-site.xml
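<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/trainings/hadoop-2.7.3/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>bigdata131:2181,bigdata132:2181,bigdata133:2181</value>
  </property>
</configuration>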
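In an HA setup, fs.defaultFS must name the logical nameservice (hdfs://ns1), not the host:port of a single NameNode, or clients will not fail over. Once the files are in place, `hdfs getconf -confKey <key>` prints the effective value of a setting. The hypothetical function below compares the two settings. It is a sketch and only handles the hdfs:// scheme.

```shell
#!/usr/bin/env bash
# Hypothetical consistency check: fs.defaultFS should point at one of the
# nameservices listed in dfs.nameservices (comma-separated).
default_fs_matches() {
  local default_fs=$1 nameservices=$2
  local ns=${default_fs#hdfs://}   # strip the scheme
  ns=${ns%%/*}                     # drop any trailing path
  local candidate
  for candidate in ${nameservices//,/ }; do
    [ "$candidate" = "$ns" ] && return 0
  done
  return 1
}

# Usage on a configured node:
# default_fs_matches "$(hdfs getconf -confKey fs.defaultFS)" \
#                    "$(hdfs getconf -confKey dfs.nameservices)" && echo OK
```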
(4) Configure mapred-site.xml:
Copy the template mapred-site.xml.template to mapred-site.xml, then edit the copy:
# cp mapred-site.xml.template mapred-site.xml
# vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
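The configuration files can also be generated from a script instead of edited in vim, which makes the setup repeatable. A hypothetical sketch for mapred-site.xml. The helper names are made up, and because the function writes the whole file, the template copy step becomes unnecessary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Hadoop <property> element.
xml_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Write a complete mapred-site.xml containing the single setting above.
write_mapred_site() {
  local target=$1
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    xml_property mapreduce.framework.name yarn
    echo '</configuration>'
  } > "$target"
}

# Usage: write_mapred_site /root/trainings/hadoop-2.7.3/etc/hadoop/mapred-site.xml
```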
(5) Configure yarn-site.xml:
# vim yarn-site.xml
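<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>bigdata131</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>bigdata132</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>bigdata131:2181,bigdata132:2181,bigdata133:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>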
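Every ID listed in yarn.resourcemanager.ha.rm-ids needs a matching yarn.resourcemanager.hostname.<id> entry. Generating both from one list of id=host pairs keeps them in sync. A hypothetical sketch; `rm_properties` is not a Hadoop tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Hadoop <property> element.
xml_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Emit rm-ids plus one hostname property per "id=host" argument.
rm_properties() {
  local ids="" pair
  for pair in "$@"; do
    ids+="${ids:+,}${pair%%=*}"          # part before '=' is the RM ID
  done
  xml_property yarn.resourcemanager.ha.rm-ids "$ids"
  for pair in "$@"; do
    xml_property "yarn.resourcemanager.hostname.${pair%%=*}" "${pair#*=}"
  done
}

# Usage: rm_properties rm1=bigdata131 rm2=bigdata132
```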
(6) Configure the slaves file (the hosts that run DataNode and NodeManager):
# vim slaves
bigdata133
bigdata134
3.5 Copy the configured Hadoop directory to the other nodes
[root@bigdata131 ~]# scp -r /root/trainings/hadoop-2.7.3/ root@bigdata132:/root/trainings/
[root@bigdata131 ~]# scp -r /root/trainings/hadoop-2.7.3/ root@bigdata133:/root/trainings/
[root@bigdata131 ~]# scp -r /root/trainings/hadoop-2.7.3/ root@bigdata134:/root/trainings/
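The three scp commands above can be collapsed into a loop. Like the original commands, this assumes passwordless SSH from bigdata131 to the other hosts. In this hypothetical sketch, the `run` wrapper and the DRY_RUN variable are illustration-only names that let you print the commands before executing them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the configured Hadoop directory to each host given as an argument.
distribute_hadoop() {
  local src=/root/trainings/hadoop-2.7.3/ host
  for host in "$@"; do
    run scp -r "$src" "root@${host}:/root/trainings/"
  done
}

# Usage: DRY_RUN=1 distribute_hadoop bigdata132 bigdata133 bigdata134
```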
3.6 Start the ZooKeeper ensemble (on bigdata131, bigdata132, and bigdata133)
[root@bigdata131 ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /root/trainings/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@bigdata132 ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /root/trainings/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@bigdata133 ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /root/trainings/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
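"STARTED" only means the process was launched. `zkServer.sh status` on each host reports the server's actual role on a line beginning with "Mode:". A hypothetical filter that extracts that role:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `zkServer.sh status` output on stdin and print
# the role from the "Mode: ..." line (leader, follower, or standalone).
# Prints nothing if the server could not be contacted.
zk_mode() {
  sed -n 's/^Mode: *//p'
}

# Usage on each ZooKeeper host once all three servers are up:
# zkServer.sh status | zk_mode
```

A healthy three-node ensemble reports one leader and two followers. A node that reports standalone usually has a zoo.cfg without the server.N entries.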
3.7 Start the JournalNodes on bigdata131 and bigdata132
[root@bigdata131 ~]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /root/trainings/hadoop-2.7.3/logs/hadoop-root-journalnode-bigdata131.out
[root@bigdata132 ~]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /root/trainings/hadoop-2.7.3/logs/hadoop-root-journalnode-bigdata132.out
3.8 Format HDFS and ZooKeeper (on bigdata131)
[root@bigdata131 ~]# hdfs namenode -format
18/12/02 00:08:47 INFO common.Storage: Storage directory /root/trainings/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.
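# Copy the freshly formatted metadata to bigdata132 so both NameNodes start from the same namespace
[root@bigdata131 ~]# scp -r /root/trainings/hadoop-2.7.3/tmp root@bigdata132:/root/trainings/hadoop-2.7.3/
# Create the HA znode for ns1 in ZooKeeper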
[root@bigdata131 ~]# hdfs zkfc -formatZK
18/12/02 00:09:59 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ns1 in ZK.
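3.9 Start the Hadoop HA cluster
Start the cluster from bigdata131. The "journalnode running as process ... Stop it first." messages are expected, because the JournalNodes were already started in step 3.7: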
[root@bigdata131 ~]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [bigdata131 bigdata132]
bigdata131: starting namenode, logging to /root/trainings/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata131.out
bigdata132: starting namenode, logging to /root/trainings/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata132.out
bigdata133: starting datanode, logging to /root/trainings/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata133.out
bigdata134: starting datanode, logging to /root/trainings/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata134.out
Starting journal nodes [bigdata131 bigdata132 ]
bigdata131: journalnode running as process 1275. Stop it first.
bigdata132: journalnode running as process 1265. Stop it first.
Starting ZK Failover Controllers on NN hosts [bigdata131 bigdata132]
bigdata131: starting zkfc, logging to /root/trainings/hadoop-2.7.3/logs/hadoop-root-zkfc-bigdata131.out
bigdata132: starting zkfc, logging to /root/trainings/hadoop-2.7.3/logs/hadoop-root-zkfc-bigdata132.out
starting yarn daemons
starting resourcemanager, logging to /root/trainings/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata131.out
bigdata133: starting nodemanager, logging to /root/trainings/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata133.out
bigdata134: starting nodemanager, logging to /root/trainings/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata134.out
[root@bigdata131 ~]# jps
1232 QuorumPeerMain
1939 ResourceManager
1524 NameNode
2200 Jps
1275 JournalNode
1839 DFSZKFailoverController
[root@bigdata132 ~]# jps
1217 QuorumPeerMain
1265 JournalNode
1346 NameNode
1461 DFSZKFailoverController
1518 Jps
[root@bigdata133 ~]# jps
1365 NodeManager
1213 QuorumPeerMain
1469 Jps
1263 DataNode
[root@bigdata134 ~]# jps
1303 NodeManager
1435 Jps
1228 DataNode
start-all.sh starts a ResourceManager only on the node where it runs, so start the second ResourceManager on bigdata132 separately:
[root@bigdata132 ~]# yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /root/trainings/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata132.out
[root@bigdata132 ~]# jps
1217 QuorumPeerMain
1265 JournalNode
1346 NameNode
1603 Jps
1556 ResourceManager
1461 DFSZKFailoverController
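Comparing each host's jps output with the layout in section 1 by eye is error-prone. The hypothetical helper below reads jps output on stdin and prints any expected daemon that is not running. It is a sketch; the expected daemon names come from the listings above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list expected daemons missing from `jps` output.
# Prints nothing and returns 0 when every expected daemon is present.
missing_daemons() {
  local running
  running=$(awk '{print $2}')   # jps prints "<pid> <MainClass>"
  local d status=0
  for d in "$@"; do
    if ! grep -qxF "$d" <<< "$running"; then
      echo "$d"
      status=1
    fi
  done
  return "$status"
}

# Usage on bigdata132 after starting its ResourceManager:
# jps | missing_daemons QuorumPeerMain JournalNode NameNode DFSZKFailoverController ResourceManager
```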
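4. Test the Hadoop HA cluster
(1) Normal operation
ZooKeeper (under /hadoop-ha/ns1) shows that bigdata131 is the active NameNode for ns1. The NameNode web UI (port 50070) agrees: bigdata131 is active and bigdata132 is standby.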
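The same check can be run from the command line. `hdfs haadmin -getServiceState <id>` prints active or standby for a NameNode ID defined in hdfs-site.xml, and `yarn rmadmin -getServiceState <id>` does the same for ResourceManagers. The wrapper below is a hypothetical sketch. Its HDFS_BIN override exists only so the wrapper can be exercised with a stub instead of a live cluster.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print "<id> <state>" for each NameNode ID.
# HDFS_BIN defaults to the real hdfs client; a stub can replace it for testing.
ha_states() {
  local bin=${HDFS_BIN:-hdfs} id state
  for id in "$@"; do
    state=$("$bin" haadmin -getServiceState "$id" 2>/dev/null) || state=unreachable
    echo "$id $state"
  done
}

# Succeeds only if exactly one of the given IDs reports "active".
exactly_one_active() {
  ha_states "$@" | awk '$2 == "active" { n++ } END { exit !(n == 1) }'
}

# Usage: ha_states nn1 nn2   (before the failover test: "nn1 active", "nn2 standby")
```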
(2) Kill the NameNode process on bigdata131, then refresh ZooKeeper and the web UI to observe the change
[root@bigdata131 ~]# jps
1232 QuorumPeerMain
1939 ResourceManager
1524 NameNode
2266 Jps
1275 JournalNode
1839 DFSZKFailoverController
[root@bigdata131 ~]# kill -9 1524
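ZooKeeper now shows bigdata132 as the active NameNode for ns1. In the web UI, bigdata131 can no longer be reached and bigdata132 has become active.
This shows that the HDFS NameNode fails over correctly, so HDFS stays available to clients when the active NameNode dies.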
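Failover takes a few seconds: the ZKFC daemons must notice the dead NameNode and run fencing. Instead of refreshing the page repeatedly, you can poll the standby until it reports active. This is a hypothetical sketch built on `hdfs haadmin -getServiceState`. The timeout is in seconds, and HDFS_BIN is again only a testing hook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a NameNode ID once per second until it reports
# "active" or the timeout (seconds, default 60) expires.
wait_for_active() {
  local id=$1 timeout=${2:-60} bin=${HDFS_BIN:-hdfs} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if [ "$("$bin" haadmin -getServiceState "$id" 2>/dev/null)" = active ]; then
      echo "$id became active after ${waited}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$id not active after ${timeout}s" >&2
  return 1
}

# Usage right after kill -9 on bigdata131's NameNode:
# wait_for_active nn2 60
```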
This completes the setup of Hadoop HA with ZooKeeper. Have fun!