Prepare one additional host, server8, and clear out /tmp on every node:
[zjy@server4 hadoop]$ rm -fr /tmp/*
[zjy@server5 ~]$ rm -fr /tmp/*
[zjy@server6 ~]$ rm -fr /tmp/*
[zjy@server7 ~]$ rm -fr /tmp/*
[zjy@server8 ~]$ rm -fr /tmp/*
[root@server8 ~]# yum install -y nfs-utils
[root@server8 ~]# systemctl start rpcbind
[root@server8 ~]# useradd zjy
[root@server8 ~]# id zjy
uid=1000(zjy) gid=1000(zjy) groups=1000(zjy)
[root@server8 ~]# showmount -e 172.25.60.4
Export list for 172.25.60.4:
/home/zjy *
[root@server8 ~]# mount 172.25.60.4:/home/zjy /home/zjy
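A quick optional check that the NFS mount is in place; df on the mount point should show the 172.25.60.4:/home/zjy export:
[root@server8 ~]# df -h /home/zjy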
Download the zookeeper-3.4.14.tar.gz package from:
https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
[zjy@server4 ~]$ tar zxf zookeeper-3.4.14.tar.gz
[zjy@server4 ~]$ du -sh zookeeper-3.4.14
61M zookeeper-3.4.14
Cluster layout: server4 (active) and server8 (standby) form the NameNode HA pair;
server5, server6 and server7 each run ZooKeeper, a JournalNode, a DataNode and a NodeManager.
[zjy@server5 conf]$ pwd
/home/zjy/zookeeper-3.4.14/conf
[zjy@server5 conf]$ cp zoo_sample.cfg zoo.cfg
[zjy@server5 conf]$ vim zoo.cfg
server.1=172.25.60.5:2888:3888
server.2=172.25.60.6:2888:3888
server.3=172.25.60.7:2888:3888
Port 2888 is used for data synchronization between ZooKeeper servers, and 3888 for leader election.
[zjy@server5 conf]$ mkdir /tmp/zookeeper
[zjy@server5 conf]$ echo 1 > /tmp/zookeeper/myid
[zjy@server6 ~]$ mkdir /tmp/zookeeper
[zjy@server6 ~]$ echo 2 > /tmp/zookeeper/myid
[zjy@server7 ~]$ mkdir /tmp/zookeeper
[zjy@server7 ~]$ echo 3 > /tmp/zookeeper/myid
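Each node's myid must match its server.N line in zoo.cfg (1 for 172.25.60.5, 2 for .6, 3 for .7). A quick way to double-check from server5, assuming passwordless SSH works between the nodes (the home directory is NFS-shared, so the keys are shared too):
[zjy@server5 ~]$ for h in 172.25.60.5 172.25.60.6 172.25.60.7; do ssh $h cat /tmp/zookeeper/myid; done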
Start ZooKeeper on all three nodes:
[zjy@server5 zookeeper-3.4.14]$ bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/zjy/zookeeper-3.4.14/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[zjy@server6 zookeeper-3.4.14]$ bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/zjy/zookeeper-3.4.14/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[zjy@server7 ~]$ cd zookeeper-3.4.14
[zjy@server7 zookeeper-3.4.14]$ bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/zjy/zookeeper-3.4.14/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Check the ZooKeeper status on each node: server6 is the leader.
[zjy@server5 zookeeper-3.4.14]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/zjy/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: follower
[zjy@server6 zookeeper-3.4.14]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/zjy/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: leader
[zjy@server7 zookeeper-3.4.14]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/zjy/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: follower
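As an extra check, the stat four-letter command can be sent to each node's client port (this assumes nc is installed; ZooKeeper 3.4 answers four-letter commands by default):
[zjy@server5 ~]$ echo stat | nc 172.25.60.5 2181 | grep Mode
[zjy@server5 ~]$ echo stat | nc 172.25.60.6 2181 | grep Mode
[zjy@server5 ~]$ echo stat | nc 172.25.60.7 2181 | grep Mode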
Set up passwordless SSH from server4 to server8:
[zjy@server4 ~]$ ssh-copy-id server8
[zjy@server4 ~]$ ssh server8
Last login: Sat Jun 6 11:16:13 2020
[zjy@server8 ~]$ logout
Connection to server8 closed.
[zjy@server4 hadoop]$ pwd
/home/zjy/hadoop
[zjy@server4 hadoop]$ vim etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://masters</value> <!-- the HDFS nameservice is called "masters" (the name is arbitrary) -->
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>172.25.60.5:2181,172.25.60.6:2181,172.25.60.7:2181</value> <!-- ZooKeeper quorum addresses -->
</property>
</configuration>
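To confirm the new defaults are picked up, hdfs getconf can read the configuration without any daemon running (run from the hadoop directory):
[zjy@server4 hadoop]$ bin/hdfs getconf -confKey fs.defaultFS
[zjy@server4 hadoop]$ bin/hdfs getconf -confKey ha.zookeeper.quorum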
[zjy@server4 hadoop]$ pwd
/home/zjy/hadoop
[zjy@server4 hadoop]$ vim etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- set dfs.nameservices to "masters", matching the value in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>masters</value>
</property>
<!-- the "masters" nameservice has two NameNodes, h1 and h2 (names are arbitrary) -->
<property>
<name>dfs.ha.namenodes.masters</name>
<value>h1,h2</value>
</property>
<!-- RPC address of NameNode h1 -->
<property>
<name>dfs.namenode.rpc-address.masters.h1</name>
<value>172.25.60.4:9000</value>
</property>
<!-- HTTP address of NameNode h1 -->
<property>
<name>dfs.namenode.http-address.masters.h1</name>
<value>172.25.60.4:9870</value>
</property>
<!-- RPC address of NameNode h2 -->
<property>
<name>dfs.namenode.rpc-address.masters.h2</name>
<value>172.25.60.8:9000</value>
</property>
<!-- HTTP address of NameNode h2 -->
<property>
<name>dfs.namenode.http-address.masters.h2</name>
<value>172.25.60.8:9870</value>
</property>
<!-- where the NameNode shared edit log is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://172.25.60.5:8485;172.25.60.6:8485;172.25.60.7:8485/masters</value>
</property>
<!-- local directory where each JournalNode keeps its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/tmp/journaldata</value>
</property>
<!-- enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- failover proxy provider used by HDFS clients -->
<property>
<name>dfs.client.failover.proxy.provider.masters</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- fencing methods, one per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence needs passwordless SSH; point it at the private key -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/zjy/.ssh/id_rsa</value>
</property>
<!-- sshfence connection timeout in milliseconds -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
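Once hdfs-site.xml is saved, the HA settings can be sanity-checked the same way; -namenodes should list both NameNode hosts:
[zjy@server4 hadoop]$ bin/hdfs getconf -namenodes
[zjy@server4 hadoop]$ bin/hdfs getconf -confKey dfs.ha.namenodes.masters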
[zjy@server5 ~]$ hdfs --daemon start journalnode
[zjy@server5 ~]$ jps
7220 Jps
6779 QuorumPeerMain
7181 JournalNode
The journaldata directory has been created:
[zjy@server5 ~]$ ll -d /tmp/journaldata/
drwxrwxr-x 2 zjy zjy 6 Jun 6 15:26 /tmp/journaldata/
[zjy@server6 ~]$ hdfs --daemon start journalnode
[zjy@server6 ~]$ jps
6752 QuorumPeerMain
7142 JournalNode
7181 Jps
[zjy@server6 ~]$ ll -d /tmp/journaldata/
drwxrwxr-x 2 zjy zjy 6 Jun 6 15:30 /tmp/journaldata/
[zjy@server7 ~]$ hdfs --daemon start journalnode
[zjy@server7 ~]$ jps
6308 QuorumPeerMain
6710 JournalNode
6749 Jps
[zjy@server4 hadoop]$ bin/hdfs namenode -format
The NameNode metadata is stored under /tmp by default; it has to be copied to h2 (server8):
[zjy@server4 tmp]$ ls
hadoop-zjy hadoop-zjy-namenode.pid hsperfdata_zjy
[zjy@server4 tmp]$ scp -r hadoop-zjy server8:/tmp/
VERSION 100% 215 261.2KB/s 00:00
seen_txid 100% 2 1.8KB/s 00:00
fsimage_0000000000000000000.md5 100% 62 55.6KB/s 00:00
fsimage_0000000000000000000 100% 398 441.4KB/s 00:00
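An equivalent way to seed the standby, assuming the JournalNodes are up and the configuration is shared over NFS, is to bootstrap it instead of copying /tmp by hand:
[zjy@server8 ~]$ hdfs namenode -bootstrapStandby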
Format the HA state in ZooKeeper:
[zjy@server4 ~]$ hdfs zkfc -formatZK
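The format can be verified from any ZooKeeper node; it should have created a /hadoop-ha/masters znode (run zkCli.sh, then ls inside the shell):
[zjy@server5 zookeeper-3.4.14]$ bin/zkCli.sh -server 172.25.60.5:2181
[zk: 172.25.60.5:2181(CONNECTED) 0] ls /hadoop-ha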
Start the HDFS cluster:
[zjy@server4 hadoop]$ sbin/start-dfs.sh
Check the processes on each node:
[zjy@server4 hadoop]$ jps
13649 DFSZKFailoverController
13286 NameNode
13702 Jps
[zjy@server8 ~]$ jps
14608 Jps
14568 DFSZKFailoverController
14446 NameNode
[zjy@server5 ~]$ jps
14514 Jps
14213 QuorumPeerMain
14284 JournalNode
14396 DataNode
[zjy@server6 ~]$ jps
13969 QuorumPeerMain
14267 Jps
14044 JournalNode
14143 DataNode
[zjy@server7 ~]$ jps
14130 DataNode
13929 QuorumPeerMain
14254 Jps
14031 JournalNode
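Before testing failover it is worth checking which NameNode is currently active; hdfs haadmin takes the logical ids h1 and h2 from hdfs-site.xml, and one should report active and the other standby:
[zjy@server4 hadoop]$ bin/hdfs haadmin -getServiceState h1
[zjy@server4 hadoop]$ bin/hdfs haadmin -getServiceState h2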
[zjy@server4 hadoop]$ jps
13649 DFSZKFailoverController
13286 NameNode
14090 Jps
Test failover by killing the NameNode process on server4:
[zjy@server4 hadoop]$ kill 13286
[zjy@server4 hadoop]$ jps
13649 DFSZKFailoverController
14107 Jps
Check in ZooKeeper which NameNode is now active:
[zjy@server6 zookeeper-3.4.14]$ bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 6] get /hadoop-ha/masters/ActiveBreadCrumb
masters h2 server8 ...   (binary payload; the active NameNode is now h2 on server8)
cZxid = 0x100000007
ctime = Sat Jun 06 17:17:37 CST 2020
mZxid = 0x10000000d
mtime = Sat Jun 06 17:49:32 CST 2020
pZxid = 0x100000007
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 28
numChildren = 0
Bring the NameNode on server4 back up:
[zjy@server4 hadoop]$ hdfs --daemon start namenode
[zjy@server4 hadoop]$ jps
13649 DFSZKFailoverController
14369 NameNode
14445 Jps
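server4 should now report standby, since server8 keeps the active role after the failover (same haadmin check as above):
[zjy@server4 hadoop]$ bin/hdfs haadmin -getServiceState h1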
[zjy@server4 hadoop]$ pwd
/home/zjy/hadoop
[zjy@server4 hadoop]$ vim etc/hadoop/yarn-site.xml
<!-- enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- cluster id for the ResourceManager pair -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>RM_CLUSTER</value>
</property>
<!-- logical ids of the ResourceManager nodes -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- address of rm1 -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>172.25.60.4</value>
</property>
<!-- address of rm2 -->
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>172.25.60.8</value>
</property>
<!-- enable ResourceManager state recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- how RM state is stored; options include MemStore and ZKStore -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- when using the ZooKeeper store, the ZooKeeper quorum addresses -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>172.25.60.5:2181,172.25.60.6:2181,172.25.60.7:2181</value>
</property>
Start YARN: the ResourceManager runs on server4 and server8, and the NodeManager on server5, server6 and server7.
[zjy@server4 hadoop]$ sbin/start-yarn.sh
Starting resourcemanagers on [ 172.25.60.4 172.25.60.8]
Starting nodemanagers
[zjy@server4 hadoop]$ jps
13649 DFSZKFailoverController
14369 NameNode
14854 ResourceManager
14998 Jps
[zjy@server5 ~]$ jps
14835 Jps
14213 QuorumPeerMain
14284 JournalNode
14396 DataNode
14733 NodeManager
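The active ResourceManager can also be queried directly; yarn rmadmin uses the rm1/rm2 ids defined in yarn-site.xml:
[zjy@server4 hadoop]$ bin/yarn rmadmin -getServiceState rm1
[zjy@server4 hadoop]$ bin/yarn rmadmin -getServiceState rm2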
Check in ZooKeeper which ResourceManager is active:
[zk: localhost:2181(CONNECTED) 7] get /yarn-leader-election/RM_CLUSTER/ActiveBreadCrumb
RM_CLUSTER rm2   (rm2, running on server8, is the active ResourceManager)
cZxid = 0x100000019
ctime = Sat Jun 06 18:05:08 CST 2020
mZxid = 0x100000019
mtime = Sat Jun 06 18:05:08 CST 2020
pZxid = 0x100000019
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 17
numChildren = 0