HDFS-HA Automatic Failover Not Working

1 Cluster Configuration and Planning

Reference for configuring HDFS-HA automatic failover: https://blog.csdn.net/weixin_38023225/article/details/101346493

Cluster plan:

                  master-node   slave-node1   slave-node2
NameNode          yes           yes           -
JournalNode       yes           yes           yes
DataNode          yes           yes           yes
ZK                yes           yes           yes
ResourceManager   -             yes           -
NodeManager       yes           yes           yes

2 Problem Description

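I built a 3-node cluster locally; the layout is shown in the table above.

After starting the cluster, the web UI looked normal: master-node was active and slave-node1 was standby.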

[Figure 1: NameNode web UI]

[Figure 2: NameNode web UI]

Test: kill the active NameNode process.

[caimh@master-node hadoop-2.7.4]$ jps
4401 NameNode
4913 Jps
3797 JournalNode
4505 DataNode
4218 QuorumPeerMain
4847 DFSZKFailoverController
[caimh@master-node hadoop-2.7.4]$ kill -9 4401
[caimh@master-node hadoop-2.7.4]$ jps
3797 JournalNode
4505 DataNode
4218 QuorumPeerMain
4955 Jps
4847 DFSZKFailoverController

Checking the web UI again: nn1 on master-node was dead, but nn2 on slave-node1 was still standby and had not switched to active.
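To repeat the kill test without copying the PID out of the jps listing by hand, the PID can be extracted with awk. This is a minimal sketch: the function name namenode_pid is hypothetical, and it assumes jps is on the PATH of the user that runs the NameNode.

```bash
#!/usr/bin/env bash
# Print the PID of the local NameNode, parsed from jps output
# (lines look like "4401 NameNode"). Prints nothing if none is running.
namenode_pid() {
  # Exact match on the 2nd column, so e.g. "SecondaryNameNode" never matches.
  jps | awk '$2 == "NameNode" { print $1 }'
}

# Usage on the node whose NameNode is currently active:
#   pid=$(namenode_pid) && [ -n "$pid" ] && kill -9 "$pid"
```

Matching the second column exactly, rather than grepping for "NameNode", keeps the kill from hitting a process whose name merely contains that string.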

3 Root Cause Analysis

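The logs showed that passwordless SSH login had not been fully configured. I had only set up passwordless login from master-node to slave-node1, not from slave-node1 to master-node; it has to work in both directions. The reason is the fencing step: with the sshfence fencing method, the ZKFC on the standby node logs in over SSH to the host of the failed active NameNode to fence it before promoting its own NameNode. If that login fails, fencing fails and the standby is never promoted.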

[Figure 3]
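Before touching the cluster again, it can help to confirm both the fencing method and the SSH paths. The configured fencing method can be printed with hdfs getconf -confKey dfs.ha.fencing.methods. The sketch below is a hypothetical helper (not part of Hadoop) that tests passwordless login from the current node to each given host; ssh's BatchMode=yes option makes a missing key show up as a failure instead of a password prompt. Run it on each NameNode host against the other one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that this node can SSH to each host WITHOUT a password.
# BatchMode=yes makes ssh fail instead of prompting for a password;
# ConnectTimeout=5 keeps an unreachable host from hanging the check.
check_passwordless_ssh() {
  local host failed=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=$((failed + 1))
    fi
  done
  # Exit status 0 only if every host passed.
  [ "$failed" -eq 0 ]
}

# Usage: on slave-node1 (nn2) check the path to master-node (nn1), and vice versa.
#   check_passwordless_ssh master-node
```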

4 Fix

Configure passwordless SSH login from slave-node1 to master-node:

[caimh@slave-node1 ~]$ ssh-keygen -t rsa    --press Enter at every prompt
[caimh@slave-node1 ~]$ ssh-copy-id master-node
[caimh@slave-node1 ~]$ ssh master-node
Last login: Wed Sep 25 06:47:36 2019 from slave-node1
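To avoid missing a direction, the same steps can be scripted and run on each NameNode host. This is a minimal sketch that assumes the same user exists on every node and that the key lives at the default path ~/.ssh/id_rsa; the function name setup_passwordless_ssh is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set up passwordless SSH from THIS node to every host given.
# Run it on each NameNode host (here master-node and slave-node1) so SSH works
# in both directions. Assumes the same user on every node and ~/.ssh/id_rsa.
setup_passwordless_ssh() {
  local key="$HOME/.ssh/id_rsa" host
  # Generate a key pair only if none exists; -N "" = empty passphrase,
  # -f = output file, -q = quiet, so ssh-keygen does not prompt.
  if [ ! -f "$key" ]; then
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
    ssh-keygen -t rsa -N "" -f "$key" -q || return 1
  fi
  for host in "$@"; do
    # Appends the public key to ~/.ssh/authorized_keys on $host
    # (asks for that host's password once).
    ssh-copy-id -i "$key.pub" "$host" || return 1
  done
}

# Usage, e.g. on slave-node1:
#   setup_passwordless_ssh master-node slave-node1 slave-node2
```

Skipping key generation when a key already exists avoids ssh-keygen's overwrite prompt and keeps keys that other hosts already trust.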

5 Verification

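Stop HDFS.

Restart HDFS.

Check that nn1 is active and nn2 is standby.

Kill nn1 (the active NameNode) and check nn2: it switched to active automatically, so the verification succeeded.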

[caimh@master-node hadoop-2.7.4]$ sbin/stop-dfs.sh     --stop HDFS
[caimh@master-node hadoop-2.7.4]$ jps
8790 Jps
6937 QuorumPeerMain
[caimh@master-node hadoop-2.7.4]$ sbin/start-dfs.sh    --restart HDFS
[caimh@master-node hadoop-2.7.4]$ jps                  --nn1
9285 JournalNode
9045 DataNode
9527 Jps
8936 NameNode
6937 QuorumPeerMain
9453 DFSZKFailoverController
[caimh@slave-node1 hadoop-2.7.4]$ jps                  --nn2
4594 QuorumPeerMain
7139 DataNode
7509 Jps
7369 DFSZKFailoverController
7066 NameNode
7229 JournalNode
[caimh@master-node hadoop-2.7.4]$ kill -9 8936        --kill nn1
[caimh@master-node hadoop-2.7.4]$ bin/hdfs haadmin -getServiceState nn2    --check nn2's state: it has switched to active
active

[Figure 4]
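The ZKFC may need a few seconds to notice the dead NameNode and fence it, so checking nn2 immediately after the kill can still show standby. A minimal sketch that polls hdfs haadmin -getServiceState until the given NameNode reports active or a timeout expires; the function name wait_for_active and the 60-second default are assumptions, and it expects hdfs on the PATH (use bin/hdfs from the Hadoop directory otherwise).

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a NameNode's HA state once per second until it is
# "active" or the timeout (seconds, default 60, an assumed value) runs out.
wait_for_active() {
  local nn="$1" timeout="${2:-60}" elapsed=0 state=""
  while [ "$elapsed" -lt "$timeout" ]; do
    # haadmin exits non-zero if the NameNode is unreachable; treat as no state.
    state=$(hdfs haadmin -getServiceState "$nn" 2>/dev/null) || state=""
    if [ "$state" = "active" ]; then
      echo "$nn is active after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$nn still '${state:-unknown}' after ${timeout}s" >&2
  return 1
}

# Usage, right after killing nn1:
#   wait_for_active nn2 60
```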

 
