Prerequisites: Java, SSH, and the hosts file are already configured on every node.

Role layout:
master: NameNode, ResourceManager, ZooKeeper, ZKFC
slave1: DataNode, JournalNode, NodeManager, ZooKeeper
slave2: DataNode, JournalNode, NodeManager, ZooKeeper
node1:  NameNode, ResourceManager, ZKFC, JournalNode (node1 appears in the qjournal URI below, so it needs a JournalNode too)
1. hdfs-site.xml
<configuration>
  <property><name>dfs.nameservices</name><value>myha</value></property>
  <property><name>dfs.ha.namenodes.myha</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.myha.nn1</name><value>master:8020</value></property>
  <property><name>dfs.namenode.rpc-address.myha.nn2</name><value>node1:8020</value></property>
  <property><name>dfs.namenode.http-address.myha.nn1</name><value>master:50070</value></property>
  <property><name>dfs.namenode.http-address.myha.nn2</name><value>node1:50070</value></property>
  <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://node1:8485;slave1:8485;slave2:8485/myha</value></property>
  <property><name>dfs.client.failover.proxy.provider.myha</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
  <!-- shell(/bin/true) skips real fencing; the private key below is only consulted by the sshfence method -->
  <property><name>dfs.ha.fencing.methods</name><value>shell(/bin/true)</value></property>
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/hadoop/.ssh/id_rsa</value></property>
  <property><name>dfs.journalnode.edits.dir</name><value>/usr/hadoop/dfs/journalnode</value></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.permissions.enabled</name><value>false</value></property>
</configuration>
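With HA enabled, clients no longer address a single host: fs paths use the nameservice (hdfs://myha), and the ConfiguredFailoverProxyProvider resolves it by reading the per-NameNode RPC addresses and trying each in turn. A minimal sketch of that resolution, parsing an inline copy of a few of the properties above (the XML string is illustrative, not the full file):

```python
import xml.etree.ElementTree as ET

# Illustrative fragment mirroring the hdfs-site.xml properties above.
HDFS_SITE = """
<configuration>
  <property><name>dfs.nameservices</name><value>myha</value></property>
  <property><name>dfs.ha.namenodes.myha</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.myha.nn1</name><value>master:8020</value></property>
  <property><name>dfs.namenode.rpc-address.myha.nn2</name><value>node1:8020</value></property>
</configuration>
"""

def candidate_namenodes(conf_xml: str, nameservice: str) -> list[str]:
    """Return the RPC addresses a client would try, in order, for a nameservice."""
    props = {p.findtext("name"): p.findtext("value")
             for p in ET.fromstring(conf_xml).iter("property")}
    ids = props[f"dfs.ha.namenodes.{nameservice}"].split(",")
    return [props[f"dfs.namenode.rpc-address.{nameservice}.{nn}"] for nn in ids]

print(candidate_namenodes(HDFS_SITE, "myha"))  # ['master:8020', 'node1:8020']
```

The same lookup explains the failover test later: when nn1 (master:8020) stops answering, the client simply falls through to nn2.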
2. core-site.xml
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://myha</value></property>
  <property><name>ha.zookeeper.quorum</name><value>master:2181,slave1:2181,slave2:2181</value></property>
  <property><name>hadoop.tmp.dir</name><value>/usr/hadoop/tmp</value></property>
</configuration>

3. yarn-site.xml (the yarn.* properties go here, not in core-site.xml)
<configuration>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.cluster-id</name><value>myyarn</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>master</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>node1</value></property>
  <property><name>yarn.resourcemanager.zk-address</name><value>master:2181,slave1:2181,slave2:2181</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.resourcemanager.ha.automatic-failover.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.ha.automatic-failover.embedded</name><value>true</value></property>
  <!-- per-node value: rm1 on master, rm2 on node1; each ResourceManager must use its own id -->
  <property><name>yarn.resourcemanager.ha.id</name><value>rm1</value></property>
</configuration>

4. mapred-site.xml
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
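The one property above that must differ per node is yarn.resourcemanager.ha.id: each ResourceManager host has to claim its own id from yarn.resourcemanager.ha.rm-ids. A tiny sketch (helper name is mine) of deriving the right value from the hostname mapping above, which is handy when distributing the config:

```python
# Hostname-to-id mapping taken from the yarn.resourcemanager.hostname.* properties above.
RM_HOSTNAMES = {"rm1": "master", "rm2": "node1"}

def ha_id_for(hostname: str) -> str:
    """Pick the yarn.resourcemanager.ha.id value a given host should use."""
    for rm_id, host in RM_HOSTNAMES.items():
        if host == hostname:
            return rm_id
    raise ValueError(f"{hostname} is not listed as a ResourceManager host")

print(ha_id_for("master"))  # rm1
print(ha_id_for("node1"))   # rm2
```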
5. Starting HDFS
1. Start ZooKeeper first (zkServer.sh start on each ZooKeeper node).
2. Then start the JournalNodes: hadoop-daemon.sh start journalnode
3. Format the NameNode: hdfs namenode -format
4. Initialize the ZKFC znode in ZooKeeper: hdfs zkfc -formatZK
5. Start the NameNode on master: hadoop-daemon.sh start namenode
6. Start the DataNodes from master: hadoop-daemons.sh start datanode
7. Start ZKFC on master and node1: hadoop-daemon.sh start zkfc
8. Sync node1 from the active NameNode: hdfs namenode -bootstrapStandby
9. Start the NameNode on node1: hadoop-daemon.sh start namenode
After this first run, the commands above can be replaced with start-dfs.sh.
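The ordering of these steps matters: the JournalNodes must be up before the format, and the active NameNode before -bootstrapStandby. A sketch that renders the sequence as ssh one-liners (ZooKeeper start omitted; passwordless SSH is assumed per the prerequisites, and the host assignments follow the role table at the top):

```python
# (host, command) pairs in the order listed above; hadoop-daemons.sh
# fans out to the hosts in the slaves file by itself.
STEPS = [
    ("slave1", "hadoop-daemon.sh start journalnode"),
    ("slave2", "hadoop-daemon.sh start journalnode"),
    ("node1",  "hadoop-daemon.sh start journalnode"),
    ("master", "hdfs namenode -format"),
    ("master", "hdfs zkfc -formatZK"),
    ("master", "hadoop-daemon.sh start namenode"),
    ("master", "hadoop-daemons.sh start datanode"),
    ("master", "hadoop-daemon.sh start zkfc"),
    ("node1",  "hadoop-daemon.sh start zkfc"),
    ("node1",  "hdfs namenode -bootstrapStandby"),
    ("node1",  "hadoop-daemon.sh start namenode"),
]

def render(steps):
    """Render each step as an ssh one-liner (builds strings, executes nothing)."""
    return [f"ssh {host} '{cmd}'" for host, cmd in steps]

for line in render(STEPS):
    print(line)
```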
Verification:
[hadoop@master tmp]$ hdfs haadmin -getServiceState nn1
active
[hadoop@master tmp]$ hdfs haadmin -getServiceState nn2
standby
Now kill the NameNode on master and check whether the standby NameNode becomes active.
[hadoop@master tmp]$ jps
4833 Jps
4199 ResourceManager
3304 NameNode
2377 QuorumPeerMain
3581 DFSZKFailoverController
[hadoop@master tmp]$ kill -9 3304
[hadoop@master tmp]$ hdfs haadmin -getServiceState nn2
active
[hadoop@master tmp]$ hdfs haadmin -getServiceState nn1
17/08/01 20:30:57 INFO ipc.Client: Retrying connect to server: master/192.168.0.110:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From master/192.168.0.110 to master:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[hadoop@master tmp]$
As the output above shows, automatic failover works. You can also switch NameNodes manually with the hdfs haadmin command. First restart the killed NameNode:
[hadoop@master tmp]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/hadoop/logs/hadoop-hadoop-namenode-master.out
[hadoop@master tmp]$ jps
4199 ResourceManager
2377 QuorumPeerMain
5002 Jps
3581 DFSZKFailoverController
4926 NameNode
[hadoop@master tmp]$ hdfs haadmin -getServiceState nn1
standby
[hadoop@master tmp]$ hdfs haadmin -getServiceState nn2
active
[hadoop@master tmp]$ hdfs haadmin -failover --forcefence --forceactive nn2 nn1
forcefence and forceactive flags not supported with auto-failover enabled.
[hadoop@master tmp]$ hdfs haadmin -getServiceState nn2
active
[hadoop@master tmp]$ hdfs haadmin -failover nn2 nn1
Failover to NameNode at master/192.168.0.110:8020 successful
[hadoop@master tmp]$ hdfs haadmin -getServiceState nn2
standby
[hadoop@master tmp]$ hdfs haadmin -getServiceState nn1
active
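Besides hdfs haadmin -getServiceState, each NameNode's HA state can be read over HTTP from its JMX servlet on the http-address port configured above (50070), via the Hadoop:service=NameNode,name=NameNodeStatus bean. A sketch, with the response shape as I understand Hadoop 2.x serves it; the fetch needs a running cluster, so only the parsing is exercised here:

```python
import json
import urllib.request

def parse_ha_state(jmx_json: str) -> str:
    """Extract the HA state from a NameNodeStatus JMX response."""
    beans = json.loads(jmx_json)["beans"]
    return beans[0]["State"]

def fetch_ha_state(host: str, port: int = 50070) -> str:
    """Query a NameNode's JMX servlet for its current HA state."""
    url = f"http://{host}:{port}/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_ha_state(resp.read().decode())

# Abridged example of the response shape (assumed, not captured from this cluster):
SAMPLE = '{"beans":[{"name":"Hadoop:service=NameNode,name=NameNodeStatus","State":"active"}]}'
print(parse_ha_state(SAMPLE))  # active
```

Polling both hosts this way (fetch_ha_state("master"), fetch_ha_state("node1")) mirrors the haadmin checks in the transcript above.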
6. Starting YARN
1. Start the ResourceManager on master: yarn-daemon.sh start resourcemanager
2. Start the ResourceManager on node1: yarn-daemon.sh start resourcemanager
3. Start the NodeManagers from master: yarn-daemons.sh start nodemanager
Verification:
[hadoop@master tmp]$ yarn rmadmin -getServiceState rm1
active
[hadoop@master tmp]$ yarn rmadmin -getServiceState rm2
standby
[hadoop@master tmp]$ jps
5728 Jps
5399 ResourceManager
2377 QuorumPeerMain
3581 DFSZKFailoverController
4926 NameNode
[hadoop@master tmp]$ kill -9 5399
[hadoop@master tmp]$ yarn rmadmin -getServiceState rm2
standby
[hadoop@master tmp]$ yarn rmadmin -getServiceState rm1
17/08/01 20:40:01 INFO ipc.Client: Retrying connect to server: master/192.168.0.110:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From master/192.168.0.110 to master:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[hadoop@master tmp]$ yarn rmadmin -getServiceState rm2
active