Before getting into the setup, I want to briefly go over how the Hadoop NameNode achieves HA with the Quorum Journal Manager (QJM).
A typical HA cluster has two NameNodes: one in the active state, serving client requests, and one in standby, ready to take over quickly so the cluster can recover fast when the active NameNode fails.
To keep the metadata on the active and standby nodes consistent, every DataNode reports its block information to both NameNodes, and the cluster also runs a group of JournalNode daemons. Whenever the active NameNode makes any change to the namespace, it must persist that edit to a majority of the JournalNodes, while the standby NameNode reads the edits from the JournalNodes and applies them to its own namespace. This is what allows the standby to take over safely when the active node fails. The architecture diagram below illustrates this.
To prevent split-brain, Hadoop HA guarantees that only one NameNode is active at any point in time, and that only one NameNode can write to the JournalNodes at a time.
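With three JournalNodes, an edit must reach at least two of them (a majority) before it counts as committed, so the cluster keeps working after losing any single JournalNode. As a quick sanity check once everything is running, the same haadmin command used in the failover tests later in this post shows which NameNode currently holds the active role (nn1 and nn2 are the logical NameNode IDs defined in hdfs-site.xml below):

```
# Ask the HA admin which NameNode is active and which is standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```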
Host information:

| IP | Hostname | Roles | Notes |
| --- | --- | --- | --- |
| 192.168.2.10 | bi10 | namenode, datanode, JournalNode | primary namenode |
| 192.168.2.12 | bi12 | namenode, resourcemanager, datanode, JournalNode | primary resourcemanager, standby namenode |
| 192.168.2.13 | bi13 | resourcemanager, datanode, JournalNode | standby resourcemanager |
| 192.168.4.33 | bi3 | zookeeper | |
| 192.168.4.34 | bi4 | zookeeper | |
| 192.168.4.35 | bi5 | zookeeper | |
Main directory layout:

| Host | Mounted disks | HDFS data directories |
| --- | --- | --- |
| bi10 | /dev/sda → /data1, /dev/sdb → /data2, /dev/sdc → /data3, /dev/sdd → /data4 | /data1/hdfsdata, /data2/hdfsdata, /data3/hdfsdata, /data4/hdfsdata |
| bi12 | /dev/sda → /data1, /dev/sdb → /data2, /dev/sdc → /data3, /dev/sdd → /data4 | /data1/hdfsdata, /data2/hdfsdata, /data3/hdfsdata, /data4/hdfsdata |
| bi13 | /dev/sda → /data1, /dev/sdb → /data2, /dev/sdc → /data3, /dev/sdd → /data4, /dev/sde → /data5, /dev/sdf → /data6 | /data1/hdfsdata, /data2/hdfsdata, /data3/hdfsdata, /data4/hdfsdata, /data5/hdfsdata, /data6/hdfsdata |

The hdfsdata directories are created with mkdir on each host (see the sketch after this table). In addition, every host gets the same local Hadoop directories:

```
mkdir /home/hadoop/work/hadoop-2.6.2/data/hdfs/name/
mkdir /home/hadoop/work/hadoop-2.6.2/data/journal/
mkdir /home/hadoop/work/hadoop-2.6.2/temp/
```

These map to the following configuration properties:

| Property | Directory |
| --- | --- |
| hadoop.tmp.dir | /home/hadoop/work/hadoop-2.6.2/temp/ |
| dfs.journalnode.edits.dir | /home/hadoop/work/hadoop-2.6.2/data/journal |
| dfs.namenode.name.dir | /home/hadoop/work/hadoop-2.6.2/data/hdfs/name |
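A hedged sketch of creating the per-disk data directories on one host with bash; adjust the range per host (1 through 4 on bi10 and bi12, 1 through 6 on bi13):

```
# Create a DataNode data directory on each mounted disk
for i in 1 2 3 4; do
    mkdir -p /data${i}/hdfsdata
done

# Create the NameNode, JournalNode and temp directories in one go
mkdir -p /home/hadoop/work/hadoop-2.6.2/data/hdfs/name \
         /home/hadoop/work/hadoop-2.6.2/data/journal \
         /home/hadoop/work/hadoop-2.6.2/temp
```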
hadoop-env.sh: configure the Java environment

```
# The java implementation to use.
export JAVA_HOME=/home/hadoop/work/jdk1.7.0_75
```
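As a small optional check, you can confirm that the JDK pointed to by JAVA_HOME is actually present before starting any daemons:

```
# Verify the JDK configured in hadoop-env.sh exists and reports its version
/home/hadoop/work/jdk1.7.0_75/bin/java -version
```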
core-site.xml configuration:

```
<configuration>
    <!-- Set the HDFS nameservice to "masters" -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://masters</value>
    </property>
    <!-- Hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/work/hadoop-2.6.2/temp/</value>
    </property>
    <!-- ZooKeeper quorum address -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>bi3:2181,bi4:2181,bi5:2181</value>
    </property>
</configuration>
```
hdfs-site.xml configuration:

```
<configuration>
    <!-- The HDFS nameservice "masters"; must match fs.defaultFS in core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>masters</value>
    </property>
    <!-- The nameservice "masters" has two NameNodes: bi10 and bi12 -->
    <property>
        <name>dfs.ha.namenodes.masters</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 (bi10) -->
    <property>
        <name>dfs.namenode.rpc-address.masters.nn1</name>
        <value>bi10:9000</value>
    </property>
    <!-- HTTP address of nn1 (bi10) -->
    <property>
        <name>dfs.namenode.http-address.masters.nn1</name>
        <value>bi10:50070</value>
    </property>
    <!-- RPC address of nn2 (bi12) -->
    <property>
        <name>dfs.namenode.rpc-address.masters.nn2</name>
        <value>bi12:9000</value>
    </property>
    <!-- HTTP address of nn2 (bi12) -->
    <property>
        <name>dfs.namenode.http-address.masters.nn2</name>
        <value>bi12:50070</value>
    </property>
    <!-- Where the NameNode edit log is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://bi10:8485;bi12:8485;bi13:8485/masters</value>
    </property>
    <!-- Where each JournalNode stores its data on local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/work/hadoop-2.6.2/data/journal</value>
    </property>
    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Proxy provider clients use to locate the active NameNode -->
    <property>
        <name>dfs.client.failover.proxy.provider.masters</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <!-- sshfence requires passwordless SSH with this private key -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <!-- SSH connect timeout for sshfence -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>128m</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/work/hadoop-2.6.2/data/hdfs/name</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/data1/hdfsdata,file:/data2/hdfsdata,file:/data3/hdfsdata,file:/data4/hdfsdata</value>
    </property>
</configuration>
```
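Note that sshfence only works if the ZKFC on each NameNode host can ssh to the other NameNode host without a password, using the key named in dfs.ha.fencing.ssh.private-key-files. A minimal sketch of setting that up, assuming the hadoop user and the default key path (skip ssh-keygen if a key pair already exists):

```
# On bi10: create a key pair with an empty passphrase (if needed)
# and push the public key to bi12
ssh-keygen -t rsa -N "" -f /home/hadoop/.ssh/id_rsa
ssh-copy-id hadoop@bi12

# On bi12: repeat in the opposite direction
ssh-keygen -t rsa -N "" -f /home/hadoop/.ssh/id_rsa
ssh-copy-id hadoop@bi10
```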
yarn-site.xml configuration:

```
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- Cluster id shared by the RM pair -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>RM_HA_ID</value>
    </property>
    <!-- Logical ids of the two ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <!-- Hostnames of the two ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>bi12</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>bi13</value>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <!-- ZooKeeper ensemble address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>bi3:2181,bi4:2181,bi5:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
```
mapred-site.xml configuration, which tells MapReduce to run on YARN:

```
<configuration>
    <!-- Run the MapReduce framework on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
```
Configure the slaves file:

```
bi10
bi12
bi13
```
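These configuration files must be identical on every node. One hedged way to distribute them from bi10, assuming the same installation path on all hosts:

```
# Copy the edited configuration files to the other nodes
for h in bi12 bi13; do
    scp /home/hadoop/work/hadoop-2.6.2/etc/hadoop/{hadoop-env.sh,core-site.xml,hdfs-site.xml,yarn-site.xml,mapred-site.xml,slaves} \
        ${h}:/home/hadoop/work/hadoop-2.6.2/etc/hadoop/
done
```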
1. On the primary namenode (bi10), start the three JournalNodes

```
[hadoop@bi10 ~]$ hadoop-daemons.sh start journalnode
bi10: starting journalnode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-journalnode-bi10.out
bi12: starting journalnode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-journalnode-bi12.out
bi13: starting journalnode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-journalnode-bi13.out
```
2. On the primary namenode (bi10), format the primary namenode

```
[hadoop@bi10 ~]$ hdfs namenode -format
```
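If the format succeeds, an initial fsimage is written under dfs.namenode.name.dir; a quick, optional check (path taken from the configuration above):

```
# The freshly formatted namespace lives under dfs.namenode.name.dir
ls /home/hadoop/work/hadoop-2.6.2/data/hdfs/name/current/
# Expect files along the lines of fsimage_0000000000000000000, seen_txid and VERSION
```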
3. On the primary namenode (bi10), format the ZKFC state in ZooKeeper

```
[hadoop@bi10 ~]$ hdfs zkfc -formatZK
```
4. On the primary namenode (bi10), start the primary namenode

```
[hadoop@bi10 ~]$ hadoop-daemon.sh start namenode
```
5. On the standby namenode (bi12), sync the namenode metadata

```
[hadoop@bi12 ~]$ hdfs namenode -bootstrapStandby
```
6. On the standby namenode (bi12), start the standby namenode

```
[hadoop@bi12 ~]$ hadoop-daemon.sh start namenode
```
Check the jps process list on the three hosts

```
[hadoop@bi10 ~]$ jps
1914 JournalNode
2294 Jps
2109 NameNode

[hadoop@bi12 ~]$ jps
12063 NameNode
12141 Jps
11843 JournalNode

[hadoop@bi13 ~]$ jps
22197 JournalNode
22323 Jps
```
Check the ZooKeeper state

```
[zk: localhost:2181(CONNECTED) 13] ls /hadoop-ha
[ns1, masters]
```
7. On bi10 and bi12 respectively, start the ZKFC for automatic namenode failover

```
[hadoop@bi10 ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-zkfc-bi10.out

[hadoop@bi12 ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-zkfc-bi12.out
```
1. On the primary namenode (bi10), run the following to start all DataNodes

```
[hadoop@bi10 ~]$ hadoop-daemons.sh start datanode
bi10: starting datanode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-datanode-bi10.out
bi12: starting datanode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-datanode-bi12.out
bi13: starting datanode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-datanode-bi13.out
```
1. On the primary resourcemanager (bi12), start YARN

```
[hadoop@bi12 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/work/hadoop-2.6.2/logs/yarn-hadoop-resourcemanager-bi12.out
bi10: starting nodemanager, logging to /home/hadoop/work/hadoop-2.6.2/logs/yarn-hadoop-nodemanager-bi10.out
bi12: starting nodemanager, logging to /home/hadoop/work/hadoop-2.6.2/logs/yarn-hadoop-nodemanager-bi12.out
bi13: starting nodemanager, logging to /home/hadoop/work/hadoop-2.6.2/logs/yarn-hadoop-nodemanager-bi13.out
```
2. On the standby resourcemanager (bi13), start the resourcemanager

```
[hadoop@bi13 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/hadoop/work/hadoop-2.6.2/logs/yarn-hadoop-resourcemanager-bi13.out
```
Check the processes on the three hosts

```
[hadoop@bi10 ~]$ jps
2659 NodeManager
1914 JournalNode
2784 Jps
2347 DFSZKFailoverController
2515 DataNode
2109 NameNode

[hadoop@bi12 ~]$ jps
12063 NameNode
12403 DataNode
11843 JournalNode
12569 ResourceManager
12270 DFSZKFailoverController
12678 NodeManager
13031 Jps

[hadoop@bi13 ~]$ jps
22729 Jps
22383 DataNode
22197 JournalNode
22553 NodeManager
22691 ResourceManager
```
1. Upload a test file

```
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -mkdir /user
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -mkdir /user/hadoop
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -mkdir /user/hadoop/wordcount
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -mkdir /user/hadoop/wordcount/input
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -put ./LICENSE.txt /user/hadoop/wordcount/input
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -ls /user/hadoop/wordcount/input
Found 1 items
-rw-r--r--   2 hadoop supergroup      15429 2016-02-16 15:38 /user/hadoop/wordcount/input/LICENSE.txt
```
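As a side note, the four nested mkdir calls above can be collapsed into one command with the -p flag, which creates any missing parent directories:

```
# Equivalent to the four separate mkdir calls
hdfs dfs -mkdir -p /user/hadoop/wordcount/input
```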
2. Run the wordcount test

```
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /user/hadoop/wordcount/input /user/hadoop/wordcount/output
```
3. Check the results

```
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -ls /user/hadoop/wordcount/output
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2016-02-16 15:45 /user/hadoop/wordcount/output/_SUCCESS
-rw-r--r--   2 hadoop supergroup       8006 2016-02-16 15:45 /user/hadoop/wordcount/output/part-r-00000
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -cat /user/hadoop/wordcount/output/part-r-00000
```
1. Check the current namenode states

```
[hadoop@bi10 hadoop-2.6.2]$ hdfs haadmin -getServiceState nn1
active
[hadoop@bi10 hadoop-2.6.2]$ hdfs haadmin -getServiceState nn2
standby
```
2. Simulate a crash of the primary namenode, then check the namenode states again and test

```
[hadoop@bi10 hadoop-2.6.2]$ hadoop-daemon.sh stop namenode
stopping namenode
[hadoop@bi10 hadoop-2.6.2]$ jps
2659 NodeManager
3691 Jps
1914 JournalNode
2347 DFSZKFailoverController
2515 DataNode

[hadoop@bi10 hadoop-2.6.2]$ hdfs haadmin -getServiceState nn1
16/02/16 15:53:48 INFO ipc.Client: Retrying connect to server: bi10/192.168.2.10:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From bi10/192.168.2.10 to bi10:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[hadoop@bi10 hadoop-2.6.2]$ hdfs haadmin -getServiceState nn2
active
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -ls
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2016-02-16 15:45 wordcount
```
3. Restart the namenode on bi10 and check the namenode states again

```
[hadoop@bi10 hadoop-2.6.2]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-namenode-bi10.out
[hadoop@bi10 hadoop-2.6.2]$ hdfs haadmin -getServiceState nn1
standby
[hadoop@bi10 hadoop-2.6.2]$ hdfs haadmin -getServiceState nn2
active
```
1. Simulate a crash of the primary resourcemanager, then test

```
[hadoop@bi12 ~]$ yarn rmadmin -getServiceState rm1
active
[hadoop@bi12 ~]$ yarn rmadmin -getServiceState rm2
standby
[hadoop@bi12 ~]$ yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[hadoop@bi12 ~]$ yarn rmadmin -getServiceState rm2
active
[hadoop@bi12 ~]$ yarn rmadmin -getServiceState rm1
16/02/16 16:11:36 INFO ipc.Client: Retrying connect to server: bi12/192.168.2.12:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From bi12/192.168.2.12 to bi12:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
```
2. Restart the resourcemanager on bi12 and check the resourcemanager states

```
[hadoop@bi12 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/hadoop/work/hadoop-2.6.2/logs/yarn-hadoop-resourcemanager-bi12.out
[hadoop@bi12 ~]$ yarn rmadmin -getServiceState rm1
standby
[hadoop@bi12 ~]$ yarn rmadmin -getServiceState rm2
active
```
3. Run wordcount again

```
[hadoop@bi10 hadoop-2.6.2]$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /user/hadoop/wordcount/input /user/hadoop/wordcount/output1
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -ls wordcount/output1
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2016-02-16 16:14 wordcount/output1/_SUCCESS
-rw-r--r--   2 hadoop supergroup       8006 2016-02-16 16:14 wordcount/output1/part-r-00000
```
```
yarn-daemons.sh stop nodemanager
yarn-daemons.sh stop resourcemanager
hadoop-daemons.sh stop datanode
hadoop-daemons.sh stop zkfc
hadoop-daemons.sh stop namenode
hadoop-daemons.sh stop journalnode
```
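Alternatively, the bundled aggregate scripts can shut down the same daemons; a hedged equivalent, assuming the configuration above (the standby ResourceManager on bi13 still has to be stopped by hand, since stop-yarn.sh only stops the local one):

```
# Stop YARN (ResourceManager on this host plus all NodeManagers)
stop-yarn.sh
# Stop HDFS (NameNodes, DataNodes, JournalNodes and ZKFCs)
stop-dfs.sh
# Stop the other ResourceManager separately
yarn-daemon.sh stop resourcemanager
```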