hadoop-2.6.2 namenode / resourcemanager HA environment setup

    Before we start, I'd like to share how Hadoop NameNode HA is implemented on top of QJM (Quorum Journal Manager).

    First, a typical HA cluster has two namenodes: one in the active state, serving client requests, and one in the standby state, waiting to take over so that the cluster can recover quickly if the active namenode fails.

    To keep the metadata of the active and standby nodes consistent, every datanode sends its block reports to both namenodes, and the cluster also runs a set of JournalNode daemons. Whenever the active node modifies the namespace, it must persist the edit to a majority of the JournalNodes (with three JournalNodes, at least two must acknowledge each write, so the quorum tolerates the loss of one); the standby node, in turn, reads the edits from the JournalNodes and applies them to its own namespace. Only then can the standby safely take over when the active node fails. The architecture diagram below illustrates this.

[Figure 1: QJM-based NameNode HA architecture]

    To prevent split-brain, Hadoop HA guarantees that only one namenode is in the active state at any point in time, and that only one namenode can write edits to the JournalNodes.

Now let's build the hadoop HA cluster, using hadoop-2.6.2.

    Host information:

IP            Hostname  Roles                                             Notes
192.168.2.10  bi10      namenode, datanode, JournalNode                   primary namenode
192.168.2.12  bi12      namenode, resourcemanager, datanode, JournalNode  primary resourcemanager, standby namenode
192.168.2.13  bi13      resourcemanager, datanode, JournalNode            standby resourcemanager
192.168.4.33  bi3       zookeeper
192.168.4.34  bi4       zookeeper
192.168.4.35  bi5       zookeeper
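
    Everything below refers to these machines by hostname, so every node needs consistent name resolution. A minimal sketch of the /etc/hosts entries, assuming no DNS is available:

# /etc/hosts on every cluster node (and on any client machine)
192.168.2.10  bi10
192.168.2.12  bi12
192.168.2.13  bi13
192.168.4.33  bi3
192.168.4.34  bi4
192.168.4.35  bi5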

    Main directory layout:

Host: bi10 and bi12 (identical layout)

Mounts:
/dev/sda  /data1
/dev/sdb  /data2
/dev/sdc  /data3
/dev/sdd  /data4

Hadoop directories to create (with the configuration property each one backs):
mkdir /home/hadoop/work/hadoop-2.6.2/data/hdfs/name/    # dfs.namenode.name.dir
mkdir /home/hadoop/work/hadoop-2.6.2/data/journal/      # dfs.journalnode.edits.dir
mkdir /home/hadoop/work/hadoop-2.6.2/temp/              # hadoop.tmp.dir

HDFS data directories to create:
mkdir /data1/hdfsdata/
mkdir /data2/hdfsdata/
mkdir /data3/hdfsdata/
mkdir /data4/hdfsdata/

Host: bi13

Mounts:
/dev/sda  /data1
/dev/sdb  /data2
/dev/sdc  /data3
/dev/sdd  /data4
/dev/sde  /data5
/dev/sdf  /data6

Hadoop directories to create:
mkdir /home/hadoop/work/hadoop-2.6.2/data/hdfs/name/    # dfs.namenode.name.dir
mkdir /home/hadoop/work/hadoop-2.6.2/data/journal/      # dfs.journalnode.edits.dir
mkdir /home/hadoop/work/hadoop-2.6.2/temp/              # hadoop.tmp.dir

HDFS data directories to create:
mkdir /data1/hdfsdata/
mkdir /data2/hdfsdata/
mkdir /data3/hdfsdata/
mkdir /data4/hdfsdata/
mkdir /data5/hdfsdata/
mkdir /data6/hdfsdata/

(Note that the shared hdfs-site.xml below only lists /data1 through /data4 under dfs.datanode.data.dir, so bi13's /data5 and /data6 will sit unused unless its local copy of the configuration overrides that property.)
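
    Rather than typing each mkdir by hand, a small shell sketch like the following could create the whole layout on one node (assuming the /dataN mounts above already exist; extend the list to /data6 on bi13):

#!/bin/sh
# Sketch: create the hadoop directory layout on one node, run as the hadoop user.
HADOOP_HOME=/home/hadoop/work/hadoop-2.6.2

mkdir -p "$HADOOP_HOME/data/hdfs/name" \
         "$HADOOP_HOME/data/journal" \
         "$HADOOP_HOME/temp"

# HDFS data directories, one per mounted disk.
for d in /data1 /data2 /data3 /data4; do
    mkdir -p "$d/hdfsdata"
done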

    Configure the java environment in hadoop-env.sh

# The java implementation to use.
export JAVA_HOME=/home/hadoop/work/jdk1.7.0_75
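
    Before going further it is worth confirming that this JDK path is valid on every node, for example:

[hadoop@bi10 ~]$ /home/hadoop/work/jdk1.7.0_75/bin/java -version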

    core-site.xml configuration:

<configuration>
  <!-- Set the HDFS nameservice to masters -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://masters</value>
  </property>
  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/work/hadoop-2.6.2/temp/</value>
  </property>
  <!-- ZooKeeper quorum addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>bi3:2181,bi4:2181,bi5:2181</value>
  </property>
</configuration>
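
    Because fs.defaultFS points at the logical nameservice rather than a single host, clients go through the failover proxy and never need to know which namenode is currently active. Once the cluster is up, both of these forms should work:

hdfs dfs -ls hdfs://masters/
hdfs dfs -ls /    # equivalent, resolved through fs.defaultFS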

    hdfs-site.xml configuration:

<configuration>
  <!-- The HDFS nameservice, masters; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>masters</value>
  </property>
  <!-- masters has two NameNodes: bi10 and bi12 -->
  <property>
    <name>dfs.ha.namenodes.masters</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 (bi10) -->
  <property>
    <name>dfs.namenode.rpc-address.masters.nn1</name>
    <value>bi10:9000</value>
  </property>
  <!-- HTTP address of nn1 (bi10) -->
  <property>
    <name>dfs.namenode.http-address.masters.nn1</name>
    <value>bi10:50070</value>
  </property>
  <!-- RPC address of nn2 (bi12) -->
  <property>
    <name>dfs.namenode.rpc-address.masters.nn2</name>
    <value>bi12:9000</value>
  </property>
  <!-- HTTP address of nn2 (bi12) -->
  <property>
    <name>dfs.namenode.http-address.masters.nn2</name>
    <value>bi12:50070</value>
  </property>
  <!-- Where the NameNode edits are stored on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://bi10:8485;bi12:8485;bi13:8485/masters</value>
  </property>
  <!-- Where each JournalNode keeps its data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/work/hadoop-2.6.2/data/journal</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- The proxy provider clients use to find the active NameNode -->
  <property>
    <name>dfs.client.failover.proxy.provider.masters</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <!-- sshfence requires passwordless ssh between the namenodes -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <!-- Timeout for the sshfence ssh connection -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>128m</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/work/hadoop-2.6.2/data/hdfs/name</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property> 
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data1/hdfsdata,file:/data2/hdfsdata,file:/data3/hdfsdata,file:/data4/hdfsdata</value>
  </property>
</configuration>
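
    The sshfence method above only works if the hadoop user can ssh between the two namenodes without a password, using the private key named in dfs.ha.fencing.ssh.private-key-files. A minimal sketch for setting that up, assuming the key does not already exist:

# Run as the hadoop user on bi10 and on bi12:
ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ''   # skip if ~/.ssh/id_rsa already exists
ssh-copy-id hadoop@bi10
ssh-copy-id hadoop@bi12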

    yarn-site.xml configuration:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- Enable RM high availability -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- The RM cluster id -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>RM_HA_ID</value>
  </property>
  <!-- Logical ids for the two RMs -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- The host each RM runs on -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>bi12</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>bi13</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <!-- ZooKeeper quorum address -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>bi3:2181,bi4:2181,bi5:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
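
    Once YARN is running, each resourcemanager can be queried by its logical id; a small loop over the two ids (a convenience sketch built on the standard yarn rmadmin command) shows both states at once:

for rm in rm1 rm2; do
    printf '%s: ' "$rm"
    yarn rmadmin -getServiceState "$rm"
done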

    mapred-site.xml configuration; set the framework to yarn:

<configuration>
  <!-- Run the MR framework on yarn -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

    Configure the slaves file:

bi10
bi12
bi13
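
    All of these configuration files need to be identical on every node (apart from any host-specific overrides). A sketch for pushing the configuration from bi10 to the other hadoop nodes, relying on the passwordless ssh set up above:

for host in bi12 bi13; do
    scp /home/hadoop/work/hadoop-2.6.2/etc/hadoop/*-site.xml \
        /home/hadoop/work/hadoop-2.6.2/etc/hadoop/hadoop-env.sh \
        /home/hadoop/work/hadoop-2.6.2/etc/hadoop/slaves \
        "hadoop@$host:/home/hadoop/work/hadoop-2.6.2/etc/hadoop/"
done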

Starting the cluster

    Start the namenodes:

    1. On the primary namenode (bi10), start the journalnodes on all three hosts

[hadoop@bi10 ~]$ hadoop-daemons.sh start journalnode
bi10: starting journalnode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-journalnode-bi10.out
bi12: starting journalnode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-journalnode-bi12.out
bi13: starting journalnode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-journalnode-bi13.out

    2. On the primary namenode (bi10), format the primary namenode (the journalnodes must already be running, since formatting also initializes the shared edits directory)

[hadoop@bi10 ~]$ hdfs namenode -format
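
    If the format succeeds, the name directory configured above should now hold the initial metadata (files such as VERSION and an fsimage_* checkpoint under current/):

[hadoop@bi10 ~]$ ls /home/hadoop/work/hadoop-2.6.2/data/hdfs/name/current/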

    3. On the primary namenode (bi10), initialize the HA state for zkfc in ZooKeeper

[hadoop@bi10 ~]$ hdfs zkfc -formatZK

    4. On the primary namenode (bi10), start the primary namenode

[hadoop@bi10 ~]$ hadoop-daemon.sh start namenode

    5. On the standby namenode (bi12), copy over the namenode metadata

[hadoop@bi12 ~]$ hdfs namenode -bootstrapStandby

    6. On the standby namenode (bi12), start the standby namenode

[hadoop@bi12 ~]$ hadoop-daemon.sh start namenode

    Check the jps output on all three hosts

[hadoop@bi10 ~]$ jps
1914 JournalNode
2294 Jps
2109 NameNode

[hadoop@bi12 ~]$ jps
12063 NameNode
12141 Jps
11843 JournalNode

[hadoop@bi13 ~]$ jps
22197 JournalNode
22323 Jps

    Check the state in zookeeper (the ns1 entry is presumably left over from an earlier cluster; masters is the nameservice we just initialized)

[zk: localhost:2181(CONNECTED) 13] ls /hadoop-ha
[ns1, masters]

    7. Start automatic namenode failover (zkfc) on bi10 and bi12

[hadoop@bi10 ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-zkfc-bi10.out
[hadoop@bi12 ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-zkfc-bi12.out
    Start the three datanodes

    1. Execute on the primary namenode (bi10)

[hadoop@bi10 ~]$ hadoop-daemons.sh start datanode
bi10: starting datanode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-datanode-bi10.out
bi12: starting datanode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-datanode-bi12.out
bi13: starting datanode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-datanode-bi13.out
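
    All three datanodes should now be registered with both namenodes; hdfs dfsadmin -report should list three live datanodes:

[hadoop@bi10 ~]$ hdfs dfsadmin -report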
    Start yarn

    1. Start yarn on the primary resourcemanager (bi12)

[hadoop@bi12 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/work/hadoop-2.6.2/logs/yarn-hadoop-resourcemanager-bi12.out
bi10: starting nodemanager, logging to /home/hadoop/work/hadoop-2.6.2/logs/yarn-hadoop-nodemanager-bi10.out
bi12: starting nodemanager, logging to /home/hadoop/work/hadoop-2.6.2/logs/yarn-hadoop-nodemanager-bi12.out
bi13: starting nodemanager, logging to /home/hadoop/work/hadoop-2.6.2/logs/yarn-hadoop-nodemanager-bi13.out

    2. On the standby resourcemanager (bi13), start the resourcemanager by hand (start-yarn.sh only starts a resourcemanager on the host where it is run, so the standby must be started separately)

yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/hadoop/work/hadoop-2.6.2/logs/yarn-hadoop-resourcemanager-bi13.out

    Check the processes on all three hosts

[hadoop@bi10 ~]$ jps
2659 NodeManager
1914 JournalNode
2784 Jps
2347 DFSZKFailoverController
2515 DataNode
2109 NameNode
[hadoop@bi12 ~]$ jps
12063 NameNode
12403 DataNode
11843 JournalNode
12569 ResourceManager
12270 DFSZKFailoverController
12678 NodeManager
13031 Jps
[hadoop@bi13 ~]$ jps
22729 Jps
22383 DataNode
22197 JournalNode
22553 NodeManager
22691 ResourceManager

    Cluster tests

    wordcount test

    1. Upload a test file

[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -mkdir /user
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -mkdir /user/hadoop
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -mkdir /user/hadoop/wordcount
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -mkdir /user/hadoop/wordcount/input
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -put ./LICENSE.txt /user/hadoop/wordcount/input
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -ls /user/hadoop/wordcount/input
Found 1 items
-rw-r--r--   2 hadoop supergroup      15429 2016-02-16 15:38 /user/hadoop/wordcount/input/LICENSE.txt

    2. Run the wordcount test

hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /user/hadoop/wordcount/input /user/hadoop/wordcount/output

    3. Check the results

[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -ls /user/hadoop/wordcount/output
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2016-02-16 15:45 /user/hadoop/wordcount/output/_SUCCESS
-rw-r--r--   2 hadoop supergroup       8006 2016-02-16 15:45 /user/hadoop/wordcount/output/part-r-00000
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -cat /user/hadoop/wordcount/output/part-r-00000
    namenode failover test

    1. Check the current namenode states

[hadoop@bi10 hadoop-2.6.2]$ hdfs haadmin -getServiceState nn1
active
[hadoop@bi10 hadoop-2.6.2]$ hdfs haadmin -getServiceState nn2
standby

    2. Simulate a crash of the active namenode, then check the states again and re-test

[hadoop@bi10 hadoop-2.6.2]$ hadoop-daemon.sh stop namenode
stopping namenode
[hadoop@bi10 hadoop-2.6.2]$ jps
2659 NodeManager
3691 Jps
1914 JournalNode
2347 DFSZKFailoverController
2515 DataNode
[hadoop@bi10 hadoop-2.6.2]$ hdfs haadmin -getServiceState nn1
16/02/16 15:53:48 INFO ipc.Client: Retrying connect to server: bi10/192.168.2.10:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From bi10/192.168.2.10 to bi10:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
[hadoop@bi10 hadoop-2.6.2]$ hdfs haadmin -getServiceState nn2
active
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -ls 
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2016-02-16 15:45 wordcount

    3. Restart the namenode on bi10 and check the states again (zkfc does not fail back automatically, so the restarted namenode rejoins as standby)

[hadoop@bi10 hadoop-2.6.2]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/hadoop/work/hadoop-2.6.2/logs/hadoop-hadoop-namenode-bi10.out
[hadoop@bi10 hadoop-2.6.2]$ hdfs haadmin -getServiceState nn1
standby
[hadoop@bi10 hadoop-2.6.2]$ hdfs haadmin -getServiceState nn2
active

    resourcemanager failover test

    1. Simulate a crash of the active resourcemanager, then test

[hadoop@bi12 ~]$ yarn rmadmin -getServiceState rm1
active
[hadoop@bi12 ~]$ yarn rmadmin -getServiceState rm2
standby
[hadoop@bi12 ~]$ yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[hadoop@bi12 ~]$ yarn rmadmin -getServiceState rm2
active
[hadoop@bi12 ~]$ yarn rmadmin -getServiceState rm1
16/02/16 16:11:36 INFO ipc.Client: Retrying connect to server: bi12/192.168.2.12:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From bi12/192.168.2.12 to bi12:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

    2. Restart the resourcemanager on bi12 and check the states

[hadoop@bi12 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/hadoop/work/hadoop-2.6.2/logs/yarn-hadoop-resourcemanager-bi12.out
[hadoop@bi12 ~]$ yarn rmadmin -getServiceState rm1
standby
[hadoop@bi12 ~]$ yarn rmadmin -getServiceState rm2
active

    3. Re-run wordcount

[hadoop@bi10 hadoop-2.6.2]$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /user/hadoop/wordcount/input /user/hadoop/wordcount/output1
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -ls wordcount/output1
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2016-02-16 16:14 wordcount/output1/_SUCCESS
-rw-r--r--   2 hadoop supergroup       8006 2016-02-16 16:14 wordcount/output1/part-r-00000

    Shutting down the cluster

yarn-daemons.sh stop nodemanager
yarn-daemons.sh stop resourcemanager
hadoop-daemons.sh stop datanode
hadoop-daemons.sh stop zkfc
hadoop-daemons.sh stop namenode
hadoop-daemons.sh stop journalnode
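
    Alternatively, the bundled wrapper scripts can do most of this in two commands: stop-yarn.sh should stop the resourcemanager and all nodemanagers, and stop-dfs.sh should stop the namenodes, datanodes, journalnodes, and zkfc processes (the standby resourcemanager on bi13 still has to be stopped by hand):

stop-yarn.sh
stop-dfs.sh
yarn-daemon.sh stop resourcemanager    # run on bi13 for the standby RM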

