Hadoop 2.6.0 HA (High Availability) Cluster Setup

      Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had exactly one NameNode, and if that machine or process became unavailable, the whole cluster was unavailable until the NameNode was restarted or brought up on another machine.

        There are two main ways the cluster can become unavailable:

  • In the case of an unplanned event such as a machine crash, the cluster is unavailable until the NameNode is restarted.

  • Planned maintenance such as software or hardware upgrades on the NameNode machine results in cluster downtime.

     The HDFS High Availability feature addresses these problems by providing the option of running two redundant NameNodes in the same cluster in an active/passive configuration with a hot standby. This allows a fast failover to the new NameNode when a machine crashes, or a graceful, administrator-initiated failover for planned maintenance.

      In an HA setup, high availability is achieved with an active/standby pair of NameNodes. The standby keeps its state in sync with the active NameNode through the JournalNodes (JNs); when the active NameNode goes down, the standby takes over as active.

     To deploy an HA cluster, you should prepare the following:

     NameNode machines - the machines running the active and standby NameNodes should have hardware equivalent to each other, and equivalent to what would be used for the NameNode in a non-HA cluster.

     JournalNode machines - the machines running your JournalNodes. The JournalNode daemon is relatively lightweight, so it can reasonably be collocated with other Hadoop daemons, for example a NameNode, the JobTracker, or the ResourceManager. Note: there must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of the JNs; this allows the system to tolerate the failure of a single machine. You may also run more than 3 JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number (3, 5, 7, etc.). When running N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally (for example, 3 JNs tolerate 1 failure, 5 JNs tolerate 2).

        Note: do not start a Secondary NameNode in an HA cluster; doing so will cause an error.

 Some configuration notes:

hdfs-site.xml

dfs.nameservices - the logical name of the nameservice

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>

dfs.ha.namenodes.[nameservice ID] - a unique identifier for each NameNode in the nameservice

<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>

dfs.namenode.rpc-address.[nameservice ID].[name node ID] - the fully-qualified RPC address for each NameNode to listen on

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>machine1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>machine2.example.com:8020</value>
</property>

dfs.namenode.http-address.[nameservice ID].[name node ID] - the fully-qualified HTTP address for each NameNode to listen on

<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>machine1.example.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>machine2.example.com:50070</value>
</property>

dfs.namenode.shared.edits.dir - the URI of the group of JNs where the NameNodes write and read edits

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/mycluster</value>
</property>

dfs.client.failover.proxy.provider.[nameservice ID] - the Java class that HDFS clients use to locate and connect to the active NameNode

<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
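
With this in place, clients address the nameservice URI rather than a specific NameNode host; for example (using the mycluster id from these snippets; the reference configuration later in this post uses hacluster):

hdfs dfs -ls hdfs://mycluster/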

dfs.ha.fencing.methods / sshfence - SSH to the previously active NameNode and kill its process

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/exampleuser/.ssh/id_rsa</value>
</property>
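
sshfence will fail, and block the failover, if the old active machine cannot be reached over SSH at all. A common fallback, shown here as a sketch assuming you are willing to skip fencing whenever SSH cannot connect, is to append the shell fence method to the list:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
         shell(/bin/true)</value>
</property>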

The configuration of automatic failover requires the addition of two new parameters to your configuration. In your hdfs-site.xml file, add:

 <property>
   <name>dfs.ha.automatic-failover.enabled</name>
   <value>true</value>
 </property>

This specifies that the cluster should be set up for automatic failover. In your core-site.xml file, add:

 <property>
   <name>ha.zookeeper.quorum</name>
   <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
 </property>

This lists the host-port pairs running the ZooKeeper service.


core-site.xml

fs.defaultFS - the default filesystem URI (the HA nameservice)

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>

dfs.journalnode.edits.dir - the local path where the JournalNode daemon stores its edits and other state (the official docs place this property in hdfs-site.xml; it is listed here because the reference core-site.xml below also sets it)

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/path/to/journal/node/local/data</value>
</property>
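
The JournalNode needs this local directory to be writable by the user running the daemon; as a small sketch (the path below is the placeholder value from above), it does no harm to create it up front:

# run on each JournalNode host
mkdir -p /path/to/journal/node/local/data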


OS:        CentOS 7

JDK:       1.7

Hadoop:    2.6.0

ZooKeeper: 3.4.6

Set up passwordless SSH between the nodes and start ZooKeeper before continuing.
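
A minimal sketch of that preparation, assuming a hadoop user exists on every node and ZooKeeper's zkServer.sh is on the PATH:

# on each NameNode host (zoo1 and zoo2): create a key and push it to every node
ssh-keygen -t rsa
ssh-copy-id hadoop@zoo1
ssh-copy-id hadoop@zoo2
ssh-copy-id hadoop@zoo3
ssh-copy-id hadoop@zoo4

# on the ZooKeeper nodes (zoo1, zoo2, zoo3): start the quorum
zkServer.sh start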


Daemon layout per node:

Node   NameNode  DataNode  JournalNode  DFSZKFailoverController  ResourceManager  NodeManager  QuorumPeerMain
zoo1   Y         -         -            Y                        Y                -            Y
zoo2   Y         Y         Y            Y                        Y                Y            Y
zoo3   -         Y         Y            -                        -                Y            Y
zoo4   -         Y         Y            -                        -                Y            -
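
Once everything is started (see the startup order below), running jps on each node is a quick way to confirm that the daemons match this table:

# e.g. on zoo2, jps should list NameNode, DataNode, JournalNode,
# DFSZKFailoverController, ResourceManager, NodeManager and QuorumPeerMain
jps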

The reference configuration files are as follows:

core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hacluster</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.native.lib</name>
        <value>true</value>
        <description>Should native hadoop libraries, if present, be used.</description>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>zoo1:2181,zoo2:2181,zoo3:2181</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/app/hadoop-2.6.0/jnd</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
 <property>  
        <name>dfs.namenode.name.dir</name>  
        <value>/app/dfs/name</value>  
    </property>  
    <property>  
        <name>dfs.datanode.data.dir</name>  
        <value>/app/dfs/data</value>  
    </property>  
    <property>  
        <name>dfs.replication</name>  
        <value>2</value>  
    </property>  
    <property>  
        <name>dfs.webhdfs.enabled</name>  
        <value>true</value>  
    </property>
     <property> 
               <name>dfs.datanode.max.xcievers</name> 
               <value>4096</value> 
     </property>      
     <property>
               <name>dfs.nameservices</name>
               <value>hacluster</value>
     </property>
     <property>
               <name>dfs.ha.namenodes.hacluster</name>
               <value>nn1,nn2</value>
     </property>
     <property>
               <name>dfs.namenode.rpc-address.hacluster.nn1</name>
               <value>zoo1:8020</value>
     </property>
     <property>
                <name>dfs.namenode.rpc-address.hacluster.nn2</name>
                <value>zoo2:8020</value>
     </property>
      <property>
                <name>dfs.namenode.http-address.hacluster.nn1</name>
                <value>zoo1:50070</value>
      </property>
      <property>
                <name>dfs.namenode.http-address.hacluster.nn2</name>
               <value>zoo2:50070</value>
      </property>
      <property>
                <name>dfs.namenode.shared.edits.dir</name>
                <value>qjournal://zoo2:8485;zoo3:8485;zoo4:8485/hacluster</value>
      </property>
      <property>
                 <name>dfs.client.failover.proxy.provider.hacluster</name>
                 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
                 <name>dfs.ha.fencing.methods</name>
                 <value>sshfence</value>
      </property>
      <property>
                 <name>dfs.ha.fencing.ssh.private-key-files</name>
                 <value>/home/hadoop/.ssh/id_rsa</value>
      </property>
      <property>
                  <name>dfs.ha.automatic-failover.enabled</name>
                  <value>true</value>
      </property>
</configuration>

yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
        <property>
               <name>yarn.resourcemanager.ha.enabled</name>
                <value>true</value>
        </property>
        <property>
                <name>yarn.resourcemanager.cluster-id</name>
                <value>yarn-ha</value>
        </property>
        <property>
                <name>yarn.resourcemanager.ha.rm-ids</name>
                <value>rm1,rm2</value>
        </property>
        <!-- RM-->
        <property>
                <name>yarn.resourcemanager.hostname.rm1</name>
                <value>zoo1</value>
        </property>
        <property>
                <name>yarn.resourcemanager.hostname.rm2</name>
                <value>zoo2</value>
        </property>
        <property>
                <name>yarn.resourcemanager.recovery.enabled</name>
                <value>true</value>
        </property>
        <property>
                 <name>yarn.resourcemanager.zk-address</name>
                 <value>zoo1:2181,zoo2:2181,zoo3:2181</value>
        </property>
         
        <property>
                <name>yarn.resourcemanager.store.class</name>
                <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

mapred-site.xml

<configuration>
   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
</configuration>

 slaves 

zoo2
zoo3
zoo4


Startup order (follow strictly):

1. Start the JournalNodes (run the command below on each JournalNode host: zoo2, zoo3, zoo4)

hadoop-daemon.sh start journalnode

2. Format HDFS on the first NameNode; run on zoo1

hdfs namenode -format

3. Start the NameNode that was just formatted; run on zoo1

hadoop-daemon.sh start namenode

4. On the other NameNode, zoo2, run

hdfs namenode -bootstrapStandby

5. Stop all HDFS services

stop-dfs.sh

6. Initialize the HA state znode in ZooKeeper; run on one of the NameNodes

hdfs zkfc -formatZK
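
As an optional sanity check (a sketch, assuming the ZooKeeper client script is on the PATH), the znode created by -formatZK can be inspected:

# a /hadoop-ha/hacluster znode should now exist
zkCli.sh -server zoo1:2181 ls /hadoop-ha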

7. Start all HDFS services

 start-dfs.sh

8. Start YARN; run on zoo1 (the rm1 host)

start-yarn.sh
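
Note that start-yarn.sh starts a ResourceManager only on the host where it is executed; the second ResourceManager on zoo2 normally has to be started by hand (a sketch):

# run on zoo2
yarn-daemon.sh start resourcemanager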

At this point all of the daemons on every node should be running.
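
A quick end-to-end check of YARN and MapReduce is to run one of the bundled example jobs (a sketch, assuming HADOOP_HOME points at the Hadoop 2.6.0 installation directory):

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 10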

First, open the HDFS web UI on zoo1:

http://zoo1:50070

You should see that zoo1 is active.

Then check the page on zoo2 (http://zoo2:50070); it should show standby.

Now test the active/standby failover.

Kill the NameNode process on zoo1, and you will see zoo2 become active.
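
The same check can be made from the command line, using the NameNode and ResourceManager ids from the configuration above:

# which NameNode is active?
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# which ResourceManager is active?
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2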

The environment has been set up successfully.









