2-0 Three-PC Cluster Setup (HA + Backup Master)

Notes:

This document implements HDFS HA. The ZooKeeper configuration is unchanged, the HBase configuration is modified slightly, and an HBase backup master is enabled.

The changes are mainly to the Hadoop configuration. For everything that stays the same,

see: 1-0 Three-PC Cluster Setup (no HA).

 

1. Hadoop changes

Set up HDFS HA.

Deployment directory: /opt/hadoop-2.2.0

(1) Create directories (changed)

NameNode nodes (master1, slave1):

mkdir -p /data/hdfs/nn

JournalNode nodes (master1, slave1, slave2):

mkdir -p /data/journal

All Hadoop nodes (master1, slave1, slave2):

mkdir -p /data/tmp_hadoop /data/hdfs/dn /data/log/hadoop-hdfs /data/log/hadoop-yarn /data/log/hadoop-mapred /data/yarn/local /data/yarn/logs

(2) Environment variables

vi /etc/profile.d/hadoop.sh

Add:

#set hadoop environment

export HADOOP_HOME=/opt/hadoop-2.2.0 

export PATH=$PATH:$HADOOP_HOME/bin

export PATH=$PATH:$HADOOP_HOME/sbin
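
A quick sanity check after editing the file (run on any node; nothing here is HA-specific):

source /etc/profile.d/hadoop.sh
hadoop version    # should report Hadoop 2.2.0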

(3) Configuration file changes

1) masters (changed)

# Normally lists the SecondaryNameNode. With the HA scheme used here, this file is left empty.

2) slaves

# Lists the DataNodes of the cluster

master1

slave1

slave2

3) hadoop-env.sh

export JAVA_HOME=/opt/jdk1.7.0_45

export HADOOP_LOG_DIR=/data/log/hadoop-hdfs

export YARN_LOG_DIR=/data/log/hadoop-yarn

export HADOOP_MAPRED_LOG_DIR=/data/log/hadoop-mapred

4) yarn-env.sh

export JAVA_HOME=/opt/jdk1.7.0_45

5) core-site.xml (changed)

<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/tmp_hadoop</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>1440</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>dfs.blocksize</name>
  <value>67108864</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>master1:2181,slave1:2181,slave2:2181</value>
</property>

6) hdfs-site.xml (changed)

                                           

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/hdfs/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/hdfs/dn</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/journal</value>
</property>
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hadoop</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>master1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>slave1:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>master1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>slave1:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://master1:8485;slave1:8485;slave2:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

 

7) yarn-site.xml

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/yarn/logs</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
  <value>logs</value>
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>slave2:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>slave2:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>slave2:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>slave2:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>slave2:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
<property>
  <name>yarn.nodemanager.log.retain-seconds</name>
  <value>10800</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>-1</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <value>-1</value>
</property>

8) mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024M</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2560M</value>
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.task.io.sort.factor</name>
  <value>100</value>
</property>
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>50</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>slave1:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>slave1:19888</value>
</property>
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/mr-history/tmp</value>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/mr-history/done</value>
</property>
<property>
  <name>mapreduce.shuffle.port</name>
  <value>13562</value>
</property>

(4) Distribute the software package to all Hadoop nodes, for example as sketched below.
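
A minimal distribution sketch run from master1 (assumes passwordless ssh for the hadoop user, root access for /etc, and a writable /opt on the target hosts; adjust hosts and paths to your environment):

for host in slave1 slave2; do
  scp -r /opt/hadoop-2.2.0 hadoop@${host}:/opt/                # binaries plus etc/hadoop configs
  scp /etc/profile.d/hadoop.sh root@${host}:/etc/profile.d/    # environment variables
done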

(5) Test:

Step 1: Initialization ("all" below means master1, slave1, slave2)

1) First start ZooKeeper on all three machines:

(all) zkServer.sh start

2) Initialize ZooKeeper (format the znode); this can be run on either nn1 or nn2:

(nn1) bin/hdfs zkfc -formatZK

3) Start the JournalNodes on all three machines:

(all) hadoop-daemon.sh start journalnode

4) On one of the NameNodes (nn1 here), format the NameNode:

(nn1) bin/hadoop namenode -format

Then start the NameNode on master1 (nn1):

(nn1) sbin/hadoop-daemon.sh start namenode

5) Copy the contents of that NameNode's dfs.namenode.name.dir directory to the same directory on the other NameNode (scp works), or let nn2 pull the latest FSImage from nn1 directly.
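
A manual-copy sketch run on nn1 (assumes the hadoop user can ssh to slave1; the path is taken from hdfs-site.xml above):

scp -r /data/hdfs/nn/* hadoop@slave1:/data/hdfs/nn/

To pull the image instead, run bootstrapStandby on nn2: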

(nn2) bin/hdfs namenode -bootstrapStandby [-force | -nonInteractive]

Start the NameNode on nn2:

(nn2) sbin/hadoop-daemon.sh start namenode

6) At this point nn1 and nn2 are both in standby state; you can verify this at master1:50070 and slave1:50070.

Start the ZKFC service:

(nn1) sbin/hadoop-daemon.sh start zkfc

(nn2) sbin/hadoop-daemon.sh start zkfc

7) Manual failover (run on either nn1 or nn2):

bin/hdfs haadmin -failover nn1 nn2

(after this, nn1 is standby and nn2 is active)
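
The resulting roles can be confirmed with the standard haadmin state query:

bin/hdfs haadmin -getServiceState nn1    # expected: standby
bin/hdfs haadmin -getServiceState nn2    # expected: active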

8) Kill the NameNode process on the active node, e.g. as sketched below.
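
One way to do the kill (a sketch; assumes standard jps output on the active node):

jps | grep ' NameNode'                                    # find the NameNode pid
kill -9 $(jps | grep ' NameNode' | awk '{print $1}')      # hard-kill it; ZKFC should promote the standby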

Then refresh the web page of the formerly standby NameNode: it should switch from standby to active.

Step 2: Normal HA startup

(all) zkServer.sh start

(nn1 or nn2): sbin/start-dfs.sh

Start YARN (on the machine running the ResourceManager, i.e. slave2 per the configuration above):

start-yarn.sh

Visit slave2:8088 to view the ResourceManager.

Start the JobHistoryServer (on the machine running the JHS, i.e. slave1 per the configuration above):

mr-jobhistory-daemon.sh start historyserver

Visit slave1:19888 to view the JobHistoryServer.

2. HBase cluster changes

a) Prerequisite: the Hadoop and ZooKeeper clusters are already set up.

Download hbase-0.96.0-hadoop2-bin.tar.gz

Deployment directory: /opt/hbase-0.96.0-hadoop2

b) Create directories on the HBase nodes:

mkdir -p /data/hbase/logs

mkdir -p /data/hbase/tmp_hbase

c) Environment variables:

vi /etc/profile.d/java.sh  (edit the file)

#set HBase environment

export HBASE_HOME=/opt/hbase-0.96.0-hadoop2

export PATH=$PATH:$HBASE_HOME/bin

export HBASE_HEAPSIZE=4096

d) Increase the maximum file handle limit

HBase is a database and keeps many file handles open at the same time. The default limit of 1024 on most Linux systems is not enough.

(Ubuntu as an example)

gedit /etc/security/limits.conf

hadoop  -     nofile  32768
hadoop  soft  nproc   32000
hadoop  hard  nproc   32000

gedit /etc/pam.d/common-session

session required pam_limits.so
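
After logging back in as the hadoop user, the new limits can be verified:

ulimit -n    # should print 32768
ulimit -u    # should reflect the nproc setting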

e) Replace the Hadoop jars

In distributed mode, the Hadoop version bundled with HBase must match the Hadoop version you actually run. Replace the Hadoop jar files under HBase's lib directory with the jars from your running Hadoop distribution to avoid version-mismatch problems, and make sure every HBase node in the cluster gets the replacement. Hadoop version mismatches show up in different ways, but they usually look like the cluster has simply crashed. (For hbase-0.96.0 + hadoop-2.2.0, 17 jar files are replaced.)
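A sketch of the jar swap under the deployment paths used in this document (the exact set of jars depends on your builds, so verify the resulting lib/ directory before starting HBase):

cd /opt/hbase-0.96.0-hadoop2/lib
rm -f hadoop-*.jar                                      # remove the bundled Hadoop jars
find /opt/hadoop-2.2.0/share/hadoop -name 'hadoop-*2.2.0.jar' \
     ! -name '*test*' ! -name '*sources*' -exec cp {} . \;   # copy in the running cluster's jars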

1) Enable a backup HMaster

Create a file named backup-masters under conf and write the hostname of the backup master into it. Here master1 is the primary master and slave1 is the backup:

slave1

2) Edit the conf/regionservers file and add the RegionServers:

master1

slave1

slave2

3) Edit conf/hbase-env.sh

export  JAVA_HOME=/opt/jdk1.7.0_45

export HBASE_CLASSPATH=/opt/hadoop-2.2.0/etc/hadoop

export HBASE_MANAGES_ZK=false 

export HBASE_LOG_DIR=/data/hbase/logs

4) Edit conf/hbase-site.xml

<property>
  <name>hbase.master</name>
  <value>master1:60000</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://mycluster/hbase</value>
  <description>The directory shared by region servers.</description>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master1,slave1,slave2</value>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>60000</value>
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/data/hbase/tmp_hbase</value>
</property>
<property>
  <name>hbase.regionserver.restart.on.zk.expire</name>
  <value>true</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>10</value>
  <description>Number of threads handling user requests (default 10).</description>
</property>

 

5) Distribute the HBase package to all HBase nodes, for example as sketched below.
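
A distribution sketch analogous to the Hadoop one above (assumes passwordless ssh and a writable /opt on the target hosts):

for host in slave1 slave2; do
  scp -r /opt/hbase-0.96.0-hadoop2 hadoop@${host}:/opt/
done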

6) Start HBase

On master1:

bin/start-hbase.sh

Visit http://master1:60010 (active master) and slave1:60010 (backup master).

Test: bin/hbase shell
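
A quick smoke test inside the shell (the table name 'test' is arbitrary; these are standard HBase shell commands):

status                                   # cluster status, should list 3 region servers
create 'test', 'cf'                      # create a table with one column family
put 'test', 'row1', 'cf:a', 'value1'     # insert a cell
scan 'test'                              # read it back
disable 'test'
drop 'test'                              # clean up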

 
