===> Environment architecture and deployment plan:

bigdata1 NameNode ResourceManager Zookeeper JournalNode FailoverController

bigdata2 NameNode ResourceManager Zookeeper JournalNode FailoverController

bigdata3 DataNode NodeManager Zookeeper

bigdata4 DataNode NodeManager



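===> Prepare the environment:

(*) Remove any previous configuration

(*) Install the JDK, edit /etc/hosts, disable the firewall, and set up passwordless SSH login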
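
The passwordless login step can be sketched as follows. This is a minimal sketch, assuming the root account (the key path /root/.ssh/id_rsa appears later in hdfs-site.xml) and the standard OpenSSH tools ssh-keygen and ssh-copy-id. The function name print_ssh_setup is hypothetical. It only prints the commands so they can be reviewed before being run by hand.

```shell
# Hosts from the deployment plan above.
HOSTS="bigdata1 bigdata2 bigdata3 bigdata4"

# Print (not run) the commands that set up passwordless SSH from the
# current host to every cluster host. The root user is an assumption.
print_ssh_setup() {
    local host
    # Generate a key pair without a passphrase if none exists yet.
    echo "[ -f /root/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa"
    for host in $HOSTS; do
        # Append the public key to the remote authorized_keys file.
        echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@${host}"
    done
}
```

Running print_ssh_setup on bigdata1 and again on bigdata2 covers both NameNode hosts, which is what the sshfence method configured below relies on.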

###############################################################################

hdfs-site.xml

 

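<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
</property>

<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>bigdata1:8020</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>bigdata2:8020</value>
</property>

<property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>bigdata1:50070</value>
</property>

<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>bigdata2:50070</value>
</property>

<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://bigdata1:8485;bigdata2:8485/mycluster</value>
</property>

<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
    <name>dfs.ha.fencing.methods</name>
    <value>
        sshfence
        shell(/bin/true)
    </value>
</property>

<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>

<property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
</property>

<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/journal</value>
</property>

<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>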
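
The value of dfs.namenode.shared.edits.dir is the list of JournalNode addresses joined with semicolons, followed by the nameservice ID. Each edit must be written to a majority of the JournalNodes, so N nodes tolerate (N - 1) / 2 failures, and the Hadoop documentation recommends an odd number of at least three. The sketch below illustrates both points. The helper names qjournal_uri and jn_tolerated_failures are hypothetical, not part of Hadoop.

```shell
# Build a qjournal:// URI from a nameservice ID and JournalNode hosts.
# 8485 is the JournalNode RPC port used in the configuration above.
qjournal_uri() {
    local ns="$1" uri="" host
    shift
    for host in "$@"; do
        # Join host:port pairs with ';' (no leading separator).
        uri="${uri:+${uri};}${host}:8485"
    done
    echo "qjournal://${uri}/${ns}"
}

# Number of JournalNode failures that a group of N nodes can survive
# while still reaching a majority.
jn_tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}
```

With the two JournalNodes planned here, jn_tolerated_failures 2 prints 0, so losing either JournalNode would block edit-log writes. Adding a third JournalNode (for example on bigdata3) would be one way to tolerate a single failure.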

 

  

###############################################################################

core-site.xml

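<property>
    <name>hadoop.tmp.dir</name>
    <value>/data/app/hadoop-2.7.1/tmp/</value>
</property>

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
</property>

<property>
    <name>ha.zookeeper.quorum</name>
    <value>bigdata1,bigdata2,bigdata3</value>
</property>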


###############################################################################

mapred-site.xml

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

###############################################################################

yarn-site.xml

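<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>

<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
</property>

<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>

<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>bigdata1</value>
</property>

<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>bigdata2</value>
</property>

<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>bigdata1:2181,bigdata2:2181,bigdata3:2181</value>
</property>

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>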


###############################################################################

slaves

bigdata3

bigdata4

###############################################################################

===> Copy the configured installation directory to the other hosts

scp  -r hadoop-2.7.1  bigdata2:/data/app

scp  -r hadoop-2.7.1  bigdata3:/data/app

scp  -r hadoop-2.7.1  bigdata4:/data/app


===> Start the JournalNode (on bigdata1 and bigdata2, the JournalNode hosts in the plan above):

hadoop-daemon.sh start journalnode


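===> Format the NameNode

Note: the directory specified by hadoop.tmp.dir in core-site.xml must exist before formatting, otherwise the format step fails.

In this configuration the directory is /data/app/hadoop-2.7.1/tmp/, so create it first:

mkdir -p /data/app/hadoop-2.7.1/tmp/


Then format the NameNode (on bigdata1):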

hdfs namenode -format

===> Copy the dfs directory under tmp to the same location on bigdata2

scp -r /data/app/hadoop-2.7.1/tmp/dfs  bigdata2:/data/app/hadoop-2.7.1/tmp

===> Format the HA state in ZooKeeper (on bigdata1):

ZooKeeper must be running for this step to succeed. Otherwise you will see: WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

java.net.ConnectException: Connection refused

zkServer.sh  start  (run on bigdata1, bigdata2 and bigdata3, i.e. the hosts in the ZooKeeper ensemble)

hdfs zkfc -formatZK
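
To avoid the connection error above, the format step can be guarded by a check that ZooKeeper is actually serving. The sketch below parses the "Mode:" line that zkServer.sh status prints for a running server (leader, follower or standalone). The function name zk_mode is hypothetical.

```shell
# Read `zkServer.sh status` output on stdin and print the reported mode.
# Exits non-zero if no "Mode:" line is present, which is the case when
# the server is not running.
zk_mode() {
    awk -F': ' '/^Mode:/ { print $2; found = 1 } END { exit (found ? 0 : 1) }'
}

# Example (needs a ZooKeeper installation on this host):
#   zkServer.sh status 2>&1 | zk_mode && hdfs zkfc -formatZK
```

The check only covers the local server. It does not prove that the whole ensemble has a quorum, so it is a convenience rather than a guarantee.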

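===> The deployment is now complete. Start the whole cluster:

1. Start ZooKeeper (bigdata1, bigdata2, bigdata3):

(If ZooKeeper is not started first, both NameNodes stay in standby state.)

zkServer.sh  start

2. Start the HDFS and YARN daemons:

start-all.sh (run on bigdata1)

yarn-daemon.sh  start  resourcemanager   (run on bigdata2; start-all.sh starts a ResourceManager only on the host where it is invoked)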
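
Once the daemons are up, the HA state can be checked from the command line with hdfs haadmin -getServiceState, which prints "active" or "standby" for a given NameNode ID. A minimal sketch, assuming the IDs nn1 and nn2 from hdfs-site.xml; the function names are hypothetical.

```shell
# Read "id state" lines on stdin and succeed only if exactly one
# of them reports the state "active".
exactly_one_active() {
    awk '$2 == "active" { n++ } END { exit (n == 1 ? 0 : 1) }'
}

# Print "id state" for each NameNode ID. Requires a running cluster.
namenode_states() {
    local id
    for id in nn1 nn2; do
        echo "${id} $(hdfs haadmin -getServiceState "${id}")"
    done
}

# Example:
#   namenode_states | exactly_one_active && echo "one active NameNode"
```

The same pattern should work for YARN by replacing the query with yarn rmadmin -getServiceState rm1 (and rm2).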

===> jps output on each host:

##############################################################

[root@bigdata1 app]# jps

22224 JournalNode

22400 DFSZKFailoverController

22786 Jps

22019 NameNode

21405 QuorumPeerMain

22493 ResourceManager

##############################################################

[root@bigdata2 app]# jps

9408 QuorumPeerMain

9747 DFSZKFailoverController

9637 JournalNode

9929 Jps

9850 ResourceManager

9565 NameNode

##############################################################

[root@bigdata3 app]# jps

7664 DataNode

7531 QuorumPeerMain

7900 Jps

7775 NodeManager

##############################################################

[root@bigdata4 ~]# jps

7698 NodeManager

7587 DataNode

7823 Jps

##############################################################
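
The jps listings above can be compared against the deployment plan automatically. The sketch below is one way to do it: it maps each host to the daemon names jps is expected to show (ZooKeeper appears as QuorumPeerMain and the failover controller as DFSZKFailoverController, as in the listings) and reports anything missing. Both function names are hypothetical.

```shell
# Expected daemons per host, taken from the deployment plan at the top.
expected_daemons() {
    case "$1" in
        bigdata1|bigdata2)
            echo "NameNode ResourceManager QuorumPeerMain JournalNode DFSZKFailoverController" ;;
        bigdata3) echo "DataNode NodeManager QuorumPeerMain" ;;
        bigdata4) echo "DataNode NodeManager" ;;
        *) return 1 ;;
    esac
}

# Read jps output on stdin and print each expected daemon that is missing.
# Returns 0 if nothing is missing, 1 if something is, 2 for an unknown host.
missing_daemons() {
    local host="$1" expected running daemon missing=0
    expected="$(expected_daemons "${host}")" || return 2
    running="$(awk '{ print $2 }')"   # second column of jps is the class name
    for daemon in ${expected}; do
        if ! printf '%s\n' "${running}" | grep -qx "${daemon}"; then
            echo "${daemon}"
            missing=1
        fi
    done
    return "${missing}"
}

# Example (assumes jps is on the PATH of a non-interactive SSH session):
#   ssh bigdata3 jps | missing_daemons bigdata3
```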

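Test: open the NameNode web UI on port 50070 on bigdata1 and bigdata2. Each page shows that NameNode's state (active / standby).

To test failover, kill the NameNode process on the active host, then check the state of the other NameNode. It should switch to active.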
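
The failover test can also be run from a shell. A minimal sketch, assuming it is run on the host whose NameNode is currently active; namenode_pid is a hypothetical helper that extracts the PID from jps output.

```shell
# Read jps output on stdin and print the PID of the NameNode process.
namenode_pid() {
    awk '$2 == "NameNode" { print $1 }'
}

# Example, on the active host (destructive: kills the NameNode):
#   kill -9 "$(jps | namenode_pid)"
# Then, from any host, query the other NameNode (nn2 if nn1 was active):
#   hdfs haadmin -getServiceState nn2
```

The killed NameNode can be brought back with hadoop-daemon.sh start namenode, after which it should rejoin as standby.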