Deployment Notes:

Hadoop HA and a Hadoop cluster are not the same thing. A Hadoop cluster is made up of an HDFS cluster and a YARN (MapReduce) cluster, i.e. distributed storage (HDFS) plus a distributed computing framework, and its DataNode and NodeManager nodes can be scaled out essentially without limit. The cluster, however, depends on the NameNode (metadata) node and the YARN ResourceManager (resource scheduling) node, and by default each of these runs as a single instance. If the NameNode fails, the HDFS cluster becomes unavailable; if the ResourceManager fails, MapReduce jobs can no longer be submitted and distributed computation stops.

For these reasons, the cluster can be made highly available by standing up standby NameNode and ResourceManager nodes. If the active NameNode fails, the standby NameNode takes over quickly and seamlessly; likewise, if the active ResourceManager fails, the standby ResourceManager quickly takes over resource scheduling for the compute services. That is the Hadoop HA setup this walkthrough covers, as opposed to a plain Hadoop cluster.


I. Installation Plan

IP Address      Installed Software        Running Processes                  Notes

192.168.1.31    JDK, Hadoop               NameNode                           HDFS NameNode (Active)
192.168.1.32    JDK, Hadoop               NameNode                           HDFS NameNode (Standby)
192.168.1.41    JDK, Hadoop               ResourceManager (YARN)             YARN ResourceManager (Active)
192.168.1.42    JDK, Hadoop               ResourceManager (YARN)             YARN ResourceManager (Standby)
192.168.1.51    JDK, Hadoop               DataNode, NodeManager (YARN)       HDFS DataNode
192.168.1.52    JDK, Hadoop               DataNode, NodeManager (YARN)       HDFS DataNode
192.168.1.53    JDK, Hadoop               DataNode, NodeManager (YARN)       HDFS DataNode
192.168.1.61    JDK, Zookeeper, Hadoop    JournalNode, QuorumPeerMain        ZooKeeper quorum; JournalNode stores the NameNode edit log
192.168.1.62    JDK, Zookeeper, Hadoop    JournalNode, QuorumPeerMain        ZooKeeper quorum; JournalNode stores the NameNode edit log
192.168.1.63    JDK, Zookeeper, Hadoop    JournalNode, QuorumPeerMain        ZooKeeper quorum; JournalNode stores the NameNode edit log
192.168.1.71    JDK, Hadoop               DFSZKFailoverController (zkfc)     NameNode health-monitoring controller
192.168.1.72    JDK, Hadoop               DFSZKFailoverController (zkfc)     NameNode health-monitoring controller

II. Operating System Installation

OS version: CentOS 6.4 x86_64

Installation procedure: omitted

III. JDK Installation and Configuration

Installed version: jdk1.7.0_80

Installation procedure:

[root@hadoop-server01 bin]# mkdir -p /usr/local/apps
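(Before the listing below, the JDK tarball must be unpacked into /usr/local/apps; the exact archive name depends on your download, something like:)

[root@hadoop-server01 ~]# tar -xvf jdk-7u80-linux-x64.tar.gz -C /usr/local/apps/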

[root@hadoop-server01 bin]# ll /usr/local/apps/

total 4

drwxr-xr-x. 8 uucp 143 4096 Apr 10  2015 jdk1.7.0_80

[root@hadoop-server01 bin]# pwd

/usr/local/apps/jdk1.7.0_80/bin

 

[root@hadoop-server01 bin]# vi /etc/profile

export JAVA_HOME=/usr/local/apps/jdk1.7.0_80

export PATH=$PATH:$JAVA_HOME/bin

[root@hadoop-server01 bin]# source /etc/profile
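A quick check that the JDK is now active in the current shell:

[root@hadoop-server01 bin]# java -version

[root@hadoop-server01 bin]# echo $JAVA_HOME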

IV. Basic Environment Configuration

1. Add the following entries to /etc/hosts on all hosts

192.168.1.31    hadoop-namenode01

192.168.1.32    hadoop-namenode02

192.168.1.41    hadoop-resourcemanager01

192.168.1.42    hadoop-resourcemanager02

192.168.1.51    hadoop-datanode01

192.168.1.52    hadoop-datanode02

192.168.1.53    hadoop-datanode03

192.168.1.61    hadoop-zknode01

192.168.1.62    hadoop-zknode02

192.168.1.63    hadoop-zknode03

192.168.1.71    hadoop-zkfcnode01

192.168.1.72    hadoop-zkfcnode02

2. Disable the firewall

service iptables stop

chkconfig iptables off

V. Configure Passwordless SSH Login

1. All NameNode and YARN (ResourceManager) nodes must be able to log in to every server without a password

2. Configuration example

[root@hadoop-namenode01 apps]# ssh-keygen

[root@hadoop-namenode01 apps]# ssh-copy-id hadoop-namenode02

[root@hadoop-namenode01 apps]# ssh hadoop-namenode02
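A minimal sketch for meeting requirement 1 above, assuming ssh-keygen has already been run on the node in question: run this on each of the two NameNodes and the two ResourceManagers to push the key to every host in the plan (ssh-copy-id will prompt for each host's root password):

for h in hadoop-namenode01 hadoop-namenode02 \
         hadoop-resourcemanager01 hadoop-resourcemanager02 \
         hadoop-datanode01 hadoop-datanode02 hadoop-datanode03 \
         hadoop-zknode01 hadoop-zknode02 hadoop-zknode03 \
         hadoop-zkfcnode01 hadoop-zkfcnode02; do
    ssh-copy-id $h
done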

VI. Cluster Installation and Deployment

1. Deploy the ZooKeeper cluster

(1) Deployment nodes

192.168.1.61

192.168.1.62

192.168.1.63

(2) Extract the package and edit the configuration file

[root@hadoop-zknode01 ~]# tar -xvf zookeeper-3.4.5.tar.gz -C /usr/local/apps/

[root@hadoop-zknode01 conf]# cd /usr/local/apps/zookeeper-3.4.5/conf

[root@hadoop-zknode01 conf]# mv zoo_sample.cfg zoo.cfg

[root@hadoop-zknode01 conf]# vi zoo.cfg

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

# do not use /tmp for storage, /tmp here is just

# example sakes.

dataDir=/usr/local/apps/zookeeper-3.4.5/data

# the port at which the clients will connect

clientPort=2181

#

# Be sure to read the maintenance section of the

# administrator guide before turning on autopurge.

#

# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance

#

# The number of snapshots to retain in dataDir

#autopurge.snapRetainCount=3

# Purge task interval in hours

# Set to "0" to disable auto purge feature

#autopurge.purgeInterval=1
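# server.N=host:peerPort:leaderElectionPort -- the N in server.N
# must match the number written to the myid file on that host (step (3) below)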

server.1=hadoop-zknode01:2888:3888

server.2=hadoop-zknode02:2888:3888

server.3=hadoop-zknode03:2888:3888

(3) Create the myid file

[root@hadoop-zknode01 conf]# mkdir -p /usr/local/apps/zookeeper-3.4.5/data

[root@hadoop-zknode01 conf]# cd /usr/local/apps/zookeeper-3.4.5/data

[root@hadoop-zknode01 data]# echo 1 > myid

(4) Copy ZooKeeper to the other nodes and update myid

[root@hadoop-zknode01 apps]# scp -r zookeeper-3.4.5/ hadoop-zknode02:/usr/local/apps/zookeeper-3.4.5/

[root@hadoop-zknode02 data]# echo 2 > myid

[root@hadoop-zknode01 apps]# scp -r zookeeper-3.4.5/ hadoop-zknode03:/usr/local/apps/zookeeper-3.4.5/

[root@hadoop-zknode03 data]# echo 3 > myid

(5) Start ZooKeeper

[root@hadoop-zknode01 apps]# cd zookeeper-3.4.5/bin/

[root@hadoop-zknode01 bin]# ./zkServer.sh  start

JMX enabled by default

Using config: /usr/local/apps/zookeeper-3.4.5/bin/../conf/zoo.cfg

Starting zookeeper ... STARTED

Start ZooKeeper on the other two nodes in the same way.
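Once all three are running, the role of each node can be verified; one node should report "leader" and the other two "follower":

[root@hadoop-zknode01 bin]# ./zkServer.sh status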


2. Deploy the Hadoop cluster

(1) Installation nodes

192.168.1.31

192.168.1.32

192.168.1.41

192.168.1.42

192.168.1.51

192.168.1.52

192.168.1.53

(2) Extract the package

[root@hadoop-namenode01 hadoop]# tar -xvf hadoop-2.4.1.tar.gz -C /usr/local/apps/

(3) Edit hadoop-env.sh

cd /usr/local/apps/hadoop-2.4.1/etc/hadoop

# The java implementation to use.

export JAVA_HOME=/usr/local/apps/jdk1.7.0_80

# The jsvc implementation to use. Jsvc is required to run secure datanodes.

#export JSVC_HOME=${JSVC_HOME}

(4) Edit core-site.xml

cd /usr/local/apps/hadoop-2.4.1/etc/hadoop

<configuration>
  <!-- default file system: the HA nameservice defined in hdfs-site.xml -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1/</value>
  </property>
  <!-- base directory for Hadoop working data -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/apps/hadoop-2.4.1/tmp</value>
  </property>
  <!-- ZooKeeper quorum used for HA coordination -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop-zknode01:2181,hadoop-zknode02:2181,hadoop-zknode03:2181</value>
  </property>
</configuration>

(5) Edit hdfs-site.xml

<configuration>
  <!-- logical nameservice ID, referenced by fs.defaultFS as hdfs://ns1/ -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- the two NameNodes under ns1 -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>hadoop-namenode01:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>hadoop-namenode01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>hadoop-namenode02:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>hadoop-namenode02:50070</value>
  </property>
  <!-- JournalNode quorum that holds the shared edit log -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop-zknode01:8485;hadoop-zknode02:8485;hadoop-zknode03:8485/ns1</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/apps/hadoop-2.4.1/journaldata</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- fence the old active via ssh; fall back to /bin/true so failover still proceeds -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence
shell(/bin/true)</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>

(6) Edit mapred-site.xml

[root@hadoop-namenode01 hadoop]# mv mapred-site.xml.template mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

(7) Edit yarn-site.xml

<configuration>
  <!-- enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop-resourcemanager01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop-resourcemanager02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop-zknode01:2181,hadoop-zknode02:2181,hadoop-zknode03:2181</value>
  </property>
  <!-- note: the shuffle service is a NodeManager property -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

(8) Edit the slaves file [change it on both the NameNode and YARN nodes; the DataNode and NodeManager daemons run on the same servers]

[root@hadoop-namenode01 hadoop]# vi slaves

hadoop-datanode01

hadoop-datanode02

hadoop-datanode03

(9) Distribute Hadoop from namenode01 to all of the following servers

hadoop-namenode02

hadoop-resourcemanager01

hadoop-resourcemanager02

hadoop-datanode01

hadoop-datanode02

hadoop-datanode03

hadoop-zknode01

hadoop-zknode02

hadoop-zknode03

scp -r hadoop-2.4.1/ hadoop-resourcemanager01:/usr/local/apps/

.......

The other nodes are handled the same way; see the loop sketch below.
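The whole distribution can also be written as a loop on hadoop-namenode01, using the hostnames from the list above (a sketch; each copy takes a while):

[root@hadoop-namenode01 apps]# for h in hadoop-namenode02 hadoop-resourcemanager01 hadoop-resourcemanager02 hadoop-datanode01 hadoop-datanode02 hadoop-datanode03 hadoop-zknode01 hadoop-zknode02 hadoop-zknode03; do scp -r hadoop-2.4.1/ $h:/usr/local/apps/; done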

(10) Format HDFS

Step 1: Start ZooKeeper [hadoop-zknode01, hadoop-zknode02, hadoop-zknode03]

Make sure ZooKeeper is already running (covered in section VI.1 above); details omitted here.

Step 2: Start the JournalNodes [hadoop-zknode01, hadoop-zknode02, hadoop-zknode03]

[root@hadoop-zknode01 sbin]# cd /usr/local/apps/hadoop-2.4.1/sbin

[root@hadoop-zknode01 sbin]# ./hadoop-daemon.sh start journalnode

[root@hadoop-zknode02 sbin]# ./hadoop-daemon.sh start journalnode

[root@hadoop-zknode03 sbin]# ./hadoop-daemon.sh start journalnode

[root@hadoop-zknode03 sbin]# jps

4209 JournalNode

2997 QuorumPeerMain

Step 3: Format HDFS

-- Format on hadoop-namenode01

[root@hadoop-namenode01 bin]# cd /usr/local/apps/hadoop-2.4.1/bin

[root@hadoop-namenode01 bin]# ./hdfs namenode -format

-- Copy the metadata generated by the format on hadoop-namenode01 to the second NameNode

[root@hadoop-namenode01 hadoop-2.4.1]# scp -r tmp/ hadoop-namenode02:/usr/local/apps/hadoop-2.4.1/

[root@hadoop-namenode01 current]# pwd

/usr/local/apps/hadoop-2.4.1/tmp/dfs/name/current

[root@hadoop-namenode01 current]# ll

total 16

-rw-r--r--. 1 root root 351 Jul  4 06:01 fsimage_0000000000000000000

-rw-r--r--. 1 root root  62 Jul  4 06:01 fsimage_0000000000000000000.md5

-rw-r--r--. 1 root root   2 Jul  4 06:01 seen_txid

-rw-r--r--. 1 root root 204 Jul  4 06:01 VERSION
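Instead of copying tmp/ by hand, Hadoop 2.x also provides a bootstrap command for the standby; it is run on hadoop-namenode02 once the freshly formatted NameNode on hadoop-namenode01 has been started (both approaches yield the same initial metadata):

[root@hadoop-namenode02 bin]# ./hdfs namenode -bootstrapStandby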

Step 4: Format ZKFC [only needs to be run on any one of the NameNodes]

[root@hadoop-namenode01 bin]# pwd

/usr/local/apps/hadoop-2.4.1/bin

[root@hadoop-namenode01 bin]# ./hdfs zkfc -formatZK

18/07/04 06:12:16 INFO ha.ActiveStandbyElector: Session connected.

18/07/04 06:12:16 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ns1 in ZK.

-- Indicates that the /hadoop-ha/ns1 znode was created successfully in ZooKeeper

18/07/04 06:12:16 INFO zookeeper.ZooKeeper: Session: 0x364639c799c0000 closed

18/07/04 06:12:16 INFO zookeeper.ClientCnxn: EventThread shut down
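As an extra sanity check (not part of the original steps), the znode can be listed with the ZooKeeper client from any ZK node:

[root@hadoop-zknode01 bin]# ./zkCli.sh -server hadoop-zknode01:2181

[zk: hadoop-zknode01:2181(CONNECTED) 0] ls /hadoop-ha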


To be continued.....