This post walks through setting up a Hadoop HA cluster.
Hadoop's HA (High Availability) architecture removes the NameNode single point of failure: two NameNodes form one nameservice, one active and the other standby. YARN HA works the same way and requires two ResourceManager machines, one active and one standby.
7 virtual machines:
hadoopNode01  192.168.9.11  namenode, zkfc                                   memory 1.2G
hadoopNode02  192.168.9.12  namenode, zkfc                                   memory 1.2G
hadoopNode03  192.168.9.13  resourcemanager                                  memory 1G
hadoopNode04  192.168.9.14  resourcemanager                                  memory 1G
hadoopNode05  192.168.9.15  datanode, nodemanager, journalnode, zookeeper    memory 1.1G
hadoopNode06  192.168.9.16  datanode, nodemanager, journalnode, zookeeper    memory 1.1G
hadoopNode07  192.168.9.17  datanode, nodemanager, journalnode, zookeeper    memory 1.1G
Install ZooKeeper on hadoopNode05, hadoopNode06, and hadoopNode07.
The ZooKeeper ensemble is the foundation of HA: the coordination that the HDFS and YARN clusters rely on (active/standby election, automatic failover) is implemented through ZooKeeper.
hadoopNode05上:
cd installpkg;
tar zxvf zookeeper-3.4.12.tar.gz -C ../app/
cd /home/hadoop/app/zookeeper-3.4.12/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg, with content as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
#dataDir=/tmp/zookeeper
dataDir=/home/hadoop/app/zookeeper-3.4.12/data
dataLogDir=/home/hadoop/app/zookeeper-3.4.12/dataLog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoopNode05:2888:3888
server.2=hadoopNode06:2888:3888
server.3=hadoopNode07:2888:3888
cd /home/hadoop/app/zookeeper-3.4.12
mkdir data dataLog
echo 1 > data/myid
The ZooKeeper installation on hadoopNode06 and hadoopNode07 is the same; only the id written to the myid file differs:
on hadoopNode06:  echo 2 > data/myid
on hadoopNode07:  echo 3 > data/myid
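Optionally, confirm the id on each node (the path assumes the dataDir configured in zoo.cfg above):
cat /home/hadoop/app/zookeeper-3.4.12/data/myid    # expect 1 on hadoopNode05, 2 on hadoopNode06, 3 on hadoopNode07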
Install Hadoop on hadoopNode01 and hadoopNode02:
cd installpkg
tar -zxvf hadoop-2.7.6.tar.gz -C ../app/
cd /home/hadoop/app/hadoop-2.7.6/etc/hadoop   and edit the configuration files:
(1) Edit hadoop-env.sh and set JAVA_HOME explicitly.
vi hadoop-env.sh, change as follows:
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_80
(2) Edit core-site.xml.
vi core-site.xml, edit as follows:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/app/hadoop-2.7.6/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoopNode05:2181,hadoopNode06:2181,hadoopNode07:2181</value>
    </property>
</configuration>
(3) Edit hdfs-site.xml.
vi hdfs-site.xml, edit as follows:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>hadoopNode01:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>hadoopNode01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>hadoopNode02:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>hadoopNode02:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoopNode05:8485;hadoopNode06:8485;hadoopNode07:8485/ns1</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/app/hadoop-2.7.6/journaldata</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <property>
        <name>heartbeat.recheck.interval</name>
        <value>2000</value>
    </property>
    <property>
        <name>dfs.heartbeat.interval</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.blockreport.intervalMsec</name>
        <value>10000</value>
    </property>
</configuration>
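Once the environment variables from step (8) are in place, the HA-related settings can be sanity-checked without starting anything, e.g.:
hdfs getconf -confKey dfs.nameservices    # expect ns1
hdfs getconf -namenodes                   # expect hadoopNode01 hadoopNode02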
(4) Edit mapred-site.xml (if it does not exist yet, copy it from the bundled template first: cp mapred-site.xml.template mapred-site.xml).
vi mapred-site.xml, edit as follows:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
(5) Edit yarn-site.xml.
vi yarn-site.xml, edit as follows:
<configuration>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoopNode03</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoopNode04</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoopNode05:2181,hadoopNode06:2181,hadoopNode07:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
(6) Edit the slaves file (its entries act as DataNodes for HDFS and NodeManagers for YARN).
vi slaves, edit as follows:
hadoopNode05
hadoopNode06
hadoopNode07
(7) Set up passwordless SSH among the 7 machines.
On each of the 7 machines, run the following; a quick connectivity check is shown after the list:
ssh-keygen -t rsa
ssh-copy-id hadoopNode01
ssh-copy-id hadoopNode02
ssh-copy-id hadoopNode03
ssh-copy-id hadoopNode04
ssh-copy-id hadoopNode05
ssh-copy-id hadoopNode06
ssh-copy-id hadoopNode07
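To verify that passwordless login works from the node you are on, a loop like the following (a minimal sketch; it assumes all seven hostnames resolve from each machine) should print every hostname without prompting for a password:
for h in hadoopNode01 hadoopNode02 hadoopNode03 hadoopNode04 hadoopNode05 hadoopNode06 hadoopNode07; do
  ssh $h hostname    # no password prompt expected
done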
(8) Copy the installation with scp and finish the remaining configuration.
cd app/hadoop-2.7.6/
mkdir tmp
After Hadoop is configured on hadoopNode01, copy hadoop-2.7.6 to the other 6 machines (run the scp commands from /home/hadoop/app):
cd /home/hadoop/app
scp -r hadoop-2.7.6 hadoopNode02:/home/hadoop/app/
scp -r hadoop-2.7.6 hadoopNode03:/home/hadoop/app/
scp -r hadoop-2.7.6 hadoopNode04:/home/hadoop/app/
scp -r hadoop-2.7.6 hadoopNode05:/home/hadoop/app/
scp -r hadoop-2.7.6 hadoopNode06:/home/hadoop/app/
scp -r hadoop-2.7.6 hadoopNode07:/home/hadoop/app/
Configure the environment variables on each of the 7 machines:
su root
vi /etc/profile
JAVA_HOME=/home/hadoop/app/jdk1.7.0_80
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
HADOOP_HOME=/home/hadoop/app/hadoop-2.7.6
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME
export HADOOP_HOME
export CLASSPATH
export PATH
source /etc/profile
exit
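After logging back in as the hadoop user (or sourcing /etc/profile in your own shell), both of the following should resolve on every node:
java -version
hadoop version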
Start ZooKeeper on hadoopNode05, hadoopNode06, and hadoopNode07:
cd app/zookeeper-3.4.12/bin/
./zkServer.sh start
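Once all three nodes have started, checking the status on each node should report one leader and two followers; an error here usually means the ensemble has not reached quorum yet:
./zkServer.sh status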
Start a JournalNode on hadoopNode05, hadoopNode06, and hadoopNode07:
hadoop-daemon.sh start journalnode
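A quick jps on each of the three nodes should now show a JournalNode process alongside QuorumPeerMain (ZooKeeper):
jps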
On hadoopNode01, run:
hdfs namenode -format
Copy the freshly formatted metadata directory to the other NameNode:
scp -r tmp/dfs hadoopNode02:/home/hadoop/app/hadoop-2.7.6/tmp/
On either hadoopNode01 or hadoopNode02, run:
hdfs zkfc -formatZK
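Formatting ZKFC creates the HA znode in ZooKeeper. Optionally, confirm it from any ZooKeeper node with zkCli.sh (the expected child ns1 is the nameservice id configured earlier):
/home/hadoop/app/zookeeper-3.4.12/bin/zkCli.sh -server hadoopNode05:2181
ls /hadoop-ha        # expect [ns1]
quit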
Then, on either hadoopNode01 or hadoopNode02, run:
start-dfs.sh
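After start-dfs.sh, one NameNode should be active and the other standby (which one becomes active can vary). A quick check, using the nn1/nn2 ids from hdfs-site.xml:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2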
On hadoopNode03 (or hadoopNode04), run:
start-yarn.sh
On hadoopNode04 (or hadoopNode03, whichever did not run start-yarn.sh; start-yarn.sh starts only the local ResourceManager), run:
yarn-daemon.sh start resourcemanager
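Similarly, the ResourceManager HA state can be checked with the rm1/rm2 ids from yarn-site.xml:
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2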
HDFS cluster web UI:
http://192.168.9.11:50070
http://192.168.9.12:50070
YARN cluster web UI:
http://192.168.9.13:8088/cluster
http://192.168.9.14:8088/cluster
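As a final smoke test, you can submit the bundled example job and, optionally, simulate a failure of the active NameNode to watch the standby take over. This is only a sketch; the example jar path assumes the stock 2.7.6 layout, and <NameNode pid> is whatever jps reports on the active NameNode:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar pi 2 10
jps                        # on the active NameNode, note the NameNode pid
kill -9 <NameNode pid>     # the standby should switch to active within seconds
hdfs haadmin -getServiceState nn2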