Before Hadoop 2.0 there was only a single NameNode, so if the NameNode machine failed the whole cluster became unusable: a single point of failure. The HA architecture introduced afterwards runs two NameNodes, one in active state and one in standby. The active NameNode serves client requests while the standby keeps a real-time copy of the active NameNode's metadata; when the active node fails, the standby can switch to active immediately and keep serving. For that reason, production clusters generally use the HA architecture. This guide builds a test Hadoop HA cluster on three machines.
CentOS release 6.8 (Final) 64-bit
(Note: use lsb_release -a to check the OS version, and file /bin/ls to check whether the OS is 32- or 64-bit.)
JDK 1.8.0_45
hadoop-2.6.0-cdh5.7.0
zookeeper-3.4.6.tar.gz
Host | Software | Processes |
---|---|---|
hadoop001 | hadoop、zookeeper | NameNode、DFSZKFailoverController、JournalNode、DataNode 、 ResourceManager 、JobHistoryServer、NodeManager 、QuorumPeerMain |
hadoop002 | hadoop、zookeeper | NameNode 、DFSZKFailoverController、JournalNode 、DataNode 、ResourceManager 、NodeManager 、QuorumPeerMain |
hadoop003 | hadoop、zookeeper | JournalNode 、DataNode 、NodeManager、QuorumPeerMain |
Name | Path | Notes |
---|---|---|
$HADOOP_HOME | /home/hadoop/app/hadoop-2.6.0-cdh5.7.0 | |
Data | $HADOOP_HOME/data | |
Log | $HADOOP_HOME/logs | |
hadoop.tmp.dir | $HADOOP_HOME/tmp | Created manually, permissions 777 |
$ZOOKEEPER_HOME | /home/hadoop/app/zookeeper-3.4.6 | |
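The layout above assumes a hadoop user with app and software directories under its home. A minimal sketch (assuming the hadoop user already exists) to pre-create that layout on each of the three machines:
mkdir -p /home/hadoop/app /home/hadoop/software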
1. Set the IP address (all 3 machines)
[root@hadoop001 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:0C:29:60:E8:D2
TYPE=Ethernet
UUID=055d1cdb-65d4-406e-b797-f00342d412f7
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO="static"
IPADDR=192.168.137.130
NETMASK=255.255.255.0
GATEWAY=192.168.137.2
DNS1=10.64.0.10
Check the IP with: hostname -i
Adjust IPADDR for your network, set BOOTPROTO to static and ONBOOT to yes, then run
service network restart
followed by
ifconfig
to verify the change took effect.
2. Stop the firewall (all 3 machines)
Run
service iptables stop
and then
service iptables status
to verify it is stopped.
3. Disable the firewall at boot (all 3 machines)
Run
chkconfig iptables off
then check that it took effect with
chkconfig --list | grep iptables
All runlevels showing off means it is disabled.
4. Set the hostname (all 3 machines)
Show the current hostname with
hostname
To change it, edit
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop001
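Note that the edit to /etc/sysconfig/network only takes effect after a reboot. To apply the new name to the running system as well, the hostname command can be used (adjust the name per machine):
hostname hadoop001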
5. Edit the hosts file to map IPs to hostnames (all 3 machines)
[root@hadoop001 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.137.130 hadoop001
192.168.137.131 hadoop002
192.168.137.132 hadoop003
After editing, run ping hadoop001 to test the mapping. If it answers, the hosts entries are in effect.
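A quick sketch for checking all three entries at once from any node:
for h in hadoop001 hadoop002 hadoop003; do ping -c 1 $h; done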
6. Set up passwordless SSH between the 3 machines
These steps are done as the hadoop user, which is also the user the Hadoop cluster will be built with later.
Start with hadoop001.
Go to the hadoop user's home directory and run
ll -a
to show the hidden .ssh directory. Remove that directory first:
rm -rf .ssh
Then run
ssh-keygen
and press Enter at every prompt.
[hadoop@hadoop001 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/hadoop/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /hadoop/.ssh/id_rsa.
Your public key has been saved in /hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
74:78:50:05:7e:c8:bb:2a:f1:45:c4:0a:9c:38:90:dc hadoop@hadoop001
The key's randomart image is:
+--[ RSA 2048]----+
| ..+ o ..ooo. |
| o E + =o. |
| . .oo* . |
| ..o.o |
| S.. |
| . .. |
| o .. |
| . .. |
| .. |
+-----------------+
Once the .ssh directory has been generated, go into it and list its contents; you should see id_rsa (the private key) and id_rsa.pub (the public key).
Then run
cat id_rsa.pub >> authorized_keys
to append hadoop001's public key to authorized_keys.
Next, run the same key-generation steps on hadoop002 and hadoop003, then copy their public keys to hadoop001 with scp. For example, on hadoop002:
scp id_rsa.pub hadoop@hadoop001:/home/hadoop/.ssh/id_rsa.pub2
The file is renamed to id_rsa.pub2 during the transfer (and to id_rsa.pub3 for hadoop003) so the keys can be told apart.
Once both copies have arrived, run on hadoop001:
cat id_rsa.pub2 >> authorized_keys
cat id_rsa.pub3 >> authorized_keys
to append hadoop002's and hadoop003's public keys to authorized_keys. Then distribute authorized_keys back to hadoop002 and hadoop003 with scp, for example:
scp authorized_keys hadoop@hadoop002:/home/hadoop/.ssh/
After distribution, run ssh hadoop002 from hadoop001 to check that you can log in without a password. If the login succeeds, SSH is configured correctly. It is best to log in between all of the machines to verify.
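If the hadoop user's password is known on every machine, the manual copy-and-append steps above can also be replaced with ssh-copy-id; a sketch, run as the hadoop user on each of the three machines:
# appends the local public key to ~/.ssh/authorized_keys on every node,
# prompting for the hadoop password once per host
for h in hadoop001 hadoop002 hadoop003; do ssh-copy-id hadoop@$h; done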
7. Install the JDK and set environment variables
This is done as the root user so the JDK is set in the system-wide environment.
Move the JDK package to /usr/java and run
[root@hadoop001 java]# tar -xzvf jdk-8u45-linux-x64.gz
After extraction, set the environment variables:
vi /etc/profile
Add the Java environment variables:
export JAVA_HOME=/usr/java/jdk1.8.0_45
export PATH=$JAVA_HOME/bin:$PATH
Then run
source /etc/profile
to load the variables, and run
java -version
to check that Java is set up correctly.
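The JDK and the /etc/profile entries are needed on hadoop002 and hadoop003 as well. A sketch for copying the unpacked JDK from hadoop001, assuming root can ssh to the other machines:
# run as root on hadoop001
for h in hadoop002 hadoop003; do
  ssh root@$h "mkdir -p /usr/java"
  scp -r /usr/java/jdk1.8.0_45 root@$h:/usr/java/
done
# then repeat the /etc/profile edit and source /etc/profile on hadoop002 and hadoop003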
1. Download and extract ZooKeeper
[hadoop@hadoop001 ~]# cd /home/hadoop/software
[hadoop@hadoop001 software]# wget https://www.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz --no-check-certificate
[hadoop@hadoop001 software]# tar -zxvf zookeeper-3.4.6.tar.gz -C /home/hadoop/app/
Set the environment variables.
Since installation is done as the hadoop user, edit the per-user environment in the hadoop home directory:
vi .bash_profile
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper-3.4.6
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Then run
source .bash_profile
to load the variables.
2. Edit the configuration
cd /home/hadoop/app/zookeeper-3.4.6/conf
[hadoop@hadoop001 conf]# ll
total 12
-rw-rw-r--. 1 hadoop hadoop 535 Feb 20 2014 configuration.xsl
-rw-rw-r--. 1 hadoop hadoop 2161 Feb 20 2014 log4j.properties
-rw-rw-r--. 1 hadoop hadoop 922 Feb 20 2014 zoo_sample.cfg
[hadoop@hadoop001 conf]# cp zoo_sample.cfg zoo.cfg
[hadoop@hadoop001 conf]# vi zoo.cfg
Change
dataDir=/home/hadoop/app/zookeeper-3.4.6/data
and append at the end:
server.1=hadoop001:2888:3888
server.2=hadoop002:2888:3888
server.3=hadoop003:2888:3888
Then distribute zoo.cfg to hadoop002 and hadoop003 (a sketch for doing this from hadoop001 follows at the end of this step).
Under /home/hadoop/app/zookeeper-3.4.6, run
mkdir data
touch data/myid
echo 1 > data/myid
The other two machines follow the same steps; the only difference is the id:
on hadoop002 run
echo 2 > data/myid
and on hadoop003 run
echo 3 > data/myid
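A sketch for pushing zoo.cfg and the per-host myid files out from hadoop001, assuming ZooKeeper has already been unpacked to the same path on hadoop002 and hadoop003:
scp /home/hadoop/app/zookeeper-3.4.6/conf/zoo.cfg hadoop@hadoop002:/home/hadoop/app/zookeeper-3.4.6/conf/
scp /home/hadoop/app/zookeeper-3.4.6/conf/zoo.cfg hadoop@hadoop003:/home/hadoop/app/zookeeper-3.4.6/conf/
ssh hadoop002 "mkdir -p /home/hadoop/app/zookeeper-3.4.6/data && echo 2 > /home/hadoop/app/zookeeper-3.4.6/data/myid"
ssh hadoop003 "mkdir -p /home/hadoop/app/zookeeper-3.4.6/data && echo 3 > /home/hadoop/app/zookeeper-3.4.6/data/myid"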
1. Extract Hadoop
[hadoop@hadoop001 software]$ tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C /home/hadoop/app/
2. Set the environment variables
cd to the hadoop user's home directory, then run
vi .bash_profile
and add:
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
(sbin is added to the PATH so the start/stop scripts used later can be run from any directory)
3. Edit /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/hadoop-env.sh and add:
export JAVA_HOME="/usr/java/jdk1.8.0_45"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
4. Configure /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://testha</value>
    </property>
    <property>
        <name>fs.trash.checkpoint.interval</name>
        <value>0</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    <property>
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>2000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>
</configuration>
5. Configure /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.permissions.superusergroup</name>
        <value>hadoop</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <!-- local directory where the NameNode stores the name table (fsimage); adjust as needed -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/name</value>
    </property>
    <!-- local directory where the NameNode stores the transaction files (edits); adjust as needed -->
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>${dfs.namenode.name.dir}</value>
    </property>
    <!-- local directory where the DataNode stores blocks; adjust as needed -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>268435456</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>testha</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.testha</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.testha.nn1</name>
        <value>hadoop001:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.testha.nn2</name>
        <value>hadoop002:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.testha.nn1</name>
        <value>hadoop001:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.testha.nn2</name>
        <value>hadoop002:50070</value>
    </property>
    <property>
        <name>dfs.journalnode.http-address</name>
        <value>0.0.0.0:8480</value>
    </property>
    <property>
        <name>dfs.journalnode.rpc-address</name>
        <value>0.0.0.0:8485</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/testha</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/jn</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.testha</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.hosts</name>
        <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/slaves</value>
    </property>
</configuration>
6. Configure /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/yarn-env.sh
Add the YARN log directory:
export YARN_LOG_DIR="/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs"
7. Configure /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop001:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop001:19888</value>
    </property>
    <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>
</configuration>
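Note that the steps here do not show a yarn-site.xml, yet the host plan runs a ResourceManager on both hadoop001 and hadoop002. The block below is only a minimal sketch of the HA-related YARN settings such a layout typically needs; the values (for example the cluster id yarnha) are assumptions, not taken from the original setup, so adjust them before use:
<configuration>
    <!-- shuffle service needed by MapReduce jobs on YARN -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager HA: rm1 on hadoop001, rm2 on hadoop002 (assumed ids) -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarnha</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop001</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop002</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
</configuration>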
8. Edit /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/slaves
[hadoop@hadoop001 hadoop]# vi slaves
hadoop001
hadoop002
hadoop003
9. Create the tmp directory and distribute the configuration
[hadoop@hadoop001 hadoop]# mkdir -p /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp
[hadoop@hadoop001 hadoop]# chmod -R 777 /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp
[hadoop@hadoop001 etc]# scp -r hadoop hadoop@hadoop002:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc
[hadoop@hadoop001 etc]# scp -r hadoop hadoop@hadoop003:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc
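The two scp commands above only copy the configuration directory; they assume hadoop-2.6.0-cdh5.7.0 has already been unpacked under /home/hadoop/app on hadoop002 and hadoop003 and that their .bash_profile has the same HADOOP_HOME and PATH entries. If not, a sketch for copying the whole installation first:
scp -r /home/hadoop/app/hadoop-2.6.0-cdh5.7.0 hadoop@hadoop002:/home/hadoop/app/
scp -r /home/hadoop/app/hadoop-2.6.0-cdh5.7.0 hadoop@hadoop003:/home/hadoop/app/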
1. Start ZooKeeper
command: ./zkServer.sh start|stop|status
[hadoop@hadoop001 zookeeper]# $ZOOKEEPER_HOME/bin/zkServer.sh start
[hadoop@hadoop002 zookeeper]# $ZOOKEEPER_HOME/bin/zkServer.sh start
[hadoop@hadoop003 zookeeper]# $ZOOKEEPER_HOME/bin/zkServer.sh start
[hadoop@hadoop001 zookeeper]# $ZOOKEEPER_HOME/bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/app/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
[hadoop@hadoop001 zookeeper]#
[hadoop@hadoop002 zookeeper]# $ZOOKEEPER_HOME/bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/app/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
[hadoop@hadoop002 zookeeper]#
[hadoop@hadoop003 zookeeper]# $ZOOKEEPER_HOME/bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/app/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
[hadoop@hadoop003 zookeeper]#
2. Start Hadoop (HDFS + YARN)
a. Before formatting, start the JournalNode process on each JournalNode machine
[hadoop@hadoop001 ~]# cd /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin
[hadoop@hadoop001 sbin]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-journalnode-hadoop001.out
[hadoop@hadoop001 sbin]# jps
4016 Jps
3683 QuorumPeerMain
3981 JournalNode
[hadoop@hadoop001 sbin]#
[hadoop@hadoop002 hadoop]# cd /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin
[hadoop@hadoop002 sbin]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-journalnode-hadoop002.out
[hadoop@hadoop002 sbin]# jps
9891 Jps
9609 QuorumPeerMain
9852 JournalNode
[hadoop@hadoop002 sbin]#
[hadoop@hadoop003 hadoop]# cd /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin
[hadoop@hadoop003 sbin]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-journalnode-hadoop003.out
[hadoop@hadoop003 sbin]# jps
4425 JournalNode
4460 Jps
4191 QuorumPeerMain
[hadoop@hadoop003 sbin]#
b. Format the NameNode
[hadoop@hadoop001 sbin]# cd ../
[hadoop@hadoop001 hadoop]# hadoop namenode -format
……………..
……………..
17/09/02 23:16:50 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1577237506-
192.168.137.130-1504365410166
17/09/02 23:16:50 INFO common.Storage: Storage directory /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/name
has been successfully formatted.
17/09/02 23:16:50 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/name/current/fsimage.ckpt_0000000000000000000 using no
compression
17/09/02 23:16:50 INFO namenode.FSImageFormatProtobuf: Image file
/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 306 bytes
saved in 0 seconds.
17/09/02 23:16:51 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >=
0
17/09/02 23:16:51 INFO util.ExitUtil: Exiting with status 0
17/09/02 23:16:51 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop001/192.168.137.130
************************************************************/
c. Sync the NameNode metadata
Copy the metadata from hadoop001 to hadoop002.
This mainly covers dfs.namenode.name.dir and dfs.namenode.edits.dir; also make sure the shared edits directory
(dfs.namenode.shared.edits.dir) contains all of the NameNode metadata.
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]# pwd
/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]# scp -r data/ hadoop@hadoop002:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
in_use.lock 100% 14 0.0KB/s 00:00
VERSION 100% 167 0.2KB/s 00:00
seen_txid 100% 2 0.0KB/s 00:00
VERSION 100% 220 0.2KB/s 00:00
fsimage_0000000000000000000.md5 100% 62 0.1KB/s
00:00
fsimage_0000000000000000000 100% 306 0.3KB/s
00:00
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]#
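As an alternative to copying the data directory by hand, the standby NameNode can also be seeded with the built-in bootstrap command, run once on hadoop002 after the NameNode on hadoop001 has been formatted and started (a sketch, equivalent in effect to the scp above):
hdfs namenode -bootstrapStandby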
d. Format the ZKFC state in ZooKeeper
[hadoop@hadoop001 bin]# hdfs zkfc -formatZK
……………..
……………..
17/09/02 23:19:13 INFO ha.ActiveStandbyElector: Session connected.
17/09/02 23:19:13 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/testha in ZK.
17/09/02 23:19:13 INFO zookeeper.ZooKeeper: Session: 0x35e42f121f50000 closed
17/09/02 23:19:13 INFO zookeeper.ClientCnxn: EventThread shut down
17/09/02 23:19:13 INFO tools.DFSZKFailoverController: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DFSZKFailoverController at hadoop001/192.168.137.130
************************************************************/
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]#
e. Start HDFS (the distributed storage layer)
To start the cluster, run start-dfs.sh on hadoop001.
To stop the cluster, run stop-dfs.sh on hadoop001.
##### Cluster start #####
[hadoop@hadoop001 hadoop]# ./start-dfs.sh
Starting namenodes on [hadoop001 hadoop002]
----------
----------
##### Starting individual daemons #####
NameNode (hadoop001, hadoop002):
hadoop-daemon.sh start namenode
DataNode (hadoop001, hadoop002, hadoop003):
hadoop-daemon.sh start datanode
JournalNode (hadoop001, hadoop002, hadoop003):
hadoop-daemon.sh start journalnode
ZKFC (hadoop001, hadoop002):
hadoop-daemon.sh start zkfc
f. Verify that the NameNode, DataNode and ZKFC processes started successfully
1) Run jps on all three machines and compare the processes against the host plan table above.
2) Open the web UIs:
hadoop001:
http://192.168.137.130:50070/
hadoop002:
http://192.168.137.131:50070/
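Besides the web pages, the active/standby state can be checked from the command line, where nn1 and nn2 are the ids defined in hdfs-site.xml:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2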
g. Start the YARN framework
############ Cluster start ##############
1) Start YARN on hadoop001, from $HADOOP_HOME/sbin:
[hadoop@hadoop001 hadoop]# ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-resourcemanager-hadoop001.out
hadoop002: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-nodemanager-hadoop002.out
hadoop003: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-nodemanager-hadoop003.out
hadoop001: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-nodemanager-hadoop001.out
[hadoop@hadoop001 hadoop]#
2) Start the standby ResourceManager on hadoop002:
[hadoop@hadoop002 ~]# yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-resourcemanager-hadoop002.out
[hadoop@hadoop002 ~]#
##### Starting individual daemons #####
1) ResourceManager (hadoop001, hadoop002)
yarn-daemon.sh start resourcemanager
2) NodeManager (hadoop001, hadoop002, hadoop003)
yarn-daemon.sh start nodemanager
###### Shutdown #############
[hadoop@hadoop001 sbin]# stop-yarn.sh
# this stops the ResourceManager process on the NameNode host and the NodeManager processes on the DataNode hosts
[hadoop@hadoop002 sbin]# yarn-daemon.sh stop resourcemanager
h. Verify the ResourceManager and NodeManager processes
1) Run jps on all three machines to check that the processes started; compare against the host plan table above.
2) Open the web UIs to verify:
ResourceManger(Active):http://192.168.137.130:8088
ResourceManger(Standby):http://192.168.137.131:8088/cluster/cluster
Note: the standby machine's web page path differs from the active one's.
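The ResourceManager state can be checked from the command line as well; rm1 and rm2 here are the ids assumed in the yarn-site.xml sketch earlier, so substitute whatever ids are actually configured:
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2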
1. Stop Hadoop (YARN -> HDFS)
[hadoop@hadoop001 sbin]# stop-yarn.sh
[hadoop@hadoop002 sbin]# yarn-daemon.sh stop resourcemanager
[hadoop@hadoop001 sbin]# stop-dfs.sh
2. Stop ZooKeeper
[hadoop@hadoop001 bin]# zkServer.sh stop
[hadoop@hadoop002 bin]# zkServer.sh stop
[hadoop@hadoop003 bin]# zkServer.sh stop
1. Start ZooKeeper
[hadoop@hadoop001 bin]# zkServer.sh start
[hadoop@hadoop002 bin]# zkServer.sh start
[hadoop@hadoop003 bin]# zkServer.sh start
2. Start Hadoop (HDFS -> YARN)
[hadoop@hadoop001 sbin]# start-dfs.sh
[hadoop@hadoop001 sbin]# start-yarn.sh
[hadoop@hadoop002 sbin]# yarn-daemon.sh start resourcemanager
[hadoop@hadoop001 ~]# $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
Monitoring pages:
HDFS:http://192.168.137.130:50070/
HDFS:http://192.168.137.131:50070/
ResourceManger(Active):http://192.168.137.130:8088
ResourceManger(Standby):http://192.168.137.131:8088/cluster/cluster
JobHistory:http://192.168.137.130:19888/jobhistory
At this point, the Hadoop HA test cluster setup is complete.
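A quick way to exercise the failover behaviour described at the start is to kill the active NameNode and watch the standby take over; a sketch, assuming nn1 on hadoop001 is currently active:
# on hadoop001: find the NameNode pid and kill it
jps | grep NameNode
kill -9 <NameNode pid>
# from any node: nn2 should report active shortly afterwards
hdfs haadmin -getServiceState nn2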