1. When I first started, I followed the official site and tried to build HA without ZooKeeper and without automatic failover, assuming it would just work. But hdfs dfs -mkdir /lcc kept creating the directory on the local filesystem. Only later did I learn that an HA setup must have ZooKeeper.
2. After setting up ZooKeeper and rebuilding HA, the problem remained: hdfs dfs -mkdir /lcc created a local directory, while hdfs dfs -mkdir hdfs://mycluster/lcc created the remote one. So I suspected the default filesystem had never been changed.
Finally, checking the documentation, I found it should be:
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
whereas I had written it as:
<property>
<name>dfs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
which is why it was wrong.
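A quick way to check which value a client actually picks up is hdfs getconf; with the correct configuration it resolves the HA nameservice:
hdfs getconf -confKey fs.defaultFS
# should print hdfs://mycluster; if it prints file:/// the key is misspelled or missing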
3. Now let's rebuild a complete HA cluster from scratch.
We need three machines, set up as described in the following steps.
4. First, map each IP to its hostname and make sure the machines have network access; pick the IPs yourself.
Change the hostname:
[root@biluos2 ~]# vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=biluos2.com
Edit the hostname-to-IP mappings:
[root@biluos2 ~]# vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.10.173 biluos.com biluos
192.168.10.174 biluos1.com biluos1
192.168.10.175 biluos2.com biluos2
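Before going further, a quick sanity check (a minimal sketch, to be run on each of the three hosts) confirms that every peer resolves and responds:
for h in biluos.com biluos1.com biluos2.com; do
  ping -c 1 $h    # each name should resolve to the address above and answer
done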
5. Set up passwordless SSH login.
On biluos.com, run:
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@biluos.com
ssh-copy-id -i .ssh/id_rsa.pub root@biluos1.com
ssh-copy-id -i .ssh/id_rsa.pub root@biluos2.com
On biluos1.com, run:
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@biluos.com
ssh-copy-id -i .ssh/id_rsa.pub root@biluos1.com
ssh-copy-id -i .ssh/id_rsa.pub root@biluos2.com
On biluos2.com, run:
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@biluos.com
ssh-copy-id -i .ssh/id_rsa.pub root@biluos1.com
ssh-copy-id -i .ssh/id_rsa.pub root@biluos2.com
Always use the hostnames here, never the IPs, or you will run into baffling problems later.
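To verify the passwordless setup (a minimal sketch, run from each machine), every hop should print the remote hostname without asking for a password:
for h in biluos.com biluos1.com biluos2.com; do
  ssh root@$h hostname   # no password prompt means the key was copied correctly
done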
6. Download ZooKeeper and Hadoop and unpack them into /opt/moudles/. The Hadoop used here is 2.7.3; a newer release is recommended, since newer builds do not have the native-library problem. Installing the JDK is not covered here, but it is required.
[root@biluos2 ~]# ll /opt/moudles/
total 28
drwxr-xr-x. 10 root root 4096 Jul 31 03:09 hadoop-2.7.3
drwxr-xr-x. 3 root root 4096 Jul 30 03:24 hadoop-2.7.3.data
drwxr-xr-x. 8 root root 4096 Jul 26 08:37 jdk1.8.0_121
drwxr-xr-x. 6 root root 4096 Jul 26 08:46 myhadoopdata
drwxr-xr-x. 4 root root 4096 Jul 30 05:04 myzookeeperdata
drwxr-xr-x. 10 root root 4096 Jul 30 05:27 zookeeper-3.4.6
7. Configure the environment variables:
export JAVA_HOME=/opt/moudles/jdk1.8.0_121
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export HADOOP_HOME=/opt/moudles/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin
export ZOOKEEPER_HOME=/opt/moudles/zookeeper-3.4.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
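After these are exported (e.g. via /etc/profile), a quick check that all three tools are on the PATH:
java -version        # should report 1.8.0_121
hadoop version       # should report 2.7.3
which zkServer.sh    # should resolve into /opt/moudles/zookeeper-3.4.6/bin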
8. Configure ZooKeeper:
[root@biluos zookeeper-3.4.6]# vim conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/moudles/myzookeeperdata/data
dataLogDir=/opt/moudles/myzookeeperdata/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
autopurge.snapRetainCount=30
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
autopurge.purgeInterval=24
server.1=biluos.com:2888:3888
server.2=biluos1.com:2888:3888
server.3=biluos2.com:2888:3888
9. The two directories configured above, dataDir=/opt/moudles/myzookeeperdata/data and dataLogDir=/opt/moudles/myzookeeperdata/logs, must be created by hand. Inside the dataDir path, create a file named myid: it holds 1 on biluos.com, 2 on biluos1.com, and 3 on biluos2.com. Get this exactly right, with no extra or missing characters, or ZooKeeper will not start; see the sketch below.
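A minimal sketch of those steps; run it on every node and change the echoed number to match the host:
mkdir -p /opt/moudles/myzookeeperdata/data /opt/moudles/myzookeeperdata/logs
echo 1 > /opt/moudles/myzookeeperdata/data/myid   # 1 on biluos.com, 2 on biluos1.com, 3 on biluos2.com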
10. Start ZooKeeper and verify it. On each of the three machines run bin/zkServer.sh start, then check with bin/zkServer.sh status:
[root@biluos zookeeper-3.4.6]# bin/zkServer.sh status
JMX enabled by default
Using config: /opt/moudles/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
[root@biluos1 zookeeper-3.4.6]# bin/zkServer.sh status
JMX enabled by default
Using config: /opt/moudles/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
[root@biluos2 zookeeper-3.4.6]# bin/zkServer.sh status
JMX enabled by default
Using config: /opt/moudles/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
jps should show:
[root@biluos ~]# jps
3113 QuorumPeerMain
4862 Jps
[root@biluos1 ~]# jps
3113 QuorumPeerMain
4862 Jps
[root@biluos2 ~]# jps
3113 QuorumPeerMain
4862 Jps
One leader and two followers means success; ZooKeeper needs no further attention for now.
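Another quick liveness check is ZooKeeper's four-letter-word interface (a sketch, assuming nc is installed; this works on the 3.4.x line):
echo ruok | nc biluos.com 2181   # a healthy server answers imok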
11. Configure Hadoop.
In hadoop-env.sh, set:
export JAVA_HOME=/opt/moudles/jdk1.8.0_121
Configure hdfs-site.xml:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>myNameNode1,myNameNode2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.myNameNode1</name>
<value>biluos.com:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.mycluster.myNameNode1</name>
<value>biluos.com:8022</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.myNameNode1</name>
<value>biluos.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.mycluster.myNameNode1</name>
<value>biluos.com:50470</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address.mycluster.myNameNode1</name>
<value>biluos.com:50090</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.myNameNode2</name>
<value>biluos1.com:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.mycluster.myNameNode2</name>
<value>biluos1.com:8022</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.myNameNode2</name>
<value>biluos1.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.mycluster.myNameNode2</name>
<value>biluos1.com:50470</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address.mycluster.myNameNode2</name>
<value>biluos1.com:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/moudles/hadoop-2.7.3.data/ha/data/dfs/namenode/name</value>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>/opt/moudles/hadoop-2.7.3.data/ha/data/dfs/namenode/edits</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/moudles/hadoop-2.7.3.data/ha/data/dfs/dn</value>
</property>
<property>
<name>dfs.datanode.checkpoint.dir</name>
<value>/opt/moudles/hadoop-2.7.3.data/ha/data/dfs/secondarynamenode/name</value>
</property>
<property>
<name>dfs.datanode.checkpoint.edits.dir</name>
<value>/opt/moudles/hadoop-2.7.3.data/ha/data/dfs/secondarynamenode/edits</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://biluos.com:8485;biluos1.com:8485;biluos2.com:8485/mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/moudles/hadoop-2.7.3.data/ha/data/dfs/jn</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
Note: several of the directories referenced above must be created by hand:
mkdir -p /opt/moudles/hadoop-2.7.3.data/ha/data/dfs/namenode/name
mkdir -p /opt/moudles/hadoop-2.7.3.data/ha/data/dfs/namenode/edits
mkdir -p /opt/moudles/hadoop-2.7.3.data/ha/data/dfs/dn
mkdir -p /opt/moudles/hadoop-2.7.3.data/ha/data/dfs/secondarynamenode/name
mkdir -p /opt/moudles/hadoop-2.7.3.data/ha/data/dfs/secondarynamenode/edits
mkdir -p /opt/moudles/hadoop-2.7.3.data/ha/data/dfs/jn
Take care: the property is dfs.ha.fencing.methods, not dfs.ha.fencing.method. I misspelled it, and the cluster looked perfectly healthy yet never promoted a NameNode to active automatically; both stayed in standby.
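With hdfs-site.xml in place, hdfs getconf gives a quick check that the nameservice and NameNode IDs parse as intended:
hdfs getconf -namenodes                           # should list biluos.com and biluos1.com
hdfs getconf -confKey dfs.ha.namenodes.mycluster  # should print myNameNode1,myNameNode2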
Configure core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/moudles/hadoop-2.7.3.data/ha/data/tmp</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>biluos.com:2181,biluos1.com:2181,biluos2.com:2181</value>
</property>
</configuration>
Note: the key is fs.defaultFS, not dfs.defaultFS. I wrote the latter, which is exactly why hdfs dfs -mkdir /lcc kept creating the directory locally: the default filesystem was never set.
Configure the slaves file:
biluos.com
biluos1.com
biluos2.com
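All three machines need identical configuration. A minimal sketch of pushing the files out from biluos.com (assuming the same directory layout on every host):
for h in biluos1.com biluos2.com; do
  scp /opt/moudles/hadoop-2.7.3/etc/hadoop/{hadoop-env.sh,core-site.xml,hdfs-site.xml,slaves} root@$h:/opt/moudles/hadoop-2.7.3/etc/hadoop/
done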
12. Start everything in the following order.
(1) First start ZooKeeper on every ZooKeeper node (all three machines here):
/opt/moudles/zookeeper-3.4.6/bin/zkServer.sh start
jps shows:
[root@biluos1 ~]# jps
1509 QuorumPeerMain
(2) Start a JournalNode on each of those nodes (all three here):
/opt/moudles/hadoop-2.7.3/sbin/hadoop-daemon.sh start journalnode
jps shows:
[root@biluos1 ~]# jps
1509 QuorumPeerMain
5851 JournalNode
(3) Format the NameNode on the first machine, biluos.com in my case:
/opt/moudles/hadoop-2.7.3/bin/hadoop namenode -format
This step must not fail. If you see errors like:
java.io.IOException: Incompatible clusterIDs in /opt/moudles/hadoop-
c31a265c5e6e; datanode clusterID = CID-a2b73025-f5cc-4bf2-8793-
c1ff1a3628bd at org.apache.hadoop.hdfs.server.datanode.DataStorage.
doTransition(DataStorage.java:775)
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.
recoverTransitionRead(DataStorage.java:574)
then the DataNode's clusterID no longer matches the freshly formatted NameNode. Delete the data directories and reformat, or fix the stored clusterID:
rm -rf /opt/moudles/hadoop-2.7.3.data
mkdir -p /opt/moudles/hadoop-2.7.3.data/ha/data/dfs/namenode/name
mkdir -p /opt/moudles/hadoop-2.7.3.data/ha/data/dfs/namenode/edits
mkdir -p /opt/moudles/hadoop-2.7.3.data/ha/data/dfs/dn
mkdir -p /opt/moudles/hadoop-2.7.3.data/ha/data/dfs/secondarynamenode/name
mkdir -p /opt/moudles/hadoop-2.7.3.data/ha/data/dfs/secondarynamenode/edits
mkdir -p /opt/moudles/hadoop-2.7.3.data/ha/data/dfs/jn
(4) Start the NameNode on the first machine, biluos.com in my case:
/opt/moudles/hadoop-2.7.3/sbin/hadoop-daemon.sh start namenode
(5) On the second machine, biluos1.com in my case, sync the metadata from the first NameNode:
/opt/moudles/hadoop-2.7.3/bin/hdfs namenode -bootstrapStandby
then start its NameNode:
/opt/moudles/hadoop-2.7.3/sbin/hadoop-daemon.sh start namenode
(6) Start a DataNode on every node:
/opt/moudles/hadoop-2.7.3/sbin/hadoop-daemon.sh start datanode
(7) At this point both NameNodes are running, but both are in standby state.
(8) Format the failover state in ZooKeeper; running this on one machine is enough:
hdfs zkfc -formatZK
(9) On each node that runs a NameNode, start the DFSZKFailoverController:
/opt/moudles/hadoop-2.7.3/sbin/hadoop-daemon.sh start zkfc
Once these are up, one of the NameNodes switches to the active state automatically.
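You can confirm which one is active with hdfs haadmin, using the IDs defined in hdfs-site.xml above:
hdfs haadmin -getServiceState myNameNode1   # prints active or standby
hdfs haadmin -getServiceState myNameNode2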
(10) The final jps output looks like this:
[root@biluos moudles]# jps
7536 NameNode
8018 Jps
2024 QuorumPeerMain
7849 DFSZKFailoverController
7673 DataNode
7451 JournalNode
[root@biluos1 ~]# jps
6240 DFSZKFailoverController
1509 QuorumPeerMain
5975 NameNode
6535 Jps
5851 JournalNode
6109 DataNode
[root@biluos1 ~]#
[root@biluos2 ~]# jps
3684 JournalNode
4006 Jps
3770 DataNode
1501 QuorumPeerMain
[root@biluos2 ~]#
Plenty of things can go wrong here, but nearly all of them trace back to configuration mistakes. Also remember to disable the firewall, or the setup will fail; a sketch follows.
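For reference, a minimal sketch of disabling the firewall on CentOS 6 (which the /etc/sysconfig/network step above suggests; on CentOS 7 use systemctl stop firewalld and systemctl disable firewalld instead):
service iptables stop      # stop the firewall immediately
chkconfig iptables off     # keep it off after reboots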