The cluster has two NameNodes and two ResourceManagers, implementing the NameNode HA scheme and eliminating the ResourceManager single point of failure.
In Hadoop 2 there are two NameNodes with identical roles: one in the active state and one in the standby state. While the cluster is running, only the active NameNode serves requests; the standby NameNode waits in reserve and continuously synchronizes the active NameNode's metadata. If the active NameNode fails, the standby NameNode can be switched to active, either manually or automatically, and take over the work. This is what provides high availability.
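Once the HA configuration described below is in place and the cluster is running, a quick way to see which NameNode is currently active is a sketch like the following, using the NameNode IDs nn1 and nn2 defined in hdfs-site.xml later in this article:
hdfs haadmin -getServiceState nn1     # prints "active" or "standby"
hdfs haadmin -getServiceState nn2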
http://blog.csdn.net/u011204847/article/details/50926065
The following settings must be made in the virtual machines (they must be on the same network segment as the host machine):
In VMware, open [Edit] --> [Virtual Network Editor] and set the network connection mode to "Bridged", bridging to the host machine's network adapter (wired or wireless).
Set a static IP address:
vi /etc/sysconfig/network-scripts/ifcfg-eno16777736    (the "eno16777736" part may differ between versions)
Modify (or add) the following entries:
BOOTPROTO="static"        # change dhcp to static
ONBOOT="yes"              # enable this configuration at boot
IPADDR=192.168.1.181      # static IP
GATEWAY=192.168.1.1       # default gateway
NETMASK=255.255.255.0     # netmask
DNS1=192.168.1.1          # DNS
Complete example:
cat /etc/sysconfig/network-scripts/ifcfg-eno16777736
HWADDR="00:15:5D:07:F1:02"
TYPE="Ethernet"
BOOTPROTO="static"          # change dhcp to static
DEFROUTE="yes"
PEERDNS="yes"
PEERROUTES="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_PEERDNS="yes"
IPV6_PEERROUTES="yes"
IPV6_FAILURE_FATAL="no"
NAME="eth0"
UUID="bb3a302d-dc46-461a-881e-d46cafd0eb71"
ONBOOT="yes"                # enable this configuration at boot
IPADDR=192.168.1.181        # static IP
GATEWAY=192.168.1.1         # default gateway
NETMASK=255.255.255.0       # netmask
DNS1=192.168.1.1            # DNS
Change the hostname of each host:
vi /etc/sysconfig/network
Add the following line:
HOSTNAME=master01    # master02, master03, ... on the other hosts respectively
Map hostnames to IP addresses:
vi /etc/hosts
192.168.1.181 master01
192.168.1.182 master02
192.168.1.186 master03
192.168.1.187 master04
192.168.1.183 slave01
192.168.1.184 slave02
192.168.1.185 slave03
Restart the network service:
service network restart
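To verify that the static address took effect after the restart, a quick check (a sketch; the interface name and gateway are the ones from the example above):
ip addr show eno16777736     # should show 192.168.1.181/24
ping -c 3 192.168.1.1        # the default gateway should be reachable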
On a CentOS 7 minimal install the default firewall is firewalld; stop and mask it:
systemctl stop firewalld
systemctl mask firewalld
Install iptables-services and disable the firewall:
yum install iptables-services
service iptables stop
chkconfig iptables off      # do not start the firewall at boot
service ip6tables stop
chkconfig ip6tables off
Disable SELinux:
vi /etc/sysconfig/selinux      # change SELINUX=enforcing to SELINUX=disabled
setenforce 0                   # turn SELinux off for the current session
getenforce                     # verify
Add a user group (as the root user):
groupadd hadoop
Add a user and assign it to the group:
useradd -g hadoop hadoop
Set the password:
passwd hadoop
Install SSH:
yum install openssh-server
First, set up passwordless login for the root user.
After logging in as root, run the following on every host:
ssh-keygen -t rsa
cd /root/.ssh/
This directory contains two files:
id_rsa.pub id_rsa
On the master01 host:
mv id_rsa.pub authorized_keys
Then append the contents of id_rsa.pub from every other host into this authorized_keys file, and copy authorized_keys back to every node.
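A minimal sketch of one way to gather and distribute the keys from master01, assuming root password login is still accepted at this point and using the hostnames from /etc/hosts above:
# gather every host's public key into authorized_keys on master01
for i in master02 master03 master04 slave01 slave02 slave03
do
  ssh root@$i "cat /root/.ssh/id_rsa.pub" >> /root/.ssh/authorized_keys
done
# push the merged authorized_keys back to every node
for i in master02 master03 master04 slave01 slave02 slave03
do
  scp /root/.ssh/authorized_keys root@$i:/root/.ssh/
done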
Passwordless SSH for the hadoop user is set up in the same way as for root.
Note: in this case id_rsa.pub and id_rsa live in the .ssh directory under the hadoop user's home directory.
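As a quick check (a sketch, assuming the key exchange has been completed for the hadoop user), every hop below should print a hostname without asking for a password:
for i in master01 master02 master03 master04 slave01 slave02 slave03
do
  ssh hadoop@$i hostname
done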
Example environment configuration for the software used (in /etc/profile):
export JAVA_HOME=/usr/local/software/jdk1.8.0_66
export HADOOP_HOME=/usr/local/software/hadoop-2.7.0
export ZOOKEEPER_HOME=/usr/local/software/zookeeper-3.4.6
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin
Then make the configuration take effect:
source /etc/profile
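A quick sanity check that the variables took effect (a sketch; the expected version strings depend on your installs):
java -version            # should report 1.8.0_66
hadoop version           # should report Hadoop 2.7.0
echo $ZOOKEEPER_HOME     # should print /usr/local/software/zookeeper-3.4.6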
Hadoop HA configuration files (5 in total):
hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml
hadoop-env.sh:
export JAVA_HOME=/usr/local/software/jdk1.8.0_66
core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/software/hadoop-2.7.0/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>slave01:2181,slave02:2181,slave03:2181</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>cluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.cluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster.nn1</name>
    <value>master01:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster.nn1</name>
    <value>master01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster.nn2</name>
    <value>master02:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster.nn2</name>
    <value>master02:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://slave01:8485;slave02:8485;slave03:8485/cluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/software/hadoop-2.7.0/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.cluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>rm-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>master02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>slave01:2181,slave02:2181,slave03:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
vi /usr/local/software/zookeeper-3.4.6/conf/zoo.cfg
Change dataDir to point at the data directory created below:
dataDir=/usr/local/software/zookeeper-3.4.6/zk/data
Append the following at the end of the file:
server.1=slave01:2888:3888
server.2=slave02:2888:3888
server.3=slave03:2888:3888
Create the data directory:
mkdir -p /usr/local/software/zookeeper-3.4.6/zk/data
Under this data directory create a file named myid containing the value 1; likewise create myid with the values 2 and 3 on slave02 and slave03 respectively, for example as sketched below.
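A minimal sketch that creates the directory and myid on all three ZooKeeper nodes in one go (assuming passwordless ssh for the hadoop user and write permission on the target directory):
id=1
for i in slave01 slave02 slave03
do
  ssh $i "mkdir -p /usr/local/software/zookeeper-3.4.6/zk/data && echo $id > /usr/local/software/zookeeper-3.4.6/zk/data/myid"
  id=$((id+1))
done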
Start ZooKeeper:
bin/zkServer.sh start
bin/zkServer.sh status      # check the status (at least two ZooKeeper instances must be running before status reports correctly)
First-time startup order (start the ZooKeeper cluster first).
Start the JournalNodes:
hadoop-daemon.sh start journalnode      # on slave01, slave02, slave03
Format the NameNode:
hdfs namenode -format
Copy the formatted NameNode metadata to the standby:
scp -r tmp/ hadoop@master02:/usr/local/software/hadoop-2.7.0/      # copy the tmp directory from master01 to master02
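Alternatively, Hadoop ships a command that initializes the standby NameNode's metadata directly from the formatted one; treat this as an optional variant of the scp step above:
hdfs namenode -bootstrapStandby      # run on master02; the NameNode on master01 should be started first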
Format the HA state in ZooKeeper:
hdfs zkfc -formatZK
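To confirm the formatting worked, a sketch of checking that the HA znode now exists in ZooKeeper:
zkCli.sh -server slave01:2181        # opens the interactive ZooKeeper client
ls /hadoop-ha                        # inside the client; should list the nameservice, e.g. [cluster]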
Start HDFS:
sbin/start-dfs.sh
Start YARN:
sbin/start-yarn.sh      # on master01
Start the ResourceManager on the master02 host:
yarn-daemon.sh start resourcemanager      # on master02
To recap, the startup commands are:
sbin/start-dfs.sh                        # on master01
sbin/start-yarn.sh                       # on master01
yarn-daemon.sh start resourcemanager     # on master02
After startup, the NameNode on master02 is in the active state and the one on master01 is standby. If the active process on master02 is killed, master01 automatically switches to the active state, which keeps the cluster highly available.
Meanwhile the ResourceManager on master03 is in the active state.
When you open the ResourceManager web UI on master04, it shows:
"This is standby RM. Redirecting to the current active RM: http://master03:8088/"
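The ResourceManager HA state can also be checked from the command line, a sketch using the RM IDs rm1 and rm2 from yarn-site.xml:
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2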
Startup script:
for i in slave01 slave02 slave03
do
  ssh $i "/usr/local/software/zookeeper-3.4.6/bin/zkServer.sh start"
done
start-dfs.sh
ssh master03 "/usr/local/software/hadoop-2.7.0/sbin/start-yarn.sh"
ssh master04 "/usr/local/software/hadoop-2.7.0/sbin/yarn-daemon.sh start resourcemanager"
Script to check whether the cluster started correctly:
for i in master01 master02 master03 master04 slave01 slave02 slave03
do
  ssh $i "hostname;source /etc/profile;jps"
done
Stop script:
[hadoop@master01 ~]$ cat stop
ssh master03 "/usr/local/software/hadoop-2.7.0/sbin/stop-yarn.sh"
ssh master04 "/usr/local/software/hadoop-2.7.0/sbin/yarn-daemon.sh stop resourcemanager"    # stop the ResourceManager started separately on master04
stop-dfs.sh
for i in slave01 slave02 slave03
do
  ssh $i "/usr/local/software/zookeeper-3.4.6/bin/zkServer.sh stop"
done
Shell operations: example of copying files between hosts
for i in master02 slave01 slave02 slave03
do
  scp -rq software hadoop@$i:/usr/local/software/
done
for i in master02 slave01 slave02 slave03
do
  ssh $i "source /etc/profile"      # note: this only affects that one ssh session, not later logins
done
1. Keys generated by the hadoop user are stored in /home/hadoop/.ssh; the same pattern applies to other users.
2. Problem: ZooKeeper runs on a Linux server and is started and stopped from Java code. Stopping works, but starting does not.
Solution:
JAVA_HOME=/usr/local/java/jdk1.7.0_76
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME
export PATH
export CLASSPATH
3. Starting ZooKeeper with a shell script: first create a file start.sh with the following content (rh1, rh2, rh3 are hostnames; passwordless ssh to them is required):
#!/bin/sh
echo "start zkServer..."
for i in rh1 rh2 rh3
do
  ssh $i "/usr/local/zookeeper3.4/bin/zkServer.sh start"
done
Save it and make it executable: chmod u+x start.sh
Run ./start.sh. It looks as if it starts successfully (there is output), but jps shows no QuorumPeerMain process, which means ZooKeeper did not actually start.
Cause:
First, note that interactive vs. non-interactive shells and login vs. non-login shells behave differently. A login shell reads /etc/profile and then the first one it finds of ~/.bash_profile, ~/.bash_login and ~/.profile, and executes the commands there (unless --noprofile is given). A non-login shell only reads /etc/bash.bashrc and ~/.bashrc. Running the commands by hand happens in a login shell, while the script runs them in a non-login shell over ssh. Since the JDK and related environment variables were configured only in /etc/profile, running /usr/local/zookeeper3.4/bin/zkServer.sh start from the script failed.
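You can see the difference directly with a sketch like this (the exact output depends on where JAVA_HOME is set on your hosts):
ssh rh1 'echo $JAVA_HOME'                        # non-login shell: typically prints nothing
ssh rh1 'source /etc/profile; echo $JAVA_HOME'   # after sourcing /etc/profile the variable is set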
Fixes:
Either echo the profile settings into .bashrc: echo 'source /etc/profile' >> ~/.bashrc
Or add export JAVA_HOME=/usr/local/jdk1.6 near the top of zookeeper/bin/zkEnv.sh (just like configuring hadoop-env.sh in Hadoop).
After applying one of these fixes, running the same start.sh script shown above (rh1, rh2, rh3 being the hostnames) starts ZooKeeper correctly.
Summary of fixes (choose any one of the three):
1. Add "source /etc/profile;" to the ssh command in the script, i.e. change it to: ssh $i "source /etc/profile;/usr/local/zookeeper3.4/bin/zkServer.sh start"
2. Echo the profile settings into .bashrc: echo 'source /etc/profile' >> ~/.bashrc
3. Add export JAVA_HOME=/usr/local/jdk1.7.0_45 near the top of zookeeper/bin/zkEnv.sh (just like hadoop-env.sh in Hadoop).