Setting Up a Hadoop 2.6.5 High-Availability Cluster

Software environment:

Linux distribution: CentOS 6.7
Hadoop version: 2.6.5
ZooKeeper version: 3.4.8

Host configuration:

There are five hosts, m1 through m5; the user name on every host is centos.
192.168.179.201: m1
192.168.179.202: m2
192.168.179.203: m3
192.168.179.204: m4
192.168.179.205: m5

m1: NameNode, ResourceManager
m2: NameNode, ResourceManager
m3: ZooKeeper, JournalNode, DataNode, NodeManager
m4: ZooKeeper, JournalNode, DataNode, NodeManager
m5: ZooKeeper, JournalNode, DataNode, NodeManager


Preparation

1. Configure the host IP address:

sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0

2. Configure the hostname:

sudo vi /etc/sysconfig/network

3. Map hostnames to IP addresses:

sudo vi /etc/hosts
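
The mapping added to /etc/hosts in step 3 could look like the output of the sketch below. It assumes that 192.168.179.201 through 192.168.179.205 map to m1 through m5, as in the Windows hosts file shown in section IV. `hosts_entries` is an illustrative helper name, not part of any tool.

```shell
#!/bin/sh
# Print one "IP<TAB>hostname" line per cluster host.
# Assumption: m1..m5 map to 192.168.179.201..205.
hosts_entries() {
    i=1
    while [ "$i" -le 5 ]; do
        printf '192.168.179.%d\tm%d\n' "$((200 + i))" "$i"
        i=$((i + 1))
    done
}

# Review the output first, then append it on every host, for example:
#   hosts_entries | sudo tee -a /etc/hosts
hosts_entries
```

Generating the lines from one place keeps the file identical on all hosts, which matters because every later step refers to the machines by name.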

4. Disable the firewall

(1) Stop it for the current session:

service iptables stop
service iptables status

(2) Keep it disabled at boot:

chkconfig iptables off
chkconfig --list iptables




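Setup steps:

I. Install and configure the ZooKeeper cluster (on the three hosts m3, m4, and m5)

1. Extract the archive

mkdir -p /home/centos/soft/zookeeper
tar -zxvf zookeeper-3.4.8.tar.gz -C /home/centos/soft/zookeeper --strip-components=1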


2. Configure environment variables

sudo vi /etc/profile
## ZooKeeper
export ZK_HOME=/home/centos/soft/zookeeper
export CLASSPATH=$CLASSPATH:$ZK_HOME/lib
export PATH=$PATH:$ZK_HOME/bin
source /etc/profile


3. Edit the configuration

(1) Configure zoo.cfg

cd /home/centos/soft/zookeeper/conf/
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
## Change the dataDir setting
dataDir=/home/centos/soft/zookeeper/tmp
## Add the following three entries
server.1=m3:2888:3888
server.2=m4:2888:3888
server.3=m5:2888:3888

(2) Create the tmp directory

mkdir /home/centos/soft/zookeeper/tmp

(3) Write the myid file

touch /home/centos/soft/zookeeper/tmp/myid
echo 1 > /home/centos/soft/zookeeper/tmp/myid     ## on host m3, myid=1
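
Each host's myid has to equal the N of its own server.N line in zoo.cfg. As a sketch, the number could be read from the file instead of being typed by hand. `myid_for_host` is a hypothetical helper, and it assumes the server lines have exactly the form server.N=host:2888:3888 and that the host name contains no regular-expression metacharacters.

```shell
#!/bin/sh
# Print the N from the "server.N=<host>:..." line in zoo.cfg that names <host>.
# Prints nothing if the host is not listed.
# Usage: myid_for_host <path to zoo.cfg> <hostname>
myid_for_host() {
    sed -n "s/^server\.\([0-9][0-9]*\)=$2:.*/\1/p" "$1"
}

# Example on a real node (paths as configured in this guide; assumes the
# output of `hostname` matches the name used in zoo.cfg):
#   myid_for_host /home/centos/soft/zookeeper/conf/zoo.cfg "$(hostname)" \
#       > /home/centos/soft/zookeeper/tmp/myid
```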


4. Configure where ZooKeeper stores its logs

(1) Edit zkEnv.sh

vi /home/centos/soft/zookeeper/bin/zkEnv.sh
## Edit the following block
if [ "x${ZOO_LOG_DIR}" = "x" ]
then
    ZOO_LOG_DIR="/home/centos/soft/zookeeper/logs"            ## change this line
fi


(2) Create the logs directory

mkdir /home/centos/soft/zookeeper/logs

5. Copy to the other hosts and update myid

(1) Copy to the other hosts

scp -r /home/centos/soft/zookeeper/ m4:/home/centos/soft/
scp -r /home/centos/soft/zookeeper/ m5:/home/centos/soft/

(2) Update myid

echo 2 > /home/centos/soft/zookeeper/tmp/myid     ## run on m4
echo 3 > /home/centos/soft/zookeeper/tmp/myid     ## run on m5





II. Install and configure the Hadoop cluster

1. Extract the archive

mkdir -p /home/centos/soft/hadoop
tar -zxvf hadoop-2.6.5.tar.gz -C /home/centos/soft/hadoop --strip-components=1


2. Add Hadoop to the environment variables

sudo vi /etc/profile
## Java
export JAVA_HOME=/home/centos/soft/jdk
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin

## Hadoop
export HADOOP_USER_NAME=centos
export HADOOP_HOME=/home/centos/soft/hadoop
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
export PATH=$PATH:$HADOOP_HOME/bin
source /etc/profile

3. Edit the configuration files (under /home/centos/soft/hadoop/etc/hadoop)

(1) Edit hadoop-env.sh
export JAVA_HOME=/home/centos/soft/jdk

(2) Edit core-site.xml
<configuration>
<property><name>fs.defaultFS</name>
   <value>hdfs://ns1</value>
</property>
<property><name>hadoop.tmp.dir</name>
   <value>/home/centos/soft/hadoop/tmp</value>
</property>
<property><name>ha.zookeeper.quorum</name>
   <value>m3:2181,m4:2181,m5:2181</value>
</property>

<property>
      <name>hadoop.proxyuser.centos.hosts</name>
      <value>*</value>
</property>
<property><name>hadoop.proxyuser.centos.groups</name>
      <value>*</value>
</property>
</configuration>

(3) Edit hdfs-site.xml
<configuration>
<property><name>dfs.nameservices</name>
   <value>ns1</value>
</property>
<property><name>dfs.ha.namenodes.ns1</name>
   <value>nn1,nn2</value>
</property>
<property><name>dfs.namenode.rpc-address.ns1.nn1</name>
   <value>m1:9000</value>
</property>
<property><name>dfs.namenode.http-address.ns1.nn1</name>
   <value>m1:50070</value>
</property>
<property><name>dfs.namenode.rpc-address.ns1.nn2</name>
   <value>m2:9000</value>
</property>
<property><name>dfs.namenode.http-address.ns1.nn2</name>
   <value>m2:50070</value>
</property>
<property><name>dfs.namenode.shared.edits.dir</name>
   <value>qjournal://m3:8485;m4:8485;m5:8485/ns1</value>
</property>
<property><name>dfs.journalnode.edits.dir</name>
   <value>/home/centos/soft/hadoop/journal</value>
</property>
<property><name>dfs.namenode.name.dir</name>
   <value>/home/centos/soft/hadoop/tmp/dfs/name</value>
</property>
<property><name>dfs.datanode.data.dir</name>
   <value>/home/centos/soft/hadoop/tmp/dfs/data</value>
</property>
<property><name>dfs.replication</name>
   <value>1</value>
</property>
<property><name>dfs.ha.automatic-failover.enabled</name>
   <value>true</value>
</property>
<property><name>dfs.webhdfs.enabled</name>
   <value>true</value>
</property>
<property><name>dfs.client.failover.proxy.provider.ns1</name>
   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property><name>dfs.ha.fencing.methods</name>
   <value>
         sshfence
         shell(/bin/true)
   </value>
</property>
<property><name>dfs.ha.fencing.ssh.private-key-files</name>
   <value>/home/centos/.ssh/id_rsa</value>
</property>
<property><name>dfs.ha.fencing.ssh.connect-timeout</name>
   <value>30000</value>
</property>
<property><name>dfs.permissions</name>
   <value>false</value>
</property>
<property><name>heartbeat.recheck.interval</name>
   <value>2000</value>
</property>
<property><name>dfs.heartbeat.interval</name>
      <value>1</value>
</property>
<property>
      <name>dfs.blockreport.intervalMsec</name>
      <value>3600000</value>
      <description>Determines block reporting interval in milliseconds.</description>
</property>
</configuration>
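
The nameservice ID has to agree across files: fs.defaultFS in core-site.xml must be hdfs:// followed by the value of dfs.nameservices in hdfs-site.xml. The sketch below checks that with plain awk. `xml_prop` and `check_nameservice` are hypothetical helper names, and the parser assumes each value sits on a single line as `<value>...</value>`, so it does not handle multi-line values such as dfs.ha.fencing.methods. On a node with Hadoop installed, `hdfs getconf -confKey <key>` reports the effective value of a key instead.

```shell
#!/bin/sh
# Print the single-line <value> that follows <name>KEY</name> in a Hadoop XML file.
# Usage: xml_prop <file> <key>
xml_prop() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1 }
        found && match($0, /<value>[^<]*<\/value>/) {
            # Strip the 7-character <value> and the 8-character </value>.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}

# Succeed only if fs.defaultFS points at the nameservice declared in hdfs-site.xml.
# Usage: check_nameservice <core-site.xml> <hdfs-site.xml>
check_nameservice() {
    fs=$(xml_prop "$1" fs.defaultFS)
    ns=$(xml_prop "$2" dfs.nameservices)
    [ -n "$ns" ] && [ "$fs" = "hdfs://$ns" ]
}

# Example (paths as used in this guide):
#   cd /home/centos/soft/hadoop/etc/hadoop
#   check_nameservice core-site.xml hdfs-site.xml && echo "nameservice consistent"
```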

(4) Edit mapred-site.xml
<configuration>
<property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
</property>
<property>
      <name>mapreduce.jobhistory.address</name>
      <value>0.0.0.0:10020</value>
      <description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>0.0.0.0:19888</value>
      <description>MapReduce JobHistory Server Web UI host:port</description>
</property>
<property>
      <name>mapreduce.task.io.sort.mb</name>
      <value>1</value>
</property>
<property>
      <name>yarn.app.mapreduce.am.staging-dir</name>
      <value>/user</value>
</property>
<property>
      <name>mapreduce.jobhistory.intermediate-done-dir</name>
      <value>/user/history/done_intermediate</value>
</property>
<property>
      <name>mapreduce.jobhistory.done-dir</name>
      <value>/user/history</value>
</property>
</configuration>

(5) Edit yarn-site.xml
<configuration>
<property>
      <name>yarn.resourcemanager.ha.enabled</name>
      <value>true</value>
</property>
<property>
      <name>yarn.resourcemanager.cluster-id</name>
      <value>yrc</value>
</property>
<property>
      <name>yarn.resourcemanager.ha.rm-ids</name>
      <value>rm1,rm2</value>
</property>
<property>
      <name>yarn.resourcemanager.hostname.rm1</name>
      <value>m1</value>
</property>
<property>
      <name>yarn.resourcemanager.hostname.rm2</name>
      <value>m2</value>
</property>
<property>
      <name>yarn.resourcemanager.zk-address</name>
      <value>m3:2181,m4:2181,m5:2181</value>
</property>
<property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>2048</value>
</property>
<property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>4096</value>
</property>
<property>
      <name>yarn.nodemanager.log-dirs</name>
      <value>/home/centos/soft/hadoop/logs</value>
</property>
<property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
      <name>yarn.nodemanager.pmem-check-enabled</name>
      <value>false</value>
      <description>Whether to start a thread that checks how much physical memory each task is using and kills any task that exceeds its allocation. The default is true.</description>
</property>
<property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
      <description>Whether to start a thread that checks how much virtual memory each task is using and kills any task that exceeds its allocation. The default is true.</description>
</property>
<property>
      <name>spark.shuffle.service.port</name>
      <value>7337</value>
</property>
</configuration>

(6) Edit the slaves file

The slaves file lists the worker nodes: the hosts that run a DataNode for HDFS and a NodeManager for YARN. Adjust it to match your own cluster.

m3
m4
m5





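III. Initialize Hadoop

1. Set up passwordless SSH login between the hosts

(1) Generate a key pair on m1

ssh-keygen -t rsa

(2) Copy the public key to every node, including this host

ssh-copy-id 127.0.0.1
ssh-copy-id localhost
ssh-copy-id m1
ssh-copy-id m2
ssh-copy-id m3
ssh-copy-id m4
ssh-copy-id m5

(3) Repeat steps (1) and (2) on the other hosts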
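
Because the same copy has to be repeated for every target on every host, a small loop may be less error-prone than typing each command. This is only a sketch: it assumes the cluster hosts are m1 through m5, and the `RUN` variable and `copy_keys` function are names invented here. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Print (dry run) or execute ssh-copy-id for every target host.
# Assumption: the cluster hosts are m1..m5 and a key pair already exists.
# RUN defaults to "echo" (dry run); export RUN="" to execute for real.
RUN=${RUN-echo}

copy_keys() {
    for host in 127.0.0.1 localhost m1 m2 m3 m4 m5; do
        $RUN ssh-copy-id "$host"
    done
}

copy_keys
```

Each real invocation still prompts once for the centos password of the target host.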



2. Copy the configured Hadoop directory to the other nodes

scp -r /home/centos/soft/hadoop m2:/home/centos/soft/
scp -r /home/centos/soft/hadoop m3:/home/centos/soft/
scp -r /home/centos/soft/hadoop m4:/home/centos/soft/
scp -r /home/centos/soft/hadoop m5:/home/centos/soft/


Note: follow the steps below in exactly this order.

3. Start the ZooKeeper cluster (run on each of m3, m4, and m5)

(1) Start the ZooKeeper service
cd /home/centos/soft/zookeeper/bin/
./zkServer.sh start
(2) Check the status: there should be one leader and two followers
./zkServer.sh status
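
The one-leader, two-followers check can also be scripted. The sketch below relies on the "Mode: leader" and "Mode: follower" lines that `zkServer.sh status` prints; `quorum_ok` is a hypothetical helper name, and the usage example assumes passwordless SSH to the three ZooKeeper hosts is already in place.

```shell
#!/bin/sh
# Read the combined `zkServer.sh status` output of all nodes on stdin and
# succeed only if it reports exactly one leader and two followers.
quorum_ok() {
    awk '
        /^Mode: leader/   { leaders++ }
        /^Mode: follower/ { followers++ }
        END { exit !(leaders == 1 && followers == 2) }
    '
}

# Example (sources /etc/profile so that java is on the PATH over SSH):
#   for h in m3 m4 m5; do
#       ssh "$h" '. /etc/profile; /home/centos/soft/zookeeper/bin/zkServer.sh status' 2>/dev/null
#   done | quorum_ok && echo "quorum healthy"
```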

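4. Start the JournalNodes (run on each of m3, m4, and m5; this must happen before HDFS is formatted, otherwise the format fails)

(1) Start the JournalNode service

cd /home/centos/soft/hadoop
sbin/hadoop-daemon.sh start journalnode

(2) Run jps to verify: m3, m4, and m5 should each show an additional JournalNode process

jps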
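
Reading the jps listing by eye works, but the check can also be expressed as a command that succeeds or fails. The following is a minimal sketch; `has_daemon` is a name chosen here for illustration, and it relies only on jps printing one "pid class-name" pair per line.

```shell
#!/bin/sh
# Succeed if the jps listing on stdin contains a process with the given name.
# Usage: jps | has_daemon <name>
has_daemon() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Example on m3, m4, or m5:
#   jps | has_daemon JournalNode && echo "JournalNode is running"
```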


5. Format HDFS (on m1 only)

(1) Run on m1:

hdfs namenode -format

(2) Formatting creates files under the directory set by hadoop.tmp.dir in core-site.xml, which here is /home/centos/soft/hadoop/tmp. Copy that tmp directory from /home/centos/soft/hadoop on m1 into /home/centos/soft/hadoop on m2:

scp -r /home/centos/soft/hadoop/tmp/ m2:/home/centos/soft/hadoop/

6. Format ZK (on m1)

hdfs zkfc -formatZK

7. Start HDFS (on m1)

sbin/start-dfs.sh

8. Start YARN (on m1 and m2)

sbin/start-yarn.sh


Hadoop 2.6.5 is now fully configured!








IV. Verify that the Hadoop cluster works

0. On Windows, edit the hosts file to map hostnames to IP addresses (this step is optional)

C:\Windows\System32\drivers\etc\hosts

192.168.179.201     m1
192.168.179.202     m2
192.168.179.203     m3
192.168.179.204     m4
192.168.179.205     m5

1. Open the following pages in a browser:

http://m1:50070
    NameNode 'm1:9000' (active)
http://m2:50070
    NameNode 'm2:9000' (standby)

2. Verify HDFS HA

(1) First upload a file to HDFS

hadoop fs -put /etc/profile /profile

(2) Check that it was uploaded to HDFS

hadoop fs -ls /

(3) Then kill the active NameNode

kill -9 <pid of NN>
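
The pid of the NameNode can be taken from jps. The helper below is a sketch; `namenode_pid` is a name made up for this example. It matches the process name exactly, so other process names that merely contain "NameNode" are not picked up.

```shell
#!/bin/sh
# Print the pid of the NameNode from a jps listing read on stdin.
namenode_pid() {
    awk '$2 == "NameNode" { print $1 }'
}

# Example on the host whose NameNode is currently active:
#   kill -9 "$(jps | namenode_pid)"
```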

(4) Open http://m2:50070 in a browser

NameNode 'm2:9000' (active)             ## the NameNode on m2 has become active

(5) Run:

hadoop fs -ls /                          ## check that the file uploaded earlier is still there

(6) Manually restart the killed NameNode on m1

sbin/hadoop-daemon.sh start namenode

(7) Open http://m1:50070 in a browser

NameNode 'm1:9000' (standby)
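
The same active/standby states can be read from the command line with `hdfs haadmin -getServiceState`, using the NameNode IDs nn1 and nn2 from dfs.ha.namenodes.ns1. The sketch below wraps the comparison in a hypothetical helper, `ha_pair_ok`, so that the check can be repeated after each failover.

```shell
#!/bin/sh
# Succeed if exactly one of the two reported NameNode states is "active"
# and the other is "standby".
# Usage: ha_pair_ok <state of nn1> <state of nn2>
ha_pair_ok() {
    case "$1,$2" in
        active,standby|standby,active) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on a node with the Hadoop client configured:
#   ha_pair_ok "$(hdfs haadmin -getServiceState nn1)" \
#              "$(hdfs haadmin -getServiceState nn2)" && echo "HA pair healthy"
```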

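3. Verify YARN:

(1) Open http://m1:8088 in a browser and check that the NodeManagers are running
(2) Run the WordCount program from the examples shipped with Hadoop by executing the following command on Linux, where InputParameter and OutputParameter stand for the HDFS input and output paths
hadoop jar /home/centos/soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount InputParameter OutputParameter
If an application shows up as running at http://m1:8088, YARN is working.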
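
A complete WordCount run, with concrete values in place of InputParameter and OutputParameter, might look like the sketch below. The HDFS paths /wc/in and /wc/out are arbitrary choices for this example (the output path must not exist yet), and `RUN` and `wordcount_demo` are names invented here. By default the script only prints the commands.

```shell
#!/bin/sh
# Print (dry run) or execute a full WordCount round trip.
# Assumptions: /wc/in and /wc/out are unused HDFS paths picked for this example.
# RUN defaults to "echo" (dry run); export RUN="" to execute for real.
RUN=${RUN-echo}
JAR=/home/centos/soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar

wordcount_demo() {
    $RUN hadoop fs -mkdir -p /wc/in
    $RUN hadoop fs -put /etc/profile /wc/in
    $RUN hadoop jar "$JAR" wordcount /wc/in /wc/out
    $RUN hadoop fs -cat /wc/out/part-r-00000
}

wordcount_demo
```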


OK, all done!



