Building a Hadoop 2.6.5 High-Availability Cluster

Software environment:

Linux: CentOS 6.7
Hadoop: 2.6.5
ZooKeeper: 3.4.8


Host configuration:

Three machines, m1, m2, and m3; the username on every host is centos.
192.168.179.201: m1
192.168.179.202: m2
192.168.179.203: m3

m1: ZooKeeper, NameNode, DataNode, ResourceManager, NodeManager, Master, Worker
m2: ZooKeeper, NameNode, DataNode, ResourceManager, NodeManager, Worker
m3: ZooKeeper, DataNode, NodeManager, Worker


Preparation

1. Configure the host IP:
sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0
2. Configure the hostname:
sudo vi /etc/sysconfig/network
3. Map hostnames to IPs:
sudo vi /etc/hosts
4. Disable the firewall
  1. Stop it for the current session:
service iptables stop
service iptables status
  2. Keep it disabled across reboots:
chkconfig iptables off
chkconfig iptables --list
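For reference, using the addresses listed above, the /etc/hosts mapping on each machine would look like this (a sketch; keep whatever loopback lines your system already has):

```
127.0.0.1       localhost
192.168.179.201 m1
192.168.179.202 m2
192.168.179.203 m3
```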


Setup steps:

I. Install and configure the ZooKeeper cluster (on all three hosts: m1, m2, m3)
1. Unpack the tarball
tar -zxvf zookeeper-3.4.8.tar.gz -C /home/centos/soft/zookeeper

2. Set environment variables
vi /etc/profile
## Zookeeper
export ZK_HOME=/home/centos/soft/zookeeper
export CLASSPATH=$CLASSPATH:$ZK_HOME/lib
export PATH=$PATH:$ZK_HOME/sbin:$ZK_HOME/bin
source /etc/profile

3. Edit the configuration
  1. Create zoo.cfg
cd /home/centos/soft/zookeeper/conf/
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
## Change the dataDir setting
dataDir=/home/centos/soft/zookeeper/tmp
## Add the following three lines
server.1=m1:2888:3888
server.2=m2:2888:3888
server.3=m3:2888:3888
  2. Create the tmp directory
mkdir /home/centos/soft/zookeeper/tmp
  3. Write the myid file
touch /home/centos/soft/zookeeper/tmp/myid
echo 1 > /home/centos/soft/zookeeper/tmp/myid            ## on host m1, myid=1
  4. Choose where ZooKeeper writes its logs: edit zkEnv.sh
vi /home/centos/soft/zookeeper/bin/zkEnv.sh
# Change this block
if [ "x${ZOO_LOG_DIR}" = "x" ]
then
    ZOO_LOG_DIR="/home/centos/soft/zookeeper/logs"            ## set this path
fi
  5. Create the logs directory
mkdir /home/centos/soft/zookeeper/logs

4. Copy the installation to the other hosts and change myid
  1. Copy to the other hosts
scp -r /home/centos/soft/zookeeper/ m2:/home/centos/soft/
scp -r /home/centos/soft/zookeeper/ m3:/home/centos/soft/
  2. Change myid on each host
echo 2 > /home/centos/soft/zookeeper/tmp/myid     ## on m2
echo 3 > /home/centos/soft/zookeeper/tmp/myid     ## on m3
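The per-host myid assignment above can also be scripted. A minimal sketch, assuming the server.N-to-hostname mapping configured in zoo.cfg (server.1=m1, server.2=m2, server.3=m3):

```shell
#!/bin/sh
# Derive this host's ZooKeeper myid from the server.N lines in zoo.cfg.
myid_for_host() {
    case "$1" in
        m1) echo 1 ;;
        m2) echo 2 ;;
        m3) echo 3 ;;
        *)  echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# On a real node you would run:
#   myid_for_host "$(hostname)" > /home/centos/soft/zookeeper/tmp/myid
myid_for_host m2    # → 2
```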




II. Install and configure the Hadoop cluster (work on m1)

1. Unpack the tarball
tar -zxvf hadoop-2.6.5.tar.gz -C /home/centos/soft/hadoop

2. Add Hadoop to the environment variables
vi /etc/profile
## Java
export JAVA_HOME=/home/centos/soft/jdk
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin

## Hadoop
export HADOOP_USER_NAME=centos
export HADOOP_HOME=/home/centos/soft/hadoop
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
export PATH=$PATH:$HADOOP_HOME/bin
source /etc/profile

3. Edit Hadoop's configuration files. (Important: do not use variables such as $HADOOP_HOME inside the configuration files; they are not expanded there and will cause obscure failures.) All of the configuration files live under ${HADOOP_HOME}/etc/hadoop; six of them need changes.
  1. Edit hadoop-env.sh
export JAVA_HOME=/home/centos/soft/jdk
  2. Edit core-site.xml
```
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/centos/soft/hadoop/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>m1:2181,m2:2181,m3:2181</value>
    </property>
    <property>
        <name>hadoop.proxyuser.centos.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.centos.groups</name>
        <value>*</value>
    </property>
</configuration>
```
  3. Edit hdfs-site.xml
```
<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>m1:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>m1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>m2:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>m2:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://m1:8485;m2:8485;m3:8485/ns1</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/centos/soft/hadoop/journal</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/centos/soft/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/centos/soft/hadoop/tmp/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/centos/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>heartbeat.recheck.interval</name>
        <value>2000</value>
    </property>
    <property>
        <name>dfs.heartbeat.interval</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.blockreport.intervalMsec</name>
        <value>3600000</value>
        <description>Determines block reporting interval in milliseconds.</description>
    </property>
</configuration>
```

4. Edit mapred-site.xml
```
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>0.0.0.0:10020</value>
        <description>MapReduce JobHistory Server IPC host:port</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>0.0.0.0:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port</description>
    </property>
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>1</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/user</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/user/history/done_intermediate</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/user/history</value>
    </property>
</configuration>
```

5. Edit yarn-site.xml
```
<configuration>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>m1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>m2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>m1:2181,m2:2181,m3:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/home/centos/soft/hadoop/logs</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
        <description>Whether to run a thread that checks each task's physical memory use and kills tasks that exceed their allocation. Default is true.</description>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
        <description>Whether to run a thread that checks each task's virtual memory use and kills tasks that exceed their allocation. Default is true.</description>
    </property>
    <property>
        <name>spark.shuffle.service.port</name>
        <value>7337</value>
    </property>
</configuration>
```

6. Edit the slaves file. It lists the worker nodes: for HDFS these are the DataNodes, for YARN the NodeManagers. Adjust to your own cluster:
```
m1
m2
m3
```




## III. Initialize Hadoop

##### 1. Set up passwordless SSH between the hosts
1. Generate a key pair on m1
```
ssh-keygen -t rsa
```
2. Copy the public key to every node, including this host
```
ssh-copy-id 127.0.0.1
ssh-copy-id localhost
ssh-copy-id m1
ssh-copy-id m2
ssh-copy-id m3
```
3. Repeat steps (1) and (2) on the other hosts

---

##### 2. Copy the configured Hadoop to the other nodes
```
scp -r /home/centos/soft/hadoop m2:/home/centos/soft/
scp -r /home/centos/soft/hadoop m3:/home/centos/soft/
```

---
#### Note: follow the steps below in this exact order

##### 3. Start the ZooKeeper cluster (on m1, m2, and m3)
1. Start the ZooKeeper service
```
cd /home/centos/soft/zookeeper/bin/
```
```
./zkServer.sh start
```
2. Check the status: there should be one leader and two followers
```
./zkServer.sh status
```

----

##### 4. Start the JournalNodes (on m1, m2, and m3; this must happen before formatting HDFS, or formatting will fail)
1. Start the JournalNode service
```
cd /home/centos/soft/hadoop
```
```
sbin/hadoop-daemon.sh start journalnode
```
2. Verify with jps that a JournalNode process now runs on m1, m2, and m3
```
jps
```

---

##### 5. Format HDFS (on m1 only)
1. On m1, run:
```
hdfs namenode -format
```
2. Formatting creates files under the hadoop.tmp.dir configured in core-site.xml, here /home/centos/soft/hadoop/tmp. Copy that tmp directory from m1 into /home/centos/soft/hadoop on m2:
```
scp -r /home/centos/soft/hadoop/tmp/ m2:/home/centos/soft/hadoop/
```

---

##### 6. Format the failover-controller state in ZooKeeper (on m1)
```
hdfs zkfc -formatZK
```

---

##### 7. Start HDFS (on m1)
```
sbin/start-dfs.sh
```

---

##### 8. Start YARN (on m1 and m2)
```
sbin/start-yarn.sh
```

---

##### With that, Hadoop 2.6.5 is configured!

---
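The ordering above matters: ZooKeeper, then JournalNodes, then formatting, then HDFS and YARN. As a sketch, the sequence could be captured in a small driver script; the DRY_RUN guard only prints each command so the order can be reviewed before running anything on real hosts (paths as configured above):

```shell
#!/bin/sh
# Sketch of the required startup order for the HA cluster.
# run() echoes in dry-run mode instead of executing, so the
# sequence can be inspected without touching a real cluster.
DRY_RUN=1
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

run ./zkServer.sh start                         # step 3: on m1, m2, m3
run sbin/hadoop-daemon.sh start journalnode     # step 4: on m1, m2, m3
run hdfs namenode -format                       # step 5: on m1 only
run hdfs zkfc -formatZK                         # step 6: on m1 only
run sbin/start-dfs.sh                           # step 7: on m1
run sbin/start-yarn.sh                          # step 8: on m1 and m2
```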

### IV. Verify the cluster

###### 0. On Windows, edit the hosts file to map the hostnames to IPs (optional)
```
C:\Windows\System32\drivers\etc\hosts
192.168.179.201 m1
192.168.179.202 m2
192.168.179.203 m3
```

----

###### 1. Check both NameNodes in a browser:
```
http://m1:50070       NameNode 'm1:9000' (active)
http://m2:50070       NameNode 'm2:9000' (standby)
```

---

###### 2. Verify HDFS HA
1. Upload a file to HDFS
```
hadoop fs -put /etc/profile /profile
```
2. Confirm it arrived
```
hadoop fs -ls /
```
3. Kill the active NameNode process (find its PID with jps)
```
kill -9 <NameNode pid>
```
4. Open http://m2:50070 in a browser
```
NameNode 'm2:9000' (active)     ## the NameNode on m2 is now active
```
5. Run:
```
hadoop fs -ls /      ## the file uploaded while m1 was active should still be there
```
6. Manually restart the killed NameNode on m1
```
sbin/hadoop-daemon.sh start namenode
```
7. Open http://m1:50070 in a browser
```
NameNode 'm1:9000' (standby)
```

---

###### 3. Verify YARN:
1. Open http://m1:8088 in a browser and check that NodeManager services are running
2. Run the WordCount demo that ships with Hadoop:
```
hadoop jar /home/centos/soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount InputParameter OutputParameter
```
If an application shows up at http://m1:8088 and runs, YARN is working.

---
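The active/standby check can also be done from the command line with `hdfs haadmin -getServiceState nn1` / `nn2`, using the NameNode IDs from hdfs-site.xml. A small helper, sketched here, takes the two reported states and confirms that exactly one NameNode is active:

```shell
#!/bin/sh
# check_ha_states STATE_NN1 STATE_NN2
# Each argument is the output of `hdfs haadmin -getServiceState <id>`,
# i.e. "active" or "standby". Succeeds only when exactly one is active.
check_ha_states() {
    case "$1/$2" in
        active/standby) echo "OK: nn1 active, nn2 standby" ;;
        standby/active) echo "OK: nn2 active, nn1 standby" ;;
        *)              echo "ERROR: unexpected states $1/$2" >&2; return 1 ;;
    esac
}

# On the cluster you would feed in live values:
#   check_ha_states "$(hdfs haadmin -getServiceState nn1)" \
#                   "$(hdfs haadmin -getServiceState nn2)"
check_ha_states active standby
```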
###### OK, all done!

