Apache Hadoop Cluster Installation (NameNode HA + Spark + Rack Awareness)


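1. Host Planning

No.  Hostname  IP address     Roles
1    nn-1      192.168.9.21   NameNode, mr-jobhistory, ZooKeeper, JournalNode
2    nn-2      192.168.9.22   NameNode (standby), JournalNode
3    dn-1      192.168.9.23   DataNode, JournalNode, ZooKeeper, ResourceManager, NodeManager
4    dn-2      192.168.9.24   DataNode, ZooKeeper, NodeManager
5    dn-3      192.168.9.25   DataNode, NodeManager

Cluster notes:
(1) For clusters of 7 hosts or fewer, NameNode HA is optional.
(2) An HA cluster needs at least 3 JournalNodes; 5 are recommended. JournalNode is a lightweight service. For locality, two of the JournalNodes share hosts with the two NameNodes, and the remaining JournalNodes are spread across other nodes.
(3) An HA cluster needs at least 3 ZooKeeper nodes; 5 or 7 are recommended. ZooKeeper can share hosts with DataNodes.
(4) In an HA cluster, the ResourceManager should preferably run on its own node. For larger clusters with spare hosts, consider setting up ResourceManager HA as well.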
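
The node counts in the notes above follow from majority voting: both the JournalNode set and the ZooKeeper ensemble stay available only while more than half of their members are reachable. The sketch below just works through that arithmetic; the function names are made up for illustration.

```shell
#!/bin/sh
# Majority-quorum arithmetic for an ensemble of n voting members
# (JournalNodes or ZooKeeper servers). Function names are illustrative.

quorum_size() {
  # Smallest majority of n members: floor(n / 2) + 1
  echo $(( $1 / 2 + 1 ))
}

tolerated_failures() {
  # Members that may fail while a majority remains: floor((n - 1) / 2)
  echo $(( ($1 - 1) / 2 ))
}

for n in 3 4 5 7; do
  echo "members=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

An even member count adds no fault tolerance over the next lower odd count (4 members tolerate 1 failure, the same as 3), which is why odd sizes such as 3, 5, or 7 are the usual choice.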


2. Host Environment Setup

2.1 Configure the JDK

Uninstall OpenJDK:

    # Check the Java version
    [root@dtgr ~]# java -version
    java version "1.7.0_45"
    OpenJDK Runtime Environment (rhel-2.4.3.3.el6-x86_64 u45-b15)
    OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
    # List the installed Java packages
    [root@dtgr ~]# rpm -qa | grep java
    java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
    # Uninstall
    [root@dtgr ~]# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
    # Verify that the package was removed
    [root@dtgr ~]# rpm -qa | grep java
    [root@dtgr ~]# java -version
    -bash: /usr/bin/java: No such file or directory

Install the JDK:

    # Download the JDK archive and unpack it
    [root@dtgr java]# mkdir /usr/local/java
    [root@dtgr java]# mv jdk-7u79-linux-x64.tar.gz /usr/local/java
    [root@dtgr java]# cd /usr/local/java
    [root@dtgr java]# tar xvf jdk-7u79-linux-x64.tar.gz
    [root@dtgr java]# ls
    jdk1.7.0_79 jdk-7u79-linux-x64.tar.gz
    [root@dtgr java]#
    # Add the environment variables
    [root@dtgr java]# vim /etc/profile
    [root@dtgr java]# tail /etc/profile
    export JAVA_HOME=/usr/local/java/jdk1.7.0_79
    export JRE_HOME=/usr/local/java/jdk1.7.0_79/jre
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
    export PATH=$JAVA_HOME/bin:$PATH
    # Apply the environment variables
    [root@dtgr ~]# source /etc/profile
    # Verify
    [root@dtgr ~]# java -version
    java version "1.7.0_79"
    Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
    [root@dtgr ~]# javac -version
    javac 1.7.0_79

2.2 Set the hostname and configure name resolution

On every node, set the hostname according to the plan and add all hostnames to /etc/hosts.

Set the hostname:

    [root@dn-3 ~]# cat /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=dn-3
    [root@dn-3 ~]# hostname dn-3

Configure /etc/hosts and distribute it to every node:

    [root@dn-3 ~]# cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.9.21 nn-1
    192.168.9.22 nn-2
    192.168.9.23 dn-1
    192.168.9.24 dn-2
    192.168.9.25 dn-3
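
Since the same five entries must be present on every node, a quick read-only check can help spot a node whose /etc/hosts is incomplete. The following is a minimal sketch, assuming the host plan from section 1; the function names are invented for this example and nothing is written to the file.

```shell
#!/bin/sh
# Host plan from section 1: one "IP hostname" pair per line.
cluster_hosts() {
  cat <<'EOF'
192.168.9.21 nn-1
192.168.9.22 nn-2
192.168.9.23 dn-1
192.168.9.24 dn-2
192.168.9.25 dn-3
EOF
}

# Succeeds if the hosts file ($1) names the host ($2) on a non-comment line.
has_host() {
  awk -v name="$2" '
    $1 ~ /^#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit found ? 0 : 1 }
  ' "$1" 2>/dev/null
}

# Print the plan entries that are still missing from the hosts file ($1).
missing_host_entries() {
  cluster_hosts | while read -r ip name; do
    has_host "$1" "$name" || echo "$ip $name"
  done
}

# Read-only check of the local file; prints nothing when all entries exist.
missing_host_entries /etc/hosts
```

Any lines it prints can be reviewed and then appended to /etc/hosts by hand as root.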

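2.3 Create the hadoop account

The user and group are both hadoop, the password is hadoop, and the home directory is /hadoop. The command below creates the user and its group; set the password afterwards with passwd hadoop.

    [root@dn-3 ~]# useradd -d /hadoop hadoop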

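2.4 Configure NTP time synchronization

Use nn-1 as the time source. Edit /etc/ntp.conf, comment out the default pool servers, and add nn-1 as the server:

    # vi /etc/ntp.conf
    #server 0.centos.pool.ntp.org
    #server 1.centos.pool.ntp.org
    #server 2.centos.pool.ntp.org
    server nn-1

Enable the ntp service at boot:

    # chkconfig ntpd on

Start the ntp service:

    # service ntpd start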

2.5 Disable the iptables firewall and SELinux

(1) Disable iptables

    [root@dn-3 ~]# service iptables stop
    [root@dn-3 ~]# chkconfig iptables off
    [root@dn-3 ~]# chkconfig --list | grep iptables
    iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    [root@dn-3 ~]#

(2) Disable SELinux

    [root@dn-3 ~]# setenforce 0
    setenforce: SELinux is disabled
    [root@dn-3 ~]# vim /etc/sysconfig/selinux
    SELINUX=disabled

2.6 Set up passwordless SSH login

(1) Generate a key pair on every node
On every node, switch to the hadoop user and generate a key pair, pressing Enter at each prompt:

    [hadoop@nn-1 ~]$ ssh-keygen -t rsa

(2) On nn-1, collect the public keys into authorized_keys
Command: $ ssh <hostname> 'cat ./.ssh/id_rsa.pub' >> ~/.ssh/authorized_keys
Replace <hostname> with the actual hostname and run the command on nn-1 once for every host, including nn-1 itself. For example:

    [hadoop@nn-1 ~]$ ssh nn-1 'cat ./.ssh/id_rsa.pub' >> ~/.ssh/authorized_keys
    hadoop@nn-1's password:
    [hadoop@nn-1 ~]$

(3) Set the permissions

    [hadoop@nn-1 .ssh]$ chmod 644 authorized_keys

(4) Distribute authorized_keys to $HOME/.ssh/ on every node. For example:

    [hadoop@nn-1 .ssh]$ scp authorized_keys hadoop@nn-2:/hadoop/.ssh/
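
The example above covers nn-2 only; the same copy is needed for each remaining node. A small dry-run loop, sketched here under the assumption that the node list matches the host plan, prints the commands rather than running them so they can be reviewed first. NODES and print_key_distribution are names chosen for this sketch.

```shell
#!/bin/sh
# Every node except nn-1, taken from the host plan (assumed list).
NODES="nn-2 dn-1 dn-2 dn-3"

# Print (do not run) one scp command per target node.
print_key_distribution() {
  for node in $NODES; do
    echo "scp /hadoop/.ssh/authorized_keys hadoop@${node}:/hadoop/.ssh/"
  done
}

print_key_distribution
```

Once the printed commands look right, they can be executed on nn-1 as the hadoop user, for instance by piping the output to sh.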

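3. Install and Configure Hadoop


Note: edit the configuration on nn-1 first, then distribute it to the other nodes in one batch.

3.1 Upload the Hadoop and ZooKeeper packages

Copy the packages to the /hadoop directory.
Unpack the Hadoop package: [hadoop@nn-1 ~]$ tar -xzvf hadoop2-js-0121.tar.gz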

3.2 Edit hadoop-env.sh

    export JAVA_HOME=/usr/local/java/jdk1.7.0_79
    export HADOOP_HEAPSIZE=2000
    export HADOOP_NAMENODE_INIT_HEAPSIZE=10000
    export HADOOP_OPTS="-server $HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
    export HADOOP_NAMENODE_OPTS="-Xmx15000m -Xms15000m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"

For parameter descriptions, see: http://blog.csdn.net/fenglibing/article/details/31051225


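3.3 Edit core-site.xml

    <configuration>
      <property><name>fs.defaultFS</name><value>hdfs://dpi</value></property>
      <property><name>io.file.buffer.size</name><value>131072</value></property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/hadoop/hdfs/temp</value>
        <description>A base for other temporary directories.</description>
      </property>
      <property><name>hadoop.proxyuser.hduser.hosts</name><value>*</value></property>
      <property><name>hadoop.proxyuser.hduser.groups</name><value>*</value></property>
      <property><name>ha.zookeeper.quorum</name><value>nn-1:2181,dn-1:2181,dn-2:2181</value></property>
    </configuration>

Note: ha.zookeeper.quorum must list the hosts that actually run ZooKeeper, which per the host plan are nn-1, dn-1, and dn-2.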
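
The ha.zookeeper.quorum value is a comma-separated list of host:port pairs, one per ZooKeeper server. As a small illustration, assuming the ZooKeeper hosts from the host plan (nn-1, dn-1, dn-2) and client port 2181, the string can be built from a host list instead of typed by hand. The function name zk_quorum is hypothetical.

```shell
#!/bin/sh
# Join hosts into "host:port,host:port,..." form.
# Usage: zk_quorum PORT HOST [HOST...]
zk_quorum() {
  port=$1
  shift
  out=""
  for host in "$@"; do
    # Prepend a comma only when out already holds an entry
    out="${out:+$out,}${host}:${port}"
  done
  echo "$out"
}

zk_quorum 2181 nn-1 dn-1 dn-2
```

Generating the value from the same host list used for zoo.cfg keeps the two files from drifting apart.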

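3.4 Edit hdfs-site.xml

    <configuration>
      <property><name>dfs.namenode.secondary.http-address</name><value>nn-1:9001</value></property>
      <property><name>dfs.namenode.name.dir</name><value>file:/hadoop/hdfs/name</value></property>
      <property><name>dfs.datanode.data.dir</name><value>file:/hadoop/hdfs/data,file:/hadoopdata/hdfs/data</value></property>
      <property><name>dfs.replication</name><value>3</value></property>
      <property><name>dfs.webhdfs.enabled</name><value>true</value></property>
      <property><name>dfs.nameservices</name><value>dpi</value></property>
      <property><name>dfs.ha.namenodes.dpi</name><value>nn-1,nn-2</value></property>
      <property><name>dfs.namenode.rpc-address.dpi.nn-1</name><value>nn-1:9000</value></property>
      <property><name>dfs.namenode.http-address.dpi.nn-1</name><value>nn-1:50070</value></property>
      <property><name>dfs.namenode.rpc-address.dpi.nn-2</name><value>nn-2:9000</value></property>
      <property><name>dfs.namenode.http-address.dpi.nn-2</name><value>nn-2:50070</value></property>
      <property><name>dfs.namenode.servicerpc-address.dpi.nn-1</name><value>nn-1:53310</value></property>
      <property><name>dfs.namenode.servicerpc-address.dpi.nn-2</name><value>nn-2:53310</value></property>
      <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
      <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://nn-1:8485;nn-2:8485;dn-1:8485/dpi</value></property>
      <property><name>dfs.client.failover.proxy.provider.dpi</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
      <property><name>dfs.journalnode.edits.dir</name><value>/hadoop/hdfs/journal</value></property>
      <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
      <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/hadoop/.ssh/id_rsa</value></property>
    </configuration>

For parameter descriptions, see: http://www.aboutyun.com/thread-10572-1-1.html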

Create the directories referenced in the configuration files:

    mkdir -p /hadoop/hdfs/name
    mkdir -p /hadoop/hdfs/data
    mkdir -p /hadoop/hdfs/temp
    mkdir -p /hadoop/hdfs/journal
    # Set permissions
    chmod 755 /hadoop/hdfs
    mkdir -p /hadoopdata/hdfs/data
    chmod 755 /hadoopdata/hdfs

Change the owner and group of these directories to hadoop:hadoop.


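3.5 Edit mapred-site.xml

    <configuration>
      <property><name>mapreduce.framework.name</name><value>yarn</value></property>
      <property><name>mapreduce.jobhistory.address</name><value>nn-1:10020</value></property>
      <property><name>mapreduce.jobhistory.webapp.address</name><value>nn-1:19888</value></property>
    </configuration>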


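3.6 Edit yarn-site.xml

    <configuration>
      <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
      <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
      <property><name>yarn.resourcemanager.address</name><value>dn-1:8032</value></property>
      <property><name>yarn.resourcemanager.scheduler.address</name><value>dn-1:8030</value></property>
      <property><name>yarn.resourcemanager.resource-tracker.address</name><value>dn-1:8031</value></property>
      <property><name>yarn.resourcemanager.admin.address</name><value>dn-1:8033</value></property>
      <property><name>yarn.resourcemanager.webapp.address</name><value>dn-1:8088</value></property>
    </configuration>

Note: the class key must use the same service name as yarn.nodemanager.aux-services, that is, mapreduce_shuffle with an underscore.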

3.7 Edit slaves

Add every DataNode to the slaves file:

    dn-1
    dn-2
    dn-3


3.8 Edit yarn-env.sh

    # some Java parameters
    # export JAVA_HOME=/home/y/libexec/jdk1.6.0/
    if [ "$JAVA_HOME" != "" ]; then
      #echo "run java in $JAVA_HOME"
      JAVA_HOME=/usr/local/java/jdk1.7.0_79
    fi
    JAVA_HEAP_MAX=-Xmx15000m
    YARN_HEAPSIZE=15000
    export YARN_RESOURCEMANAGER_HEAPSIZE=5000
    export YARN_TIMELINESERVER_HEAPSIZE=10000
    export YARN_NODEMANAGER_HEAPSIZE=10000

3.9 Distribute the configured hadoop directory to all nodes

    [hadoop@nn-1 ~]$ scp -rp hadoop hadoop@nn-2:/hadoop
    [hadoop@nn-1 ~]$ scp -rp hadoop hadoop@dn-1:/hadoop
    [hadoop@nn-1 ~]$ scp -rp hadoop hadoop@dn-2:/hadoop
    [hadoop@nn-1 ~]$ scp -rp hadoop hadoop@dn-3:/hadoop

4. Install and Configure ZooKeeper

Change to the /hadoop directory. According to the plan, the three ZooKeeper nodes are nn-1, dn-1, and dn-2.
Configure ZooKeeper on nn-1 first, then distribute it to the other ZooKeeper nodes.

4.1 Upload and unpack ZooKeeper on nn-1

4.2 Edit the configuration file /hadoop/zookeeper/conf/zoo.cfg

    dataDir=/hadoop/zookeeper/data/
    dataLogDir=/hadoop/zookeeper/log/
    # the port at which the clients will connect
    clientPort=2181
    server.1=nn-1:2887:3887
    server.2=dn-1:2888:3888
    server.3=dn-2:2889:3889

4.3 Distribute the configured zookeeper directory from nn-1 to the other nodes

    [hadoop@nn-1 ~]$ scp -rp zookeeper hadoop@dn-1:/hadoop
    [hadoop@nn-1 ~]$ scp -rp zookeeper hadoop@dn-2:/hadoop

4.4 Create the directories on every ZooKeeper node

    [hadoop@dn-1 ~]$ mkdir /hadoop/zookeeper/data/
    [hadoop@dn-1 ~]$ mkdir /hadoop/zookeeper/log/

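4.5 Create myid

On every ZooKeeper node, create a myid file in /hadoop/zookeeper/data.
Note: the content of myid is the number that follows "server." for that host in zoo.cfg (1 for nn-1, 2 for dn-1, 3 for dn-2).
On nn-1, myid is created like this:

    [hadoop@nn-1 data]$ echo 1 > /hadoop/zookeeper/data/myid

Create myid on the other ZooKeeper nodes in the same way, using each node's own number.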
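
Because myid has to agree with the server.N lines in zoo.cfg, the number can be derived from the config file rather than typed per host. The sketch below assumes zoo.cfg uses the server.N=host:port:port form shown in section 4.2; myid_for is a helper name invented here, and the demonstration runs against a throwaway copy of those lines.

```shell
#!/bin/sh
# Print the myid for host $2 according to the server.N lines in zoo.cfg ($1).
myid_for() {
  awk -F= -v host="$2" '
    /^server\.[0-9]+=/ {
      id = $1
      sub(/^server\./, "", id)   # keep only N from server.N
      split($2, parts, ":")      # parts[1] is the host name
      if (parts[1] == host) print id
    }
  ' "$1"
}

# Demonstration against a temporary copy of the server lines from section 4.2.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
clientPort=2181
server.1=nn-1:2887:3887
server.2=dn-1:2888:3888
server.3=dn-2:2889:3889
EOF
for h in nn-1 dn-1 dn-2; do
  echo "$h -> myid $(myid_for "$cfg" "$h")"
done
rm -f "$cfg"
```

On a real node, the result of myid_for /hadoop/zookeeper/conf/zoo.cfg "$(hostname)" could be redirected into /hadoop/zookeeper/data/myid, provided the hostname matches the name used in zoo.cfg.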


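4.6 Set the environment variables

Use single quotes so that $ZOOKEEPER_HOME and $PATH are written to the profile literally and expanded only when the profile is sourced. With double quotes, $ZOOKEEPER_HOME would be expanded while still unset.

    $ echo 'export ZOOKEEPER_HOME=/hadoop/zookeeper' >> $HOME/.bash_profile
    $ echo 'export PATH=$ZOOKEEPER_HOME/bin:$PATH' >> $HOME/.bash_profile
    $ source $HOME/.bash_profile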


5. Start the Cluster

5.1 Start ZooKeeper

According to the plan, the ZooKeeper nodes are nn-1, dn-1, and dn-2. Start ZooKeeper on each of the three nodes.

Start command:

    [hadoop@nn-1 ~]$ /hadoop/zookeeper/bin/zkServer.sh start
    JMX enabled by default
    Using config: /hadoop/zookeeper/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED

Check the processes; QuorumPeerMain should be listed:

    [hadoop@nn-1 ~]$ jps
    9382 QuorumPeerMain
    9407 Jps

Check the status. "Mode: follower" means this node is a ZooKeeper follower:

    [hadoop@nn-1 ~]$ /hadoop/zookeeper/bin/zkServer.sh status
    JMX enabled by default
    Using config: /hadoop/zookeeper/bin/../conf/zoo.cfg
    Mode: follower

"Mode: leader" means this node is the ZooKeeper leader:

    [hadoop@dn-1 data]$ /hadoop/zookeeper/bin/zkServer.sh status
    JMX enabled by default
    Using config: /hadoop/zookeeper/bin/../conf/zoo.cfg
    Mode: leader

5.2 Initialize the HA state in ZooKeeper (one time only; run on nn-1)

    [hadoop@nn-1 ~]$ /hadoop/hadoop/bin/hdfs zkfc -formatZK

The command may ask for confirmation partway through; enter Y.

Open the ZooKeeper client and check that the znode was created (with the default parent znode, ls /hadoop-ha should list dpi):

    [hadoop@nn-1 bin]$ ./zkCli.sh
 

5.3 Start zkfc (run on nn-1 and nn-2)

    [hadoop@nn-1 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start zkfc
    starting zkfc, logging to /hadoop/hadoop/logs/hadoop-hadoop-zkfc-nn-1.out

jps should now show the DFSZKFailoverController process:

    [hadoop@nn-1 ~]$ jps
    9681 Jps
    9638 DFSZKFailoverController
    9382 QuorumPeerMain

 

5.4 Start the JournalNodes

According to the plan, the JournalNodes are nn-1, nn-2, and dn-1. Start the service on each of the three nodes with the following command:

    [hadoop@nn-1 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode
    starting journalnode, logging to /hadoop/hadoop/logs/hadoop-hadoop-journalnode-nn-1.out

jps should now show the JournalNode process:

    [hadoop@nn-1 ~]$ jps
    9714 JournalNode
    9638 DFSZKFailoverController
    9382 QuorumPeerMain
    9762 Jps

5.5 Format the NameNode (run on nn-1)

    [hadoop@nn-1 ~]$ /hadoop/hadoop/bin/hdfs namenode -format

Check the log output to confirm that the format succeeded.
 

5.6 Start the NameNode (run on nn-1)

    [hadoop@nn-1 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
    starting namenode, logging to /hadoop/hadoop/logs/hadoop-hadoop-namenode-nn-1.out

jps should now show the NameNode process:

    [hadoop@nn-1 ~]$ jps
    9714 JournalNode
    9638 DFSZKFailoverController
    9382 QuorumPeerMain
    10157 NameNode
    10269 Jps

5.7 Bootstrap the standby NameNode (run on nn-2)

This copies the formatted namespace from nn-1 to nn-2:

    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs namenode -bootstrapStandby

Check the log output to confirm that the bootstrap succeeded.
 

5.8 Start the NameNode (run on nn-2)

    [hadoop@nn-2 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
    starting namenode, logging to /hadoop/hadoop/logs/hadoop-hadoop-namenode-nn-2.out

jps should now show the NameNode process:

    [hadoop@nn-2 ~]$ jps
    53990 NameNode
    54083 Jps
    53824 JournalNode
    53708 DFSZKFailoverController

5.9 Start the DataNodes (run on dn-1 through dn-3)

    [hadoop@dn-1 ~]$ /hadoop/hadoop/sbin/hadoop-daemon.sh start datanode

jps should now show the DataNode process:

    [hadoop@dn-1 temp]$ jps
    57007 Jps
    56927 DataNode
    56223 QuorumPeerMain


5.10 Start the ResourceManager

According to the plan, the ResourceManager runs on dn-1. Start it there:

    [hadoop@dn-1 ~]$ /hadoop/hadoop/sbin/yarn-daemon.sh start resourcemanager
    starting resourcemanager, logging to /hadoop/hadoop/logs/yarn-hadoop-resourcemanager-dn-1.out

jps should now show the ResourceManager process:

    [hadoop@dn-1 ~]$ jps
    57173 QuorumPeerMain
    58317 Jps
    57283 JournalNode
    58270 ResourceManager
    58149 DataNode

5.11 Start the JobHistory server

According to the plan, the JobHistory service runs on nn-1. Start it with the following command:

    [hadoop@nn-1 ~]$ /hadoop/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
    starting historyserver, logging to /hadoop/hadoop/logs/mapred-hadoop-historyserver-nn-1.out

jps should now show the JobHistoryServer process:

    [hadoop@nn-1 ~]$ jps
    11210 JobHistoryServer
    9714 JournalNode
    9638 DFSZKFailoverController
    9382 QuorumPeerMain
    11039 NameNode
    11303 Jps

5.12 Start the NodeManagers

According to the plan, dn-1, dn-2, and dn-3 are NodeManagers. Start the NodeManager on each of the three nodes:

    [hadoop@dn-1 ~]$ /hadoop/hadoop/sbin/yarn-daemon.sh start nodemanager
    starting nodemanager, logging to /hadoop/hadoop/logs/yarn-hadoop-nodemanager-dn-1.out

jps should now show the NodeManager process:

    [hadoop@dn-1 ~]$ jps
    58559 NodeManager
    57173 QuorumPeerMain
    58668 Jps
    57283 JournalNode
    58270 ResourceManager
    58149 DataNode
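
The twelve steps of this section depend on their order, so it can be handy to keep them in one place. The sketch below only prints the sequence as a per-host checklist and runs nothing. The hosts and commands are taken from sections 5.1 to 5.12, with the format step written in its hdfs launcher form; startup_plan and print_plan are names made up for this example.

```shell
#!/bin/sh
# Startup sequence as "step|hosts|command" records (from sections 5.1-5.12).
# Steps 2, 5 and 7 are one-time initialization steps.
startup_plan() {
  cat <<'EOF'
1|nn-1 dn-1 dn-2|/hadoop/zookeeper/bin/zkServer.sh start
2|nn-1|/hadoop/hadoop/bin/hdfs zkfc -formatZK
3|nn-1 nn-2|/hadoop/hadoop/sbin/hadoop-daemon.sh start zkfc
4|nn-1 nn-2 dn-1|/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode
5|nn-1|/hadoop/hadoop/bin/hdfs namenode -format
6|nn-1|/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
7|nn-2|/hadoop/hadoop/bin/hdfs namenode -bootstrapStandby
8|nn-2|/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
9|dn-1 dn-2 dn-3|/hadoop/hadoop/sbin/hadoop-daemon.sh start datanode
10|dn-1|/hadoop/hadoop/sbin/yarn-daemon.sh start resourcemanager
11|nn-1|/hadoop/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
12|dn-1 dn-2 dn-3|/hadoop/hadoop/sbin/yarn-daemon.sh start nodemanager
EOF
}

# Print one checklist line per (step, host) pair; nothing is executed.
print_plan() {
  startup_plan | while IFS='|' read -r step hosts cmd; do
    for h in $hosts; do
      echo "step $step on $h: $cmd"
    done
  done
}

print_plan
```

On later restarts the one-time steps (2, 5, and 7) are skipped; rerunning the format step would wipe the existing namespace.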


6. Post-Installation Checks and Verification


6.1 HDFS commands

Check the state of a NameNode:

    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -getServiceState nn-1
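
The state command prints active or standby for the NameNode ID it is given, so a short loop can report which NameNode is currently active. This is a minimal sketch; find_active_namenode and the HDFS_BIN override are hypothetical names, and it assumes the hdfs launcher is on PATH or that HDFS_BIN points to it.

```shell
#!/bin/sh
# Print the first NameNode ID whose HA state is "active".
# HDFS_BIN is a hypothetical override, e.g. /hadoop/hadoop/bin/hdfs;
# when unset, plain "hdfs" is looked up on PATH.
find_active_namenode() {
  for nn in "$@"; do
    state=$("${HDFS_BIN:-hdfs}" haadmin -getServiceState "$nn" 2>/dev/null)
    if [ "$state" = "active" ]; then
      echo "$nn"
      return 0
    fi
  done
  # No NameNode reported "active"
  return 1
}
```

For this cluster the call would be find_active_namenode nn-1 nn-2, after exporting HDFS_BIN=/hadoop/hadoop/bin/hdfs if the launcher is not on PATH. A non-zero exit status means neither NameNode answered as active.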

Manual failover: switch the active NameNode from nn-1 to nn-2.

    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -failover nn-1 nn-2
 
NameNode health check:

    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -checkHealth nn-1

After killing one of the NameNodes, run the health check again to see the failure reported.
 


List all DataNodes:

    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs dfsadmin -report | more
 
List the live DataNodes:

    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs dfsadmin -report -live
    17/03/01 22:49:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Configured Capacity: 224954695680 (209.51 GB)
    Present Capacity: 180557139968 (168.16 GB)
    DFS Remaining: 179963428864 (167.60 GB)
    DFS Used: 593711104 (566.21 MB)
    DFS Used%: 0.33%
    Under replicated blocks: 2
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    -------------------------------------------------
    Live datanodes (3):
    Name: 192.168.9.23:50010 (dn-1)
    Hostname: dn-1
    Rack: /rack2
    Decommission Status : Normal
    Configured Capacity: 74984898560 (69.84 GB)
    DFS Used: 197902336 (188.73 MB)
    Non DFS Used: 14869356544 (13.85 GB)
    DFS Remaining: 59917639680 (55.80 GB)
    DFS Used%: 0.26%
    DFS Remaining%: 79.91%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Wed Mar 01 22:49:42 CST 2017

List the dead DataNodes:

    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs dfsadmin -report -dead

Check the health of both NameNodes (a healthy NameNode produces no output apart from the native-library warning):

    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -checkHealth nn-2
    17/03/01 22:55:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [hadoop@nn-2 ~]$ /hadoop/hadoop/bin/hdfs haadmin -checkHealth nn-1
    17/03/01 22:55:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


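6.2 YARN commands

Check the ResourceManager state (rm1 and rm2 are ResourceManager IDs, which are defined only when ResourceManager HA is configured):

    [hadoop@dn-1 hadoop]$ yarn rmadmin -getServiceState rm1
    active
    [hadoop@dn-1 hadoop]$ yarn rmadmin -getServiceState rm2
    standby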

List all YARN nodes:

    [hadoop@dn-1 hadoop]$ /hadoop/hadoop/bin/yarn node -all -list
    17/03/01 23:06:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Total Nodes:3
    Node-Id Node-State Node-Http-Address Number-of-Running-Containers
    dn-2:55506 RUNNING dn-2:8042 0
    dn-1:56447 RUNNING dn-1:8042 0
    dn-3:37533 RUNNING dn-3:8042 0

List the running YARN nodes:

    [hadoop@dn-1 hadoop]$ /hadoop/hadoop/bin/yarn node -list
    17/03/01 23:07:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Total Nodes:3
    Node-Id Node-State Node-Http-Address Number-of-Running-Containers
    dn-2:55506 RUNNING dn-2:8042 0
    dn-1:56447 RUNNING dn-1:8042 0
    dn-3:37533 RUNNING dn-3:8042 0

Show the details of a specific node.
Usage: /hadoop/hadoop/bin/yarn node -status <NodeId>

    [hadoop@dn-1 hadoop]$ /hadoop/hadoop/bin/yarn node -status dn-2:55506
    17/03/01 23:08:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Node Report :
    Node-Id : dn-2:55506
    Rack : /default-rack
    Node-State : RUNNING
    Node-Http-Address : dn-2:8042
    Last-Health-Update : Wed 01/Mar/17 11:06:21:373CST
    Health-Report :
    Containers : 0
    Memory-Used : 0MB
    Memory-Capacity : 8192MB
    CPU-Used : 0 vcores
    CPU-Capacity : 8 vcores
    Node-Labels :

List the applications currently running on YARN (a MapReduce job in this example):

    [hadoop@dn-2 ~]$ /hadoop/hadoop/bin/yarn application -list
    17/03/01 23:10:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
    Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
    application_1488375590901_0004 QuasiMonteCarlo MAPREDUCE hadoop default RUNNING UNDEFINED


6.3 Test with the bundled example

Run the pi example from the Hadoop directory with 2 maps and 200 samples per map:

    [hadoop@dn-1 ~]$ cd hadoop/
    [hadoop@dn-1 hadoop]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 200
    Number of Maps = 2
    Samples per Map = 200
    17/02/28 01:51:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Wrote input for Map #0
    Wrote input for Map #1
    Starting Job
    17/02/28 01:51:15 INFO input.FileInputFormat: Total input paths to process : 2
    17/02/28 01:51:15 INFO mapreduce.JobSubmitter: number of splits:2
    17/02/28 01:51:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1488216892564_0001
    17/02/28 01:51:16 INFO impl.YarnClientImpl: Submitted application application_1488216892564_0001
    17/02/28 01:51:16 INFO mapreduce.Job: The url to track the job: http://dn-1:8088/proxy/application_1488216892564_0001/
    17/02/28 01:51:16 INFO mapreduce.Job: Running job: job_1488216892564_0001
    17/02/28 01:51:24 INFO mapreduce.Job: Job job_1488216892564_0001 running in uber mode : false
    17/02/28 01:51:24 INFO mapreduce.Job: map 0% reduce 0%
    17/02/28 01:51:38 INFO mapreduce.Job: map 100% reduce 0%
    17/02/28 01:51:49 INFO mapreduce.Job: map 100% reduce 100%
    17/02/28 01:51:49 INFO mapreduce.Job: Job job_1488216892564_0001 completed successfully
    17/02/28 01:51:50 INFO mapreduce.Job: Counters: 49
    File System Counters
    FILE: Number of bytes read=50
    FILE: Number of bytes written=326922
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=510
    HDFS: Number of bytes written=215
    HDFS: Number of read operations=11
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=3
    Job Counters
    Launched map tasks=2
    Launched reduce tasks=1
    Data-local map tasks=2
    Total time spent by all maps in occupied slots (ms)=25604
    Total time spent by all reduces in occupied slots (ms)=7267
    Total time spent by all map tasks (ms)=25604
    Total time spent by all reduce tasks (ms)=7267
    Total vcore-seconds taken by all map tasks=25604
    Total vcore-seconds taken by all reduce tasks=7267
    Total megabyte-seconds taken by all map tasks=26218496
    Total megabyte-seconds taken by all reduce tasks=7441408
    Map-Reduce Framework
    Map input records=2
    Map output records=4
    Map output bytes=36
    Map output materialized bytes=56
    Input split bytes=274
    Combine input records=0
    Combine output records=0
    Reduce input groups=2
    Reduce shuffle bytes=56
    Reduce input records=4
    Reduce output records=0
    Spilled Records=8
    Shuffled Maps =2
    Failed Shuffles=0
    Merged Map outputs=2
    GC time elapsed (ms)=419
    CPU time spent (ms)=6940
    Physical memory (bytes) snapshot=525877248
    Virtual memory (bytes) snapshot=2535231488
    Total committed heap usage (bytes)=260186112
    Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
    File Input Format Counters
    Bytes Read=236
    File Output Format Counters
    Bytes Written=97
    Job Finished in 35.466 seconds
    Estimated value of Pi is 3.17000000000000000000

6.4 View the NameNode web UI

The URLs are:
http://192.168.9.21:50070/
http://192.168.9.22:50070/

192.168.9.21 and 192.168.9.22 are the addresses of the two NameNodes (nn-1 and nn-2). One page reports the state as active and the other as standby.
 



6.5 Verify NameNode HA failover

Kill the active NameNode process on nn-1 and check whether the NameNode on nn-2 switches from standby to active.

After a successful failover, http://192.168.9.22:50070/ shows nn-2 as active.
 


6.6 View the ResourceManager web UI

http://192.168.9.23:8088/

192.168.9.23 is the node that runs the ResourceManager service.
 



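7. Install Spark


Plan: install a Spark cluster on the existing Hadoop cluster.
Master node: nn-1
Worker nodes: nn-2, dn-1, dn-2, dn-3

7.1 Install and configure Scala

Upload the Scala package to /hadoop on nn-1 and unpack it. The archive name below assumes the standard Scala 2.11.7 tarball, matching the SCALA_HOME used later:

    [hadoop@nn-1 ~]$ tar -xzvf scala-2.11.7.tgz

The environment variables are configured together in section 7.3.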

7.2 Install Spark

Upload spark-1.6.0-bin-hadoop2.6.tgz to /hadoop on nn-1 and unpack it:

    [hadoop@nn-1 ~]$ tar -xzvf spark-1.6.0-bin-hadoop2.6.tgz

Change to /hadoop/spark-1.6.0-bin-hadoop2.6/conf and create spark-env.sh and slaves from their templates:

    [hadoop@nn-1 conf]$ pwd
    /hadoop/spark-1.6.0-bin-hadoop2.6/conf
    [hadoop@nn-1 conf]$ cp spark-env.sh.template spark-env.sh
    [hadoop@nn-1 conf]$ cp slaves.template slaves

Edit spark-env.sh and add the following:

    export JAVA_HOME=/usr/local/java/jdk1.7.0_79
    export SCALA_HOME=/hadoop/scala-2.11.7
    export SPARK_HOME=/hadoop/spark-1.6.0-bin-hadoop2.6
    export SPARK_MASTER_IP=nn-1
    export SPARK_WORKER_MEMORY=2g
    export HADOOP_CONF_DIR=/hadoop/hadoop/etc/hadoop

Set SPARK_WORKER_MEMORY according to the memory actually available on the workers.

Edit slaves and add the following:

    nn-2
    dn-1
    dn-2
    dn-3

The slaves file lists the worker nodes.

7.3 Configure the environment variables

    [hadoop@nn-1 ~]$ vim .bash_profile

Append the following:

    export HADOOP_HOME=/hadoop/hadoop
    export SCALA_HOME=/hadoop/scala-2.11.7
    export SPARK_HOME=/hadoop/spark-1.6.0-bin-hadoop2.6
    export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

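7.4 Distribute the configured scala and spark directories to the other nodes

The example shows dn-1; repeat the two scp commands for the other worker nodes (nn-2, dn-2, and dn-3).

    [hadoop@nn-1 bin]$ cd /hadoop
    [hadoop@nn-1 ~]$ scp -rp spark-1.6.0-bin-hadoop2.6 hadoop@dn-1:/hadoop
    [hadoop@nn-1 ~]$ scp -rp scala-2.11.7 hadoop@dn-1:/hadoop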

7.5 Start the Spark cluster

    [hadoop@nn-1 ~]$ /hadoop/spark-1.6.0-bin-hadoop2.6/sbin/start-all.sh

Check the processes on nn-1 and on the worker nodes.
On nn-1, the Master process should be listed:

    [hadoop@nn-1 ~]$ jps
    2473 JournalNode
    2541 NameNode
    4401 Jps
    2399 DFSZKFailoverController
    2687 JobHistoryServer
    2775 Master
    2351 QuorumPeerMain

On the worker nodes, the Worker process should be listed:

    [hadoop@dn-1 ~]$ jps
    2522 NodeManager
    3449 Jps
    2007 QuorumPeerMain
    2141 DataNode
    2688 Worker
    2061 JournalNode
    2258 ResourceManager

View the Spark web UI:
http://192.168.9.21:8080/
 

7.6 Run a test job

Run from the Spark installation directory:

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
                       --master yarn --deploy-mode cluster \
                       --driver-memory 100M \
                       --executor-memory 200M \
                       --executor-cores 1 \
                       --queue default \
                       lib/spark-examples*.jar 10

Or:

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
                       --master yarn --deploy-mode cluster \
                       --executor-cores 1 \
                       --queue default \
                       lib/spark-examples*.jar 10
 



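8. Configure Rack Awareness

On nn-1 and nn-2, add the following to /hadoop/hadoop/etc/hadoop/core-site.xml. In Hadoop 2.x this key is a deprecated alias of net.topology.script.file.name.

    <property>
      <name>topology.script.file.name</name>
      <value>/hadoop/hadoop/etc/hadoop/RackAware.py</value>
    </property>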
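Create the file /hadoop/hadoop/etc/hadoop/RackAware.py with the following content:

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    import sys

    # Map each DataNode hostname and IP address to its rack
    rack = {"dn-1": "rack2",
            "dn-2": "rack1",
            "dn-3": "rack1",
            "192.168.9.23": "rack2",
            "192.168.9.24": "rack1",
            "192.168.9.25": "rack1",
            }

    if __name__ == "__main__":
        # Hadoop may pass several hosts in one call; print one rack per host.
        # Unknown hosts fall back to /rack0.
        for host in sys.argv[1:]:
            print("/" + rack.get(host, "rack0"))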
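
A topology script can be any executable that prints one rack path per argument, so the same mapping can also be written in shell. The following is a sketch of an equivalent under that assumption, using the rack assignments from the Python script; rack_of and resolve_racks are names chosen for illustration.

```shell
#!/bin/sh
# Map a single hostname or IP address to its rack path.
rack_of() {
  case "$1" in
    dn-1|192.168.9.23) echo "/rack2" ;;
    dn-2|dn-3|192.168.9.24|192.168.9.25) echo "/rack1" ;;
    *) echo "/rack0" ;;   # fallback rack for unknown hosts
  esac
}

# Hadoop may pass several hosts in one invocation: print one rack per argument.
resolve_racks() {
  for host in "$@"; do
    rack_of "$host"
  done
}

# Demonstration with fixed arguments.
resolve_racks dn-1 192.168.9.25 unknown-host
```

Used as an actual topology script, the last line would be resolve_racks "$@" so that the arguments supplied by Hadoop are resolved.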
Make the script executable:

    [root@nn-1 hadoop]# chmod +x RackAware.py
    [root@nn-1 hadoop]# ll RackAware.py
    -rwxr-xr-x 1 hadoop hadoop 294 Mar 1 21:24 RackAware.py

Restart the NameNode service on nn-1 and nn-2:

    [hadoop@nn-1 ~]$ hadoop-daemon.sh stop namenode
    stopping namenode
    [hadoop@nn-1 ~]$ hadoop-daemon.sh start namenode
    starting namenode, logging to /hadoop/hadoop/logs/hadoop-hadoop-namenode-nn-1.out

Check the NameNode log:

    [root@nn-1 logs]# pwd
    /hadoop/hadoop/logs
    [root@nn-1 logs]# vim hadoop-hadoop-namenode-nn-1.log
 


Print the topology with the following command:

    [hadoop@dn-3 ~]$ hdfs dfsadmin -printTopology
    17/03/02 00:21:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Rack: /rack1
       192.168.9.24:50010 (dn-2)
       192.168.9.25:50010 (dn-3)
    Rack: /rack2
       192.168.9.23:50010 (dn-1)






