Hadoop High-Availability Distributed Cluster Setup

Table of Contents

1. Introduction to HA
2. Environment Preparation
    2.1 Machines and Node Layout
    2.2 /etc/hosts Configuration
    2.3 Passwordless SSH Login
3. Installation
    3.1 Install ZooKeeper
    3.2 Download the Installation Package
    3.3 Create the /opt/hadoop Directory
    3.4 Upload the Installation Package
    3.5 Extract and Create Directories
    3.6 Configure Environment Variables
    3.7 Cluster Configuration
        3.7.1 Configure hadoop-env.sh
        3.7.2 Configure core-site.xml
        3.7.3 Configure hdfs-site.xml
        3.7.4 Create and Edit mapred-site.xml
        3.7.5 Edit yarn-site.xml
        3.7.6 Edit the slaves File
    3.8 Cluster Initialization and First Startup
    3.9 Routine Startup and Shutdown
        3.9.1 Normal Startup Order
        3.9.2 Normal Shutdown Order
4. Verifying High Availability

1. Introduction to HA

Hadoop HA covers both HDFS HA and YARN HA. For the HA architecture and design options, see: https://www.jianshu.com/p/7c697f146674

2. Environment Preparation

2.1 Machines and Node Layout

| host | IP | OS | HDFS roles | YARN roles | ZooKeeper |
| --- | --- | --- | --- | --- | --- |
| hadoop-1 | 192.168.90.131 | Ubuntu 18.04.2 LTS | NameNode (active), DFSZKFailoverController | ResourceManager (standby) | zookeeper |
| hadoop-2 | 192.168.90.132 | Ubuntu 18.04.2 LTS | NameNode (standby), DFSZKFailoverController | ResourceManager (active) | zookeeper |
| hadoop-3 | 192.168.90.133 | Ubuntu 18.04.2 LTS | DataNode, JournalNode | NodeManager | zookeeper |
| hadoop-4 | 192.168.90.134 | Ubuntu 18.04.2 LTS | DataNode, JournalNode | NodeManager | zookeeper (observer) |
| hadoop-5 | 192.168.90.135 | Ubuntu 18.04.2 LTS | DataNode, JournalNode | NodeManager | |

2.2 /etc/hosts Configuration

Edit /etc/hosts on every machine and add the following entries:

192.168.90.131 hadoop-1                                                                                             
192.168.90.132 hadoop-2                                                                                                 
192.168.90.133 hadoop-3                                                                                                 
192.168.90.134 hadoop-4                                                                                                 
192.168.90.135 hadoop-5  
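
A quick way to sanity-check the mapping is to ping every host from every machine (a minimal sketch; assumes ping is available):

    for h in hadoop-1 hadoop-2 hadoop-3 hadoop-4 hadoop-5; do
        ping -c 1 "$h" > /dev/null && echo "$h OK" || echo "$h FAILED"
    done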

2.3 Passwordless SSH Login

  1. Generate a key pair on every machine. Logged in as root, generate a key with an empty passphrase:

    ssh-keygen -t rsa -P ''

    At the prompt "Enter file in which to save the key (/root/.ssh/id_rsa):", press Enter to use the default file /root/.ssh/id_rsa. When the command finishes, the /root/.ssh directory contains three files: authorized_keys, id_rsa, and id_rsa.pub. (If authorized_keys is missing, create it manually with touch authorized_keys.)

  2. Merge the contents of every machine's id_rsa.pub into authorized_keys ("merge" here means that the authorized_keys file on each machine ends up containing the id_rsa.pub contents of all machines).

  3. From each machine, test ssh logins to all the other machines.
    If a host-authenticity prompt such as the following appears:

The authenticity of host 'hadoop-2 (192.168.90.132)' can't be established.
ECDSA key fingerprint is SHA256:XEhSC0caRxdbv0eHNBo8c7VULr7vhj5pM2bt3frOEAA.
Are you sure you want to continue connecting (yes/no)?

Type yes and press Enter; after that, ssh will log in directly.

For a non-root user (for example the hadoop user created later in this article), passwordless ssh additionally requires running chmod 600 authorized_keys in the .ssh directory, i.e. setting the permissions of authorized_keys to 600.
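
As an alternative to merging the files by hand, ssh-copy-id can append each machine's public key to the others' authorized_keys (it also sets sane permissions); a sketch to run on every machine, assuming root password logins still work at this point:

    for h in hadoop-1 hadoop-2 hadoop-3 hadoop-4 hadoop-5; do
        ssh-copy-id -i /root/.ssh/id_rsa.pub root@"$h"
    done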

3. Installation

3.1 Install ZooKeeper

This article installs the ZooKeeper ensemble on hadoop-1 through hadoop-4; for details see: https://www.jianshu.com/p/e9becafcbaa7

3.2 Download the Installation Package

Download the Hadoop release package from: https://archive.apache.org/dist/hadoop/common/hadoop-2.8.3/
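
For example:

    wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz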

3.3 Create the /opt/hadoop Directory

 mkdir /opt/hadoop

3.4 Upload the Installation Package

Upload hadoop-2.8.3.tar.gz to the /opt/hadoop directory.
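
For example, from the machine where the package was downloaded (a sketch; adjust the source path as needed):

    scp hadoop-2.8.3.tar.gz root@hadoop-1:/opt/hadoop/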

3.5 Extract and Create Directories

 cd /opt/hadoop
 tar -zxvf hadoop-2.8.3.tar.gz 
 mkdir hdfs
 cd hdfs
 mkdir data name tmp pid journalnode logs
 
 cd ../
 mkdir yarn
 cd yarn 
 mkdir logs local staging

The data, name, tmp, pid, journalnode, and logs directories created under hdfs are used in the configuration later: they serve as the HDFS data directory, the NameNode metadata directory, Hadoop's tmp directory, the pid file directory, the JournalNode edits directory, and the log directory, respectively. The directories under yarn are used for the corresponding YARN runtime directories.
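
The same layout can be created in a single command with mkdir -p:

    mkdir -p /opt/hadoop/hdfs/{data,name,tmp,pid,journalnode,logs} \
             /opt/hadoop/yarn/{logs,local,staging}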

3.6 Configure Environment Variables

Append the following to /etc/profile on every machine:

export HADOOP_HOME=/opt/hadoop/hadoop-2.8.3                                                                                
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH  
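
After appending the lines, reload the profile and check that the hadoop command resolves:

    source /etc/profile
    hadoop version   # should report Hadoop 2.8.3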

3.7 Cluster Configuration

The files involved are listed in the table below:

| File | Purpose |
| --- | --- |
| hadoop-env.sh | Hadoop runtime environment settings |
| core-site.xml | Common component: system-level parameters such as the HDFS URL |
| hdfs-site.xml | HDFS component |
| mapred-site.xml | MapReduce component |
| yarn-site.xml | YARN component |
| slaves | List of slave (worker) nodes |

3.7.1 Configure hadoop-env.sh

Edit the file and set the JAVA_HOME environment variable near the top (the same value as in /etc/profile), e.g.:

JAVA_HOME=/opt/jdk/jdk1.8.0_231

A complete file for reference:


# The java implementation to use.
JAVA_HOME=/opt/jdk/jdk1.8.0_231

#export JSVC_HOME=${JSVC_HOME}

export HADOOP_LOGFILE=${USER}-hadoop.log 
export HADOOP_ROOT_LOGGER=INFO,DRFA,console 
export HADOOP_MAPRED_ROOT_LOGGER=INFO,DRFA,console
export HDFS_AUDIT_LOGGER=WARN,DRFA,console
export HADOOP_SECURITY_LOGGER=INFO,DRFA,console

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.3
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME

# Extra Java CLASSPATH elements.  Automatically insert capacity-scheduler.                                                 
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do                                                                 
  if [ "$HADOOP_CLASSPATH" ]; then                                                                                         
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f                                                                           
  else                                                                                                                     
    export HADOOP_CLASSPATH=$f                                                                                             
  fi                                                                                                                       
done 

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=3072
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Enable extra debugging of Hadoop's JAAS binding, used to set up  
# Kerberos security.                                                                                                    
# export HADOOP_JAAS_DEBUG=true 

# Extra Java runtime options.  Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:GCPauseIntervalMillis=100 -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-WARN,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-WARN,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:GCPauseIntervalMillis=100 -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-WARN,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-WARN,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx3072m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS"                                                                            
# set heap args when HADOOP_HEAPSIZE is empty                                                                              
if [ "$HADOOP_HEAPSIZE" = "" ]; then                                                                                       
  export HADOOP_CLIENT_OPTS="-Xmx3072m $HADOOP_CLIENT_OPTS"                                                                 
fi


#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol.  This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored.  $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=/opt/hadoop/hdfs/logs

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

###
# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""

###
# Advanced Users Only!
###

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by 
#       the user that will run the hadoop daemons.  Otherwise there is the
#       potential for a symlink attack.
export HADOOP_PID_DIR=/opt/hadoop/hdfs/pid
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*

#for f in $HADOOP_HOME/share/hadoop/tools/lib/*.jar; do
#  if [ "$HADOOP_CLASSPATH" ]; then
#    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
#  else
#    export HADOOP_CLASSPATH=$f
#  fi
#done

3.7.2 Configure core-site.xml

A configuration for reference:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/hdfs/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
    <property>
        <name>fs.trash.checkpoint.interval</name>
        <value>0</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop-1:2181,hadoop-2:2181,hadoop-3:2181,hadoop-4:2181</value>
    </property>
    <property>
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>2000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>
</configuration>

3.7.3 Configure hdfs-site.xml

A configuration for reference:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>ns</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.ns</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns.nn1</name>
        <value>hadoop-1:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns.nn1</name>
        <value>hadoop-1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns.nn2</name>
        <value>hadoop-2:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns.nn2</name>
        <value>hadoop-2:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop-3:8485;hadoop-4:8485;hadoop-5:8485/ns</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/hadoop/hdfs/journalnode</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.slow.io.warning.threshold.ms</name>
        <value>90000</value>
    </property>
    <property>
        <name>dfs.heartbeat.interval</name>
        <value>8</value>
    </property>
    <property>
        <name>dfs.namenode.heartbeat.recheck-interval</name>
        <value>90000</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.period</name>
        <value>3600</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.txns</name>
        <value>1000000</value>
    </property>
    <property>
        <name>dfs.blockreport.intervalMsec</name>
        <value>1800000</value>
    </property>
    <property>
        <name>dfs.datanode.directoryscan.interval</name>
        <value>1800</value>
    </property>
    <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>8000</value>
    </property>
    <property>
        <name>dfs.hosts</name>
        <value>/opt/hadoop/hadoop-2.8.3/etc/hadoop/slaves</value>
    </property>
    <property>
        <name>dfs.balance.bandwidthPerSec</name>
        <value>10485760</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>67108864</value>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>64</value>
    </property>
    <property>
        <name>dfs.datanode.max.transfer.threads</name>
        <value>36867</value>
    </property>
    <property>
        <name>dfs.datanode.directoryscan.threads</name>
        <value>18</value>
    </property>
    <property>
        <name>dfs.datanode.handler.count</name>
        <value>128</value>
    </property>
    <property>
        <name>dfs.datanode.slow.io.warning.threshold.ms</name>
        <value>1000</value>
    </property>
</configuration>

3.7.4 Create and Edit mapred-site.xml

First create mapred-site.xml from its template:

cp mapred-site.xml.template mapred-site.xml

Edit the file; a configuration for reference:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>/opt/hadoop/yarn/local</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx4096m</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx4096m</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.cleaner.interval-ms</name>
        <value>604800000</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.joblist.cache.size</name>
        <value>20000</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.datestring.cache.size</name>
        <value>200000</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.cleaner.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.max-age-ms</name>
        <value>604800000</value>
    </property>
</configuration>

3.7.5 Edit yarn-site.xml

A configuration for reference:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>hdcluster</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop-1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop-2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop-1:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop-2:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.https.address.rm1</name>
        <value>hadoop-1:5005</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.https.address.rm2</name>
        <value>hadoop-2:5005</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>hadoop-1:5001</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>hadoop-2:5001</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>hadoop-1:5003</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>hadoop-2:5003</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop-1:2181,hadoop-2:2181,hadoop-3:2181,hadoop-4:2181</value>
    </property>
    <property>
        <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
        <value>100000</value>
    </property>
    <property>
        <name>yarn.scheduler.fair.user-as-default-queue</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.container-executor.class</name>
        <value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>12288</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>32</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
        <value>5000</value>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>hadoop-1:5002</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>hadoop-2:5002</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk.state-store.address</name>
        <value>hadoop-1:2181,hadoop-2:2181,hadoop-3:2181,hadoop-4:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>8</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop-1:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/opt/hadoop/yarn/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.delete.debug-delay-sec</name>
        <value>600</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/yarn-logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/opt/hadoop/yarn/local</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/opt/hadoop/yarn/staging</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>8</value>
    </property>
    <property>
        <name>yarn.resourcemanager.am.max-attempts</name>
        <value>5</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>128</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
        <value>1000</value>
    </property>
    <property>
        <name>yarn.nodemanager.linux-container-executor.group</name>
        <value>hadoop</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
        <value>100</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
    </property>
    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.resourcemanager.max-completed-applications</name>
        <value>150</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
        <value>org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler</value>
    </property>
</configuration>

3.7.6 Edit the slaves File

This file lists the HDFS data storage nodes (DataNodes):

hadoop-3
hadoop-4
hadoop-5
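
The configuration files above must be identical on all five machines. One way to distribute them is to finish editing on hadoop-1 and then copy the whole configuration directory to the other nodes (a sketch, assuming the same paths everywhere):

    for h in hadoop-2 hadoop-3 hadoop-4 hadoop-5; do
        scp -r /opt/hadoop/hadoop-2.8.3/etc/hadoop root@"$h":/opt/hadoop/hadoop-2.8.3/etc/
    done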

3.8 Cluster Initialization and First Startup

# 1. Start the zookeeper ensemble (hadoop-1, hadoop-2, hadoop-3, hadoop-4)
    zkServer.sh start
    # jps on each machine now shows an additional QuorumPeerMain process

# 2. Start the journalnodes (hadoop-3, hadoop-4, hadoop-5)
    hadoop-daemon.sh start journalnode
    # jps now shows an additional JournalNode process
    # note: with hadoop-daemons.sh (plural) instead, one machine can start the process on all slave nodes remotely

# 3. Format the namenode (hadoop-1)
    hdfs namenode -format

# 4. Format ZKFC (initialize the HA state in zk) (hadoop-1)
    hdfs zkfc -formatZK

# 5. Start namenode1 (hadoop-1)
    hadoop-daemon.sh start namenode
    # jps now shows an additional NameNode process

# 6. Bootstrap the standby namenode (hadoop-2)
    hdfs namenode -bootstrapStandby

# 7. Start namenode2 (hadoop-2)
    hadoop-daemon.sh start namenode
    # jps now shows an additional NameNode process

# 8. Start the ZKFailoverControllers (hadoop-1, hadoop-2)
    hadoop-daemon.sh start zkfc
    # jps now shows an additional DFSZKFailoverController process
    # whichever machine starts zkfc first gets the active namenode

# 9. Start the datanodes (hadoop-3, hadoop-4, hadoop-5)
    hadoop-daemon.sh start datanode
    # jps now shows an additional DataNode process

# 10. Start the resourcemanagers (hadoop-2, hadoop-1)
    yarn-daemon.sh start resourcemanager
    # start the RM on hadoop-2 first so that it becomes active (the roles can also be switched manually later)
    # jps now shows an additional ResourceManager process

# 11. Start the nodemanagers (hadoop-3, hadoop-4, hadoop-5)
    yarn-daemon.sh start nodemanager
    # jps now shows an additional NodeManager process

# 12. Start the historyservers (hadoop-1, hadoop-2)
    mr-jobhistory-daemon.sh start historyserver
    # jps now shows an additional JobHistoryServer process
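
Once everything is up, a quick smoke test (a sketch) confirms the expected active/standby split and that HDFS accepts writes; the HDFS web UIs at hadoop-1:50070 / hadoop-2:50070 and the YARN web UIs at hadoop-1:8088 / hadoop-2:8088 show the same states:

    hdfs haadmin -getServiceState nn1    # expect: active
    hdfs haadmin -getServiceState nn2    # expect: standby
    yarn rmadmin -getServiceState rm2    # expect: active (started first)
    hdfs dfs -mkdir -p /smoke-test
    hdfs dfs -ls /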

3.9 Routine Startup and Shutdown

The sequence above is only for a freshly initialized cluster. For routine startups, do not format the NameNode again, or its data will be lost.

3.9.1 Normal Startup Order

# 1. Start zookeeper (hadoop-1 ~ hadoop-4)
    zkServer.sh start

# 2. Start the journalnodes (hadoop-3 ~ hadoop-5)
    hadoop-daemons.sh start journalnode

# 3. Start the namenodes (hadoop-1, hadoop-2)
    hadoop-daemon.sh start namenode

# 4. Start the ZKFailoverControllers (hadoop-1, hadoop-2)
    hadoop-daemon.sh start zkfc
    # note: whichever machine starts zkfc first gets the active namenode

# 5. Start the datanodes (hadoop-3 ~ hadoop-5)
    hadoop-daemons.sh start datanode

# 6. Start the resourcemanagers (hadoop-2, hadoop-1)
    yarn-daemon.sh start resourcemanager
    # note: the one started first becomes active

# 7. Start the nodemanagers (hadoop-3 ~ hadoop-5)
    yarn-daemon.sh start nodemanager

# 8. Start the historyservers (hadoop-1, hadoop-2)
    mr-jobhistory-daemon.sh start historyserver

Steps 6 and 7 can be replaced by: first run start-yarn.sh on one RM node (this starts the active RM and all NodeManagers), then run yarn-daemon.sh start resourcemanager on the other RM node (this starts the standby RM), as in the sketch below.
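
A concrete sketch (assuming hadoop-2 is the node that should become active):

    # on hadoop-2: starts the local (active) RM and all NodeManagers listed in slaves
    start-yarn.sh
    # on hadoop-1: starts the standby RM
    yarn-daemon.sh start resourcemanager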

3.9.2 Normal Shutdown Order

Run on the machines where each service is running:

# 1. Stop the historyservers (hadoop-1, hadoop-2)
    mr-jobhistory-daemon.sh stop historyserver

# 2. Stop the nodemanagers (hadoop-3 ~ hadoop-5)
    yarn-daemon.sh stop nodemanager

# 3. Stop the resourcemanagers (hadoop-2, hadoop-1)
    yarn-daemon.sh stop resourcemanager

# 4. Stop the datanodes (hadoop-3 ~ hadoop-5)
    hadoop-daemons.sh stop datanode

# 5. Stop the ZKFailoverControllers (hadoop-1, hadoop-2)
    hadoop-daemon.sh stop zkfc

# 6. Stop the namenodes (hadoop-1, hadoop-2)
    hadoop-daemon.sh stop namenode

# 7. Stop the journalnodes (hadoop-3 ~ hadoop-5)
    hadoop-daemons.sh stop journalnode

# 8. Stop zookeeper (hadoop-1 ~ hadoop-4)
    zkServer.sh stop

Steps 2 and 3 can be replaced by: first run yarn-daemon.sh stop resourcemanager on the standby RM (stopping the standby RM), then run stop-yarn.sh on the active RM (stopping the active RM and all NodeManagers), as in the sketch below.
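
A concrete sketch (assuming hadoop-1 is currently standby and hadoop-2 active):

    # on hadoop-1: stops the standby RM
    yarn-daemon.sh stop resourcemanager
    # on hadoop-2: stops the active RM and all NodeManagers
    stop-yarn.sh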

4. Verifying High Availability

Relevant commands:

# check the state of an HDFS namenode
hdfs haadmin -getServiceState nn1

# manually switch an HDFS namenode to active
hdfs haadmin -transitionToActive --forcemanual nn1

# manually switch an HDFS namenode to standby
hdfs haadmin -transitionToStandby --forcemanual nn2

# check the state of a YARN resourcemanager
yarn rmadmin -getServiceState rm1

# manually switch a YARN resourcemanager to standby
yarn rmadmin -transitionToStandby --forcemanual rm2

# manually switch a YARN resourcemanager to active
yarn rmadmin -transitionToActive --forcemanual rm1

Verification method: kill the NameNode process on the active NameNode host and check that the standby NameNode becomes active; similarly, kill the ResourceManager process on the active ResourceManager host and check that the standby ResourceManager becomes active.
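
A sketch of the NameNode half of this test (assuming nn1 on hadoop-1 is currently active):

    # on hadoop-1: kill the active NameNode
    kill -9 $(jps | awk '$2 == "NameNode" {print $1}')
    # from any node: nn2 should report active within a few seconds
    hdfs haadmin -getServiceState nn2
    # restart the killed NameNode; it rejoins the pair as standby
    hadoop-daemon.sh start namenode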
