Notes and caveats for this setup
In hdfs-site.xml, the dfs.ha.fencing.methods parameter is set to shell rather than sshfence, because sshfence cannot fail over when the host running the active NameNode goes down (the whole host, not just the service). Most Hadoop HA articles found via Baidu, however, use sshfence.
Official reference documentation:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
Quoted from the official docs: The sshfence option SSHes to the target node and uses fuser to kill the process listening on the service’s TCP port. In order for this fencing option to work, it must be able to SSH to the target node without providing a passphrase. Thus, one must also configure the dfs.ha.fencing.ssh.private-key-files option, which is a comma-separated list of SSH private key files. For example:
After the connection to the active NameNode fails, the standby SSHes to the previously active NameNode host and kills the process once more.
The problem is that if the host of the previously active NameNode has gone down, the standby cannot SSH to it, so the failover never completes. This kind of high availability therefore only protects against the NameNode process dying; it cannot survive the node itself dying. If the active node's host goes down, the standby's ZKFC log will show the exception java.net.NoRouteToHostException: No route to host (Host unreachable).
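To confirm this failure mode on the standby, a minimal check can be used (the log path is an assumption based on the default Hadoop log layout used later in this guide):
# Run on the standby NameNode host; the file name contains that host's hostname
grep -i "NoRouteToHostException" /opt/hadoop-2.10.1/logs/hadoop-root-zkfc-*.log
# The fencing method actually in effect can be double-checked with:
hdfs getconf -confKey dfs.ha.fencing.methods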
Note 2: to prevent split-brain, Hadoop HA also relies on a majority (quorum) mechanism: with N JournalNodes the system tolerates at most (N - 1) / 2 failures. You therefore need at least three JournalNode daemons, which is why this setup uses three master nodes (one primary and two standby).
Quoted from the official docs:
- JournalNode machines - the machines on which you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be collocated on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager. Note: There must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs. This will allow the system to tolerate the failure of a single machine. You may also run more than 3 JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally.
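A quick illustration of the (N - 1) / 2 rule:
# With N JournalNodes the system tolerates at most (N - 1) / 2 failures
for n in 3 5 7; do echo "N=$n JournalNodes -> tolerates $(( (n - 1) / 2 )) failed JournalNodes"; done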
The sections below list the full configuration of each file; many of the settings are not HA-related (for example, the resources each node can offer to the cluster).
For accuracy, many explanations are pasted verbatim from the official documentation; https://fanyi.youdao.com/ can be used to translate them.
Turn off the firewall and SELinux, and make sure the nodes can communicate with each other.
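A minimal sketch of that step on CentOS 7 (run on every node):
systemctl stop firewalld && systemctl disable firewalld                 # stop and disable the firewall
setenforce 0                                                            # turn SELinux off for the running system
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config     # keep it off after a reboot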
Set the hostnames (since there are only a few machines, many components share a node; to meet the HA requirement, three master nodes are prepared).
IP address | Hostname | Description |
---|---|---|
20.88.10.31 | emr-header-01 | Master node, ZooKeeper node |
20.88.10.32 | emr-header-02 | Standby master node, ZooKeeper node |
20.88.10.33 | emr-worker-01 | Standby master node, ZooKeeper node, worker node |
20.88.10.34 | emr-worker-02 | Worker node |
Set the hostname and edit the hosts file
hostnamectl set-hostname emr-header-01;bash
Append the following to the hosts file (identical on every node).
This step is mandatory: the Hadoop configuration refers to the master and worker nodes by hostname, and without these hosts entries hostname resolution will fail.
20.88.10.31 emr-header-01
20.88.10.32 emr-header-02
20.88.10.33 emr-worker-01
20.88.10.34 emr-worker-02
Set up passwordless SSH between the nodes (run on emr-header-01).
Hadoop's start scripts SSH into the other nodes; without passwordless keys you will be prompted for a password.
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Distribute the key to every host
sshpass -psinobase@123 ssh-copy-id -i /root/.ssh/id_dsa.pub "-o StrictHostKeyChecking=no" root@emr-header-01
sshpass -psinobase@123 ssh-copy-id -i /root/.ssh/id_dsa.pub "-o StrictHostKeyChecking=no" root@emr-header-02
sshpass -psinobase@123 ssh-copy-id -i /root/.ssh/id_dsa.pub "-o StrictHostKeyChecking=no" root@emr-worker-01
sshpass -psinobase@123 ssh-copy-id -i /root/.ssh/id_dsa.pub "-o StrictHostKeyChecking=no" root@emr-worker-02
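A quick sanity check, assuming the hostnames above, that each login now works without a password (each command should print the remote hostname):
for h in emr-header-01 emr-header-02 emr-worker-01 emr-worker-02; do
  ssh -o BatchMode=yes root@$h hostname
done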
The operating system is CentOS 7, commands are run as root (a non-root user with equivalent privileges also works), and all packages are installed under /opt.
Install the JDK on every node.
Download the JDK that matches the required version (download steps omitted). Java official site: https://www.oracle.com/java/technologies/downloads/
tar xf jdk-8u181-linux-x64.tar.gz -C /opt/
echo 'export JAVA_HOME=/opt/jdk1.8.0_181/
export PATH=${PATH}:${JAVA_HOME}/bin' >>/etc/profile
source /etc/profile
java -version
ZooKeeper is installed on three nodes: emr-header-01, emr-header-02, and emr-worker-01.
Official ZooKeeper download page: https://zookeeper.apache.org/releases.html
Run on emr-header-01:
# Download and upload the tarball (omitted here)
tar xf zookeeper-3.4.13.tar.gz -C /opt/
Edit the configuration
cd /opt/zookeeper-3.4.13/conf/
vim zoo.cfg
Full configuration:
minSessionTimeout=16000
maxSessionTimeout=300000
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/datalog
clientPort=2181
server.1=emr-header-01:2888:3888
server.2=emr-header-02:2888:3888
server.3=emr-worker-01:2888:3888
Add environment variables
echo 'ZK_HOME=/opt/zookeeper-3.4.13' >>/etc/profile
echo 'PATH=$PATH:${ZK_HOME}/bin/' >>/etc/profile
source /etc/profile
Distribute to the other nodes
scp -r /opt/* emr-header-02:/opt/
scp -r /opt/* emr-worker-01:/opt/
scp -r /opt/* emr-worker-02:/opt/
scp /etc/profile emr-header-02:/etc/
scp /etc/profile emr-worker-01:/etc/
scp /etc/hosts emr-header-02:/etc/
scp /etc/hosts emr-worker-01:/etc/
scp /etc/hosts emr-worker-02:/etc/
Run on all nodes
mkdir -p /data/zookeeper/data
cd /data/zookeeper/data
Note: the command differs per node.
The number in each myid file must be globally unique and is not arbitrary: for example, the 1 for emr-header-01 corresponds to the 1 in server.1=emr-header-01:2888:3888 in the configuration above.
# Run on emr-header-01
echo "1" >myid
# Run on emr-header-02
echo "2" >myid
# Run on emr-worker-01
echo "3" >myid
Start ZooKeeper (run on the three ZooKeeper nodes)
source /etc/profile;zkServer.sh start
Check the status
zkServer.sh status
After running this on the three nodes, two of them should report Mode: follower and one should report Mode: leader; if so, ZooKeeper is up correctly.
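Optionally, the same check can be run from emr-header-01 alone over SSH (a sketch using only what was set up above):
for h in emr-header-01 emr-header-02 emr-worker-01; do
  echo -n "$h: "; ssh $h 'source /etc/profile; zkServer.sh status 2>/dev/null | grep Mode'
done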
Hadoop release archive: https://archive.apache.org/dist/hadoop/common/
Here the Tsinghua mirror is used instead: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
Run on emr-header-01
curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
tar xf hadoop-2.10.1.tar.gz -C /opt/
cd /opt/hadoop-2.10.1/etc/hadoop/
Check whether the fuser command exists; if not, install it (on all nodes).
The sshfence method depends on this command: to prevent split-brain it SSHes to the NameNode host that is considered dead and kills the NameNode process again, and that kill is done with fuser. Without it, sshfence cannot fail over even when you merely stop the NameNode on the master with
hadoop-daemon.sh stop namenode
let alone when the whole master host goes down.
yum install -y psmisc
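A sketch for doing the check and install on every node in one go from emr-header-01:
for h in emr-header-01 emr-header-02 emr-worker-01 emr-worker-02; do
  ssh $h 'command -v fuser >/dev/null || yum install -y psmisc'
done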
If your hostnames or their resolution differ from the ones above, adjust the corresponding entries in the files.
Note: in yarn-site.xml, the yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores parameters set the maximum memory and number of CPU cores this node offers to YARN; here they are set to roughly half of the host's memory.
Besides the HA-related settings, the configuration below also contains others, such as enabling log aggregation and capping the resources available per node. You can add them individually or simply overwrite all of your existing configuration. Tested and working with Hadoop 2.10.1.
hadoop-env.sh
Insert source /etc/profile as the first line, or modify the JAVA_HOME variable already in the file.
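One way to apply that edit from the shell (a sketch; the sed patterns assume the stock Hadoop 2.10.1 hadoop-env.sh and the JDK path installed earlier):
cd /opt/hadoop-2.10.1/etc/hadoop/
sed -i '1i source /etc/profile' hadoop-env.sh
# alternatively, point JAVA_HOME at the JDK directly:
# sed -i 's#^export JAVA_HOME=.*#export JAVA_HOME=/opt/jdk1.8.0_181/#' hadoop-env.sh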
slaves
File contents (delete the existing localhost entry)
# List the hostnames of all worker nodes in this file
emr-worker-01
emr-worker-02
core-site.xml
File contents. Note that property names must not be duplicated within the file.
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>emr-header-01:2181,emr-header-02:2181,emr-worker-01:2181</value>
    <description>A list of ZooKeeper server addresses, separated by commas, that are to be used by the ZKFailoverController in automatic failover.</description>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>A comma separated list of services where service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>emr-header-01</value>
    <description>The hostname of the RM.</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
    <description>Number of vcores that can be allocated for containers. This is used by the RM scheduler when allocating resources for containers. This is not used to limit the number of CPUs used by YARN containers. If it is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically determined from the hardware in case of Windows and Linux. In other cases, number of vcores is 8 by default.</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
    <description>Amount of physical memory, in MB, that can be allocated for containers. If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated(in case of Windows and Linux). In other cases, the default is 8192MB.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>4</value>
    <description>The maximum allocation for every container request at the RM in terms of virtual CPU cores. Requests higher than this will throw an InvalidResourceRequestException.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>3072</value>
    <description>The maximum allocation for every container request at the RM in MBs. Memory requests higher than this will throw an InvalidResourceRequestException.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
    <description>The minimum allocation for every container request at the RM in terms of virtual CPU cores. Requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have fewer virtual cores than this value will be shut down by the resource manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
    <description>The minimum allocation for every container request at the RM in MBs. Memory requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have less memory than this value will be shut down by the resource manager.</description>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Whether to enable log aggregation. Log aggregation collects each container's logs and moves these logs onto a file-system, for e.g. HDFS, after the application completes. Users can configure the "yarn.nodemanager.remote-app-log-dir" and "yarn.nodemanager.remote-app-log-dir-suffix" properties to determine where these logs are moved to. Users can access the logs via the Application Timeline Server.</description>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
    <description>How long to keep aggregation logs before deleting them. -1 disables. Be careful set this too small and you will spam the name node.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
    <description>Enable RM high-availability. When enabled, (1) The RM starts in the Standby mode by default, and transitions to the Active mode when prompted to. (2) The nodes in the RM ensemble are listed in yarn.resourcemanager.ha.rm-ids (3) The id of each RM either comes from yarn.resourcemanager.ha.id if yarn.resourcemanager.ha.id is explicitly specified or can be figured out by matching yarn.resourcemanager.address.{id} with local address (4) The actual physical addresses come from the configs of the pattern - {rpc-config}.{id}</description>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn</value>
    <description>Name of the cluster. In a HA setting, this is used to ensure the RM participates in leader election for this cluster and ensures it does not affect other clusters</description>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
    <description>The list of RM nodes in the cluster when HA is enabled. See description of yarn.resourcemanager.ha.enabled for full details on how this is used.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>emr-header-01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>emr-header-01:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>emr-header-01:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>emr-header-01:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>emr-header-01:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>emr-header-02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>emr-header-02:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>emr-header-02:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>emr-header-02:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>emr-header-02:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>emr-header-01:2181,emr-header-02:2181,emr-worker-01:2181</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
    <description>Enable RM to recover state after starting. If true, then yarn.resourcemanager.store.class must be specified.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    <description>The class to use as the persistent store. If org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore is used, the store is implicitly fenced; meaning a single ResourceManager is able to use the store at any point in time. More details on this implicit fencing, along with setting up appropriate ACLs is discussed under yarn.resourcemanager.zk-state-store.root-node.acl.</description>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    <description>Environment variables that containers may override rather than use NodeManager's default.</description>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>emr-header-01:50090</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/data</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if local filesystem permission allows.</description>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>${hadoop.tmp.dir}/jn</value>
    <description>The directory where the journal edit files are stored.</description>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
    <description>Comma-separated list of nameservices.</description>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2,nn3</value>
    <description>dfs.ha.namenodes.EXAMPLENAMESERVICE The prefix for a given nameservice, contains a comma-separated list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE). Unique identifiers for each NameNode in the nameservice, delimited by commas. This will be used by DataNodes to determine all the NameNodes in the cluster. For example, if you used "mycluster" as the nameservice ID previously, and you wanted to use "nn1" and "nn2" as the individual IDs of the NameNodes, you would configure a property dfs.ha.namenodes.mycluster, and its value "nn1,nn2".</description>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>emr-header-01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>emr-header-02:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn3</name>
    <value>emr-worker-01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>emr-header-01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>emr-header-02:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn3</name>
    <value>emr-worker-01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://emr-header-01:8485;emr-header-02:8485;emr-worker-01:8485/mycluster</value>
    <description>A directory on shared storage between the multiple namenodes in an HA cluster. This directory will be written by the active and read by the standby in order to keep the namespaces synchronized. This directory does not need to be listed in dfs.namenode.edits.dir above. It should be left empty in a non-HA cluster.</description>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
    <description>A list of scripts or Java classes which will be used to fence the Active NameNode during a failover. See the HDFS High Availability documentation for details on automatic HA configuration.</description>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_dsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <description>Whether automatic failover is enabled. See the HDFS High Availability documentation for details on automatic HA configuration.</description>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.txns</name>
    <value>1000000</value>
    <description>The Secondary NameNode or CheckpointNode will create a checkpoint of the namespace every 'dfs.namenode.checkpoint.txns' transactions, regardless of whether 'dfs.namenode.checkpoint.period' has expired.</description>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.check.period</name>
    <value>60</value>
    <description>The SecondaryNameNode and CheckpointNode will poll the NameNode every 'dfs.namenode.checkpoint.check.period' seconds to query the number of uncheckpointed transactions. Support multiple time unit suffix(case insensitive), as described in dfs.heartbeat.interval. If no time unit is specified then seconds is assumed.</description>
  </property>
  <property>
    <name>dfs.namenode.num.extra.edits.retained</name>
    <value>1000000</value>
    <description>The number of extra transactions which should be retained beyond what is minimally necessary for a NN restart. It does not translate directly to file's age, or the number of files kept, but to the number of transactions (here "edits" means transactions). One edit file may contain several transactions (edits). During checkpoint, NameNode will identify the total number of edits to retain as extra by checking the latest checkpoint transaction value, subtracted by the value of this property. Then, it scans edits files to identify the older ones that don't include the computed range of retained transactions that are to be kept around, and purges them subsequently. The retainment can be useful for audit purposes or for an HA setup where a remote Standby Node may have been offline for some time and need to have a longer backlog of retained edits in order to start again. Typically each edit is on the order of a few hundred bytes, so the default of 1 million edits should be on the order of hundreds of MBs or low GBs. NOTE: Fewer extra edits may be retained than value specified for this setting if doing so would mean that more segments would be retained than the number configured by dfs.namenode.max.extra.edits.segments.retained.</description>
  </property>
  <property>
    <name>dfs.namenode.num.checkpoints.retained</name>
    <value>2</value>
    <description>The number of image checkpoint files (fsimage_*) that will be retained by the NameNode and Secondary NameNode in their storage directories. All edit logs (stored on edits_* files) necessary to recover an up-to-date namespace from the oldest retained checkpoint will also be retained.</description>
  </property>
  <property>
    <name>dfs.namenode.max.extra.edits.segments.retained</name>
    <value>10000</value>
    <description>The maximum number of extra edit log segments which should be retained beyond what is minimally necessary for a NN restart. When used in conjunction with dfs.namenode.num.extra.edits.retained, this configuration property serves to cap the number of extra edits files to a reasonable value.</description>
  </property>
</configuration>
mapred-site.xml
This file does not exist by default; create it from the template first: cp mapred-site.xml.template mapred-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>emr-header-01:50090</value>
  </property>
</configuration>
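The two properties above are HDFS settings. If you also want MapReduce jobs submitted to YARN rather than run by the local job runner, the usual extra property can be appended; this is an optional addition not part of the original steps, shown here as a sketch:
# Insert mapreduce.framework.name=yarn before the closing tag of mapred-site.xml
sed -i 's#</configuration>#<property><name>mapreduce.framework.name</name><value>yarn</value></property>\n</configuration>#' mapred-site.xml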
Edit /opt/hadoop-2.10.1/sbin/start-yarn.sh
# start resourceManager
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start resourcemanager
# Insert the following line so that the ResourceManager on emr-header-02 is started at the same time
ssh emr-header-02 "$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start resourcemanager
# start nodeManager
"$bin"/yarn-daemons.sh --config $YARN_CONF_DIR start nodemanager
# start proxyserver
#"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start proxyserver
Edit /opt/hadoop-2.10.1/sbin/stop-yarn.sh
# stop resourceManager
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR stop resourcemanager
# Insert the following line so that the ResourceManager on emr-header-02 is stopped at the same time
ssh emr-header-02 "$bin"/yarn-daemon.sh --config $YARN_CONF_DIR stop resourcemanager
# stop nodeManager
"$bin"/yarn-daemons.sh --config $YARN_CONF_DIR stop nodemanager
# stop proxy server
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR stop proxyserver
Add environment variables
echo 'export HADOOP_HOME=/opt/hadoop-2.10.1/' >>/etc/profile
echo 'export PATH=${PATH}:${HADOOP_HOME}/sbin/:${HADOOP_HOME}/bin/' >>/etc/profile
source /etc/profile
Copy everything to the other nodes (run on emr-header-01)
# Copy the SSH keys to the other nodes, because dfs.ha.fencing.ssh.private-key-files above points at this key file
scp -r /root/.ssh/* emr-header-02:/root/.ssh/
scp /etc/profile emr-header-02:/etc/
scp /etc/profile emr-worker-01:/etc/
scp /etc/profile emr-worker-02:/etc/
scp -r /opt/hadoop-2.10.1/ emr-header-02:/opt/
scp -r /opt/hadoop-2.10.1/ emr-worker-01:/opt/
scp -r /opt/hadoop-2.10.1/ emr-worker-02:/opt/
Reload the environment variables on all nodes
source /etc/profile
Run on the three nodes emr-header-01, emr-header-02, and emr-worker-01
hadoop-daemon.sh start journalnode
To check whether it started, look at the log, e.g. /opt/hadoop-2.10.1/logs/hadoop-root-journalnode-emr-header-01.log (the file name differs per node).
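A quicker check is jps (shipped with the JDK); a JournalNode process should show up on each of the three nodes:
jps | grep JournalNode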
Run on emr-header-01
# Running the next command again will prompt you to confirm with y
hdfs zkfc -formatZK
hdfs namenode -format
hadoop-daemon.sh start namenode
Run on emr-header-02 and emr-worker-01 (this bootstraps the standby NameNodes by syncing the metadata; emr-worker-01 hosts nn3)
hdfs namenode -bootstrapStandby
Run on emr-header-01
stop-all.sh
Now start the cluster
start-all.sh
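A rough way to see what ended up running where (the expected Hadoop 2.x daemon names include NameNode, DataNode, JournalNode, DFSZKFailoverController, ResourceManager, and NodeManager):
for h in emr-header-01 emr-header-02 emr-worker-01 emr-worker-02; do
  echo "== $h =="; ssh $h jps
done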
It is recommended to add the hosts entries on your local machine as well. Everything still works without them, but when you land on the standby ResourceManager (i.e. the YARN UI) it redirects to the active node's hostname.
hosts entries (the same as on each host above):
20.88.10.31 emr-header-01
20.88.10.32 emr-header-02
20.88.10.33 emr-worker-01
20.88.10.34 emr-worker-02
Access the web UIs
# HDFS: each of the nodes running a JournalNode (which are also the NameNode hosts) serves the UI
http://20.88.10.31:50070/
http://20.88.10.32:50070/
http://20.88.10.33:50070/
On the HDFS page, the first line Overview 'emr-header-01:8020' (active) marks the active NameNode, and Overview 'emr-header-02:8020' (standby) marks a standby.
# YARN / MapReduce
http://20.88.10.31:8088/
Check the ResourceManager states
rm1 and rm2 are defined in the configuration above: rm1 is emr-header-01 and rm2 is emr-header-02.
[root@emr-header-01 ~]# yarn rmadmin -getServiceState rm1
standby
[root@emr-header-01 ~]# yarn rmadmin -getServiceState rm2
active
[root@emr-header-01 ~]#
Based on the checks above, the active HDFS NameNode is emr-header-01, while the active YARN ResourceManager is emr-header-02.
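The HDFS side can be checked the same way; nn1, nn2, and nn3 come from dfs.ha.namenodes.mycluster in hdfs-site.xml:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hdfs haadmin -getServiceState nn3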
# Run on emr-header-01 to stop the NameNode; note it is hadoop-daemon.sh, not hadoop-daemons.sh
hadoop-daemon.sh stop namenode
# Run on emr-header-02 to stop the ResourceManager
yarn-daemon.sh stop resourcemanager
The YARN UI on emr-header-02 is now unreachable, while http://emr-header-01:8088/cluster still works.
The HDFS UI on emr-header-01 is unreachable, while emr-header-02 at http://emr-header-02:50070/dfshealth.html#tab-overview now shows active.
Restart the stopped services and they automatically come back as standby. Start commands (run each start on the same node where the corresponding stop was executed):
yarn-daemon.sh start resourcemanager
hadoop-daemon.sh start namenode
Once all of the tests above pass, you can power off either master machine to simulate a real failure and check that failover still happens. With dfs.ha.fencing.methods set to the shell method this works; with sshfence it does not, for the reasons explained at the beginning. By comparison, failover with the shell method takes noticeably longer (roughly under a minute in testing), while sshfence takes over almost immediately after the active NameNode goes down; but only the shell method survives losing the whole host.
If you watch the NameNode data directory regularly, you will notice that in an HA setup the number of NameNode edits files keeps growing.
This is governed by the dfs.namenode.num.extra.edits.retained parameter in the HDFS configuration: by default, 1,000,000 extra edit transactions are retained beyond what a NameNode restart minimally needs.
Hadoop's official explanation is:
The number of extra transactions which should be retained beyond what is minimally necessary for a NN restart. It does not translate directly to file’s age, or the number of files kept, but to the number of transactions (here “edits” means transactions). One edit file may contain several transactions (edits). During checkpoint, NameNode will identify the total number of edits to retain as extra by checking the latest checkpoint transaction value, subtracted by the value of this property. Then, it scans edits files to identify the older ones that don’t include the computed range of retained transactions that are to be kept around, and purges them subsequently. The retainment can be useful for audit purposes or for an HA setup where a remote Standby Node may have been offline for some time and need to have a longer backlog of retained edits in order to start again. Typically each edit is on the order of a few hundred bytes, so the default of 1 million edits should be on the order of hundreds of MBs or low GBs. NOTE: Fewer extra edits may be retained than value specified for this setting if doing so would mean that more segments would be retained than the number configured by dfs.namenode.max.extra.edits.segments.retained.
It also mentions the parameter dfs.namenode.max.extra.edits.segments.retained, whose default value is 10000 and which is explained as:
The maximum number of extra edit log segments which should be retained beyond what is minimally necessary for a NN restart. When used in conjunction with dfs.namenode.num.extra.edits.retained, this configuration property serves to cap the number of extra edits files to a reasonable value.
See also: https://blog.csdn.net/weixin_44455125/article/details/122524280
Alibaba Cloud's EMR clusters also use the shell fencing method, yet they only support two master nodes. I don't know why yet and am still looking into it; if anyone does, please share.