一:hdfs高可用性之仲裁管理者
1 目的:This guide provides an overview of the HDFS High Availability (HA) feature and how to configure and manage an HA HDFS cluster, using the Quorum Journal Manager (QJM) feature.This document assumes that the reader has a general understanding of general components and node types in an HDFS cluster. Please refer to the HDFS Architecture guide for details.(该向导概述了一个hdfs高可用性功能,如何通过使用仲裁管理者功能配置和管理一个高可用hdfs集群,该问当地假定读者理解大致hdfs集群的组件和节点类型,其详细信息请参阅hdfs架构向导)
2 背景:Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.
This impacted the total availability of the HDFS cluster in two major ways:
The HDFS High Availability feature addresses the above problems by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to a new NameNode in the case that a machine crashes, or a graceful administrator-initiated failover for the purpose of planned maintenance.
在hadoop2.0.0之前版本,在hdfs集群中namenode节点是单点的。每一个集群只有一个namenode节点,如果那台机器或者进程挂了,整个集群将不能用直到namenode重启。
影响hdfs集群可用性有两个主要方面:
1)宕机
2)按计划的软硬件升级
提供hdfs高可用性,运行两个完全一样的namemode节点可以解决上述问题。这可以故障切换到一台新的namenode节点以防宕机按计划的维修。
3 架构:In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.
In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate with a group of separate daemons called "JournalNodes" (JNs). When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. The Standby node is capable of reading the edits from the JNs, and is constantly watching them for changes to the edit log. As the Standby Node sees the edits, it applies them to its own namespace. In the event of a failover, the Standby will ensure that it has read all of the edits from the JounalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.
In order to provide a fast failover, it is also necessary that the Standby node have up-to-date information regarding the location of blocks in the cluster. In order to achieve this, the DataNodes are configured with the location of both NameNodes, and send block location information and heartbeats to both.
It is vital for the correct operation of an HA cluster that only one of the NameNodes be Active at a time. Otherwise, the namespace state would quickly diverge between the two, risking data loss or other incorrect results. In order to ensure this property and prevent the so-called "split-brain scenario," the JournalNodes will only ever allow a single NameNode to be a writer at a time. During a failover, the NameNode which is to become active will simply take over the role of writing to the JournalNodes, which will effectively prevent the other NameNode from continuing in the Active state, allowing the new Active to safely proceed with failover.
架构这部分翻译不再赘述,上篇文章有写,不同的是存储namenode共享信息的方式不同,该方式是提供JournalNodes(仲裁)节点
4 硬件设备:In order to deploy an HA cluster, you should prepare the following:
Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. This also allows one who is reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled to reuse the hardware which they had previously dedicated to the Secondary NameNode.
需要的硬件设备:
1)namenode节点机器
2)仲裁节点机器