013 Hadoop High Availability – NameNode Automatic Failover

Hadoop 1.0, the release line before Hadoop 2.0, had a single point of failure (SPOF) in the NameNode. If the NameNode failed, the entire cluster became unavailable, and an administrator had to intervene manually to bring it back up with the help of the Secondary NameNode, which meant downtime for the whole system. Hadoop 2.0 introduced a single standby NameNode to make automatic failover possible, and Hadoop 3.0, which supports multiple standby NameNodes, makes the system more available still. In this tutorial, we will talk about Hadoop high availability. We will look at the types of failover and discuss in detail how ZooKeeper and its related components provide automatic failover.

Hadoop High Availability – Automatic Failover

1. What is Hadoop High Availability?

Hadoop 2.0 added support for a second NameNode that acts as a standby, and Hadoop 3.0 extended this to multiple standby NameNodes. The extra NameNode (a passive standby NameNode) removes the SPOF (single point of failure): if the active NameNode fails, a standby can take over through automatic failover. This is high availability in Hadoop.

i. What is Failover?

Failover is the process by which a system transfers control to a secondary system in the event of a failure.

There are two types of failover:

  • **Graceful Failover** – The administrator initiates this type of failover manually, typically for routine system maintenance. Control has to be transferred to the standby NameNode by hand; it does not happen automatically.
  • **Automatic Failover** – The system transfers control to the standby NameNode without manual intervention. Without automatic failover, a NameNode failure leaves the whole system down until an administrator steps in. Hadoop high availability therefore depends on automatic failover to be fully effective: it acts as your insurance policy against a single point of failure.
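A graceful failover is normally started from the command line with `hdfs haadmin`. The sketch below only builds the argument lists for the `-failover` and `-getServiceState` subcommands; it does not run them. The NameNode IDs `nn1` and `nn2` are hypothetical placeholders for whatever IDs a cluster defines.

```python
from typing import List


def graceful_failover_command(from_nn: str, to_nn: str) -> List[str]:
    """Build the argv for a manual failover from one NameNode ID to another."""
    if from_nn == to_nn:
        raise ValueError("source and target NameNode IDs must differ")
    return ["hdfs", "haadmin", "-failover", from_nn, to_nn]


def service_state_command(nn_id: str) -> List[str]:
    """Build the argv that asks whether a NameNode is active or standby."""
    return ["hdfs", "haadmin", "-getServiceState", nn_id]


if __name__ == "__main__":
    # "nn1" and "nn2" are assumed IDs, not values taken from the article.
    print(" ".join(service_state_command("nn1")))
    print(" ".join(graceful_failover_command("nn1", "nn2")))
```

On a real cluster these lists could be passed to `subprocess.run`, for example before taking the active NameNode down for maintenance.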

2. NameNode High Availability in Hadoop

Automatic failover adds the following components to an HDFS deployment:

  • ZooKeeper quorum.

  • ZKFailoverController Process (ZKFC).
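Both components are switched on through configuration. As a rough sketch, the snippet below renders the two properties usually involved: `dfs.ha.automatic-failover.enabled` (set in `hdfs-site.xml`) and `ha.zookeeper.quorum` (set in `core-site.xml`). The `zk*.example.com` hostnames are made up for illustration, and `render_configuration` is a hypothetical helper, not part of Hadoop.

```python
from typing import Dict
from xml.sax.saxutils import escape


def render_configuration(properties: Dict[str, str]) -> str:
    """Render name/value pairs in the <configuration> layout Hadoop uses."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


# hdfs-site.xml: turn automatic failover on.
HDFS_SITE = {"dfs.ha.automatic-failover.enabled": "true"}

# core-site.xml: host:port pairs of the ZooKeeper servers (assumed hostnames).
CORE_SITE = {
    "ha.zookeeper.quorum": (
        "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
    )
}

if __name__ == "__main__":
    print(render_configuration(HDFS_SITE))
    print(render_configuration(CORE_SITE))
```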

i. ZooKeeper Quorum

ZooKeeper is a centralized service for maintaining small amounts of data for coordination, configuration, and naming; a ZooKeeper quorum is the group of servers that provides it. It offers group services and synchronization, keeps clients informed about changes in data, and tracks client failures. The implementation of automatic HDFS failover relies on ZooKeeper for:

  • Failure detection – Each NameNode machine maintains a persistent session in ZooKeeper. If the machine crashes, the session expires and ZooKeeper notifies the other NameNodes so that the failover process can start.
  • Active NameNode election – ZooKeeper provides a mechanism to elect exactly one node as the active node. Whenever the active NameNode fails, another NameNode can take an exclusive lock in ZooKeeper, indicating that it should become the next active NameNode.
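The two roles above can be illustrated with a small in-memory model. This is only a toy sketch: `ToyCoordinator` and its methods are invented for this example and are not the ZooKeeper client API. It assumes a single lock that belongs to one session and disappears when that session expires.

```python
from typing import Callable, List, Optional, Set


class ToyCoordinator:
    """In-memory stand-in for the session and lock behaviour described above."""

    def __init__(self) -> None:
        self.sessions: Set[str] = set()
        self.lock_holder: Optional[str] = None  # session that owns the lock
        self.watchers: List[Callable[[str], None]] = []

    def open_session(self, session_id: str) -> None:
        self.sessions.add(session_id)

    def try_acquire_lock(self, session_id: str) -> bool:
        """Exclusive lock: only one live session can hold it at a time."""
        if session_id not in self.sessions:
            raise KeyError(f"no open session for {session_id}")
        if self.lock_holder is None:
            self.lock_holder = session_id
        return self.lock_holder == session_id

    def watch_lock(self, callback: Callable[[str], None]) -> None:
        # Simplification: these watchers stay registered after they fire.
        self.watchers.append(callback)

    def expire_session(self, session_id: str) -> None:
        """Failure detection: an expired session loses its lock, and
        every watcher is told which session lost it."""
        self.sessions.discard(session_id)
        if self.lock_holder == session_id:
            self.lock_holder = None
            for callback in list(self.watchers):
                callback(session_id)


if __name__ == "__main__":
    zk = ToyCoordinator()
    zk.open_session("nn1")
    zk.open_session("nn2")
    zk.try_acquire_lock("nn1")                         # nn1 becomes active
    zk.watch_lock(lambda lost: zk.try_acquire_lock("nn2"))
    zk.expire_session("nn1")                           # nn1 crashes
    print(zk.lock_holder)                              # prints "nn2"
```

The point of the design is that the standby never has to poll the active node directly; losing the session is what releases the lock and wakes up the other candidates.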

ii. ZKFailoverController (ZKFC)

ZKFC is a ZooKeeper client that monitors and manages the state of the NameNode. Each machine that runs a NameNode also runs a ZKFC.

ZKFC handles:

**Health Monitoring** – ZKFC periodically pings its local NameNode with a health-check command. If the NameNode does not respond in time, ZKFC marks it as unhealthy. This can happen when the NameNode has crashed or frozen.

**ZooKeeper Session Management** – While the local NameNode is healthy, ZKFC keeps a session open in ZooKeeper. If the local NameNode is active, ZKFC also holds a special lock znode. If the session expires, this lock is deleted automatically.

**ZooKeeper-based Election** – If the local NameNode is healthy and ZKFC sees that no other node currently holds the lock znode, ZKFC itself tries to acquire the lock. If it succeeds, it has won the election and is responsible for running a failover. The failover is similar to a manual failover: first the previously active node is fenced if necessary, and then the local node becomes the active node.
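The three ZKFC responsibilities can be read as one decision taken on every health-check cycle. The sketch below is a simplified model of that decision, not the actual ZKFC code; the `Action` names, `zkfc_step`, and `failover_steps` are all hypothetical.

```python
from enum import Enum
from typing import List, Optional


class Action(Enum):
    HOLD_LOCK = "keep session and lock"           # healthy and already active
    TRY_ACQUIRE = "try to acquire lock"           # healthy, lock is free
    STAND_BY = "stay standby"                     # another node holds the lock
    RELEASE_LOCK = "close session, giving up lock"  # unhealthy lock holder


def zkfc_step(local_id: str, local_healthy: bool,
              lock_holder: Optional[str]) -> Action:
    """Decide what a ZKFC does after one health check of its local NameNode."""
    if not local_healthy:
        # An unhealthy NameNode must not stay active or join the election.
        return Action.RELEASE_LOCK if lock_holder == local_id else Action.STAND_BY
    if lock_holder == local_id:
        return Action.HOLD_LOCK
    if lock_holder is None:
        return Action.TRY_ACQUIRE
    return Action.STAND_BY


def failover_steps(previous_active: Optional[str], local_id: str,
                   needs_fencing: bool) -> List[str]:
    """Order of operations once the election is won: fence first, then activate."""
    steps = []
    if previous_active is not None and needs_fencing:
        steps.append(f"fence {previous_active}")
    steps.append(f"transition {local_id} to active")
    return steps


if __name__ == "__main__":
    print(zkfc_step("nn2", True, None))
    print(failover_steps("nn1", "nn2", needs_fencing=True))
```

Fencing comes before activation so that two NameNodes never act as active at the same time.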

3. Summary

In this Hadoop High Availability article, we saw how ZooKeeper and ZKFC provide automatic NameNode failover. In a typical deployment, ZooKeeper daemons are configured to run on three or five nodes. Since ZooKeeper does not have high resource requirements, it can run on the same nodes as the active and standby NameNodes. Many operators choose to deploy the third ZooKeeper process on the same node as the YARN ResourceManager. It is advisable to keep ZooKeeper data on separate disks from HDFS metadata, as this gives the best performance and isolation.
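The choice of three or five ZooKeeper nodes follows from majority voting: the ensemble keeps working only while more than half of its servers are up. A short calculation, with helper names chosen just for this example, shows why an even count adds no extra tolerance.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    for size in (3, 4, 5):
        print(size, majority(size), tolerated_failures(size))
    # 3 servers tolerate 1 failure, 4 still tolerate only 1, 5 tolerate 2.
```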

https://data-flair.training/blogs/hadoop-high-availability
