HDFS Node Roles

Secondary NameNode:

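The NameNode records every modification to the DFS in an append-only log, the editlog.
When the NameNode starts, it reads the HDFS state from the fsimage and then replays the modifications recorded in the editlog. It then writes the new state back to the fsimage and starts a new, empty edits file.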

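Because the NameNode merges the fsimage and the editlog only at startup, the editlog keeps growing while the NameNode is running. The next NameNode restart then has to replay a very large editlog and can become extremely slow.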
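
The startup sequence described above can be sketched with a toy model. This is only an illustration, not Hadoop code: the namespace is modelled as a Python dict, and the edit record format and the helper names `apply_edit` and `start_namenode` are assumptions made for this example.

```python
# Toy model of NameNode startup: load fsimage, replay the editlog, save a new fsimage.
# The ("mkdir"/"delete", path) record format is hypothetical, not the real HDFS format.

def apply_edit(namespace, edit):
    """Apply one logged modification to the in-memory namespace."""
    op, path = edit
    if op == "mkdir":
        namespace[path] = "dir"
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def start_namenode(fsimage, editlog):
    """Return (new_fsimage, new_editlog) as they would look after startup."""
    namespace = dict(fsimage)      # 1. read the last checkpointed state
    for edit in editlog:           # 2. replay every edit, in order
        apply_edit(namespace, edit)
    return namespace, []           # 3. new image saved, edits start out empty


fsimage = {"/": "dir", "/tmp": "dir"}
editlog = [("mkdir", "/user"), ("delete", "/tmp"), ("mkdir", "/user/alice")]
new_image, new_edits = start_namenode(fsimage, editlog)
print(new_image)
print(new_edits)
```

In this model the replay loop runs once per logged edit, which mirrors why a long editlog makes a restart slow.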

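The Secondary NameNode (SNN) periodically merges the fsimage and the editlog, which keeps the size of the editlog within a limit.
The SNN usually runs on a different machine from the NameNode (NN), because its memory requirements are of the same order of magnitude as the NN's.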

The checkpoint process on the SNN is triggered by two parameters:

dfs.namenode.checkpoint.period, the time interval between checkpoints

set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and

dfs.namenode.checkpoint.txns, the transaction-count threshold for a checkpoint

set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.
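
The interaction of the two triggers can be sketched as follows. This is a minimal illustration under assumptions: `should_checkpoint` is a hypothetical helper, not a Hadoop API, and using `>=` for both comparisons is a guess at the boundary behaviour. The default values are the ones quoted above.

```python
# Sketch of the two checkpoint triggers; whichever fires first causes a checkpoint.
CHECKPOINT_PERIOD_SECONDS = 3600   # dfs.namenode.checkpoint.period (1 hour)
CHECKPOINT_TXNS = 1_000_000        # dfs.namenode.checkpoint.txns (1 million)


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=CHECKPOINT_PERIOD_SECONDS, txns=CHECKPOINT_TXNS):
    """Return True if either the time trigger or the transaction trigger has fired."""
    time_trigger = seconds_since_last >= period
    txn_trigger = uncheckpointed_txns >= txns
    return time_trigger or txn_trigger


print(should_checkpoint(3600, 10))         # period reached, few transactions
print(should_checkpoint(60, 1_000_000))    # period not reached, too many transactions
print(should_checkpoint(60, 10))           # neither trigger fired
```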

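The SNN stores its checkpoint in a directory with the same structure as the NN's metadata directory, so the checkpointed image is always ready to be read by the NameNode if necessary.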

Checkpoint Node:

The NN persists its namespace using two files:
1. fsimage, the latest checkpoint of the namespace
2. edits, a journal log of the changes made since that checkpoint

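When the NameNode restarts, it merges the fsimage and the edits journal to produce an up-to-date view of the DFS metadata. The NN then overwrites the existing fsimage with this new state and begins a new edits journal.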

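The Checkpoint Node periodically creates checkpoints of the namespace.
It downloads the fsimage and edits from the active NN, merges them locally, and uploads the new, merged image back to the active NN.
The Checkpoint Node usually runs on a different machine from the NN, because its memory requirements are of the same order of magnitude as the NN's.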
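
The download, merge, and upload cycle can be sketched like this. It is a toy model under assumptions, not Hadoop code: `ToyActiveNN` and `run_checkpoint` are hypothetical names, the fsimage is a dict, and each edit is a `(path, value)` pair where a value of `None` means the path was deleted.

```python
class ToyActiveNN:
    """Stand-in for the active NN; holds an fsimage dict and a list of edits."""

    def __init__(self, fsimage, edits):
        self.fsimage = dict(fsimage)
        self.edits = list(edits)

    def download(self):
        # Hand out copies so the checkpoint node works on its own local files.
        return dict(self.fsimage), list(self.edits)

    def upload(self, new_image, merged_count):
        # Install the merged image and drop the edits it already contains.
        self.fsimage = dict(new_image)
        self.edits = self.edits[merged_count:]


def run_checkpoint(active_nn):
    """One checkpoint cycle: download, merge locally, upload the new image."""
    image, edits = active_nn.download()
    for path, value in edits:
        if value is None:
            image.pop(path, None)   # deletion
        else:
            image[path] = value     # creation or update
    active_nn.upload(image, len(edits))


nn = ToyActiveNN({"/a": "file"}, [("/b", "file"), ("/a", None)])
run_checkpoint(nn)
print(nn.fsimage, nn.edits)
```

The merge itself happens on the checkpoint node's side in this model, so the active NN only pays for the transfer.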

The location of the Checkpoint (or Backup) node and its accompanying web interface are configured via the dfs.namenode.backup.address and dfs.namenode.backup.http-address configuration variables.

The merge interval and the transaction threshold are controlled by the same parameters as for the SNN: dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns.

Starting the Checkpoint Node:
The Checkpoint node is started by bin/hdfs namenode -checkpoint on the node specified in the configuration file.

Backup Node:

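The Backup Node provides the same checkpointing functionality as the Checkpoint Node. In addition, it maintains an in-memory, up-to-date copy of the file system namespace that is always synchronized with the active NN. It accepts a journal stream of file system edits from the NN and persists them to disk, and it also applies those edits to its own in-memory copy of the namespace, thereby creating a backup of the namespace.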

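To create a checkpoint, the Backup Node does not need to download the fsimage and edits from the active NN, as a Checkpoint Node or an SNN would have to, because it already holds an up-to-date namespace state in memory.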

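Because the namespace is already in its memory, the Backup Node can checkpoint more efficiently: it only needs to save the namespace to its local fsimage file and reset edits.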
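
The difference from the Checkpoint Node can be sketched as follows. This is a toy model under assumptions, not Hadoop code: `ToyBackupNode` and its method names are hypothetical, and lists and dicts stand in for the on-disk journal and the fsimage file.

```python
class ToyBackupNode:
    """Keeps an in-memory namespace in sync by applying each streamed edit."""

    def __init__(self):
        self.namespace = {}   # in-memory copy, kept in sync with the active NN
        self.journal = []     # stands in for the edits persisted to disk
        self.fsimage = {}     # stands in for the local fsimage file

    def on_edit(self, path, value):
        """Handle one edit from the journal stream; value None means delete."""
        self.journal.append((path, value))   # persist the edit
        if value is None:
            self.namespace.pop(path, None)
        else:
            self.namespace[path] = value     # apply it to the in-memory namespace

    def checkpoint(self):
        # No download or merge: save the in-memory namespace and reset edits.
        self.fsimage = dict(self.namespace)
        self.journal = []


bn = ToyBackupNode()
bn.on_edit("/a", "dir")
bn.on_edit("/a/f", "file")
bn.on_edit("/a/f", None)
journal_len_before = len(bn.journal)
bn.checkpoint()
print(bn.fsimage, bn.journal)
```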

Standby NN: see the Standby node (SN) entry in the term list below.

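The NameNode supports at most one Backup Node at a time. If a Backup Node is in use, no Checkpoint Node may be registered.
The Backup Node is started with bin/hdfs namenode -backup.

Its location and web interface are configured via the same variables as for the Checkpoint Node:

dfs.namenode.backup.address

dfs.namenode.backup.http-address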

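Using a Backup Node makes it possible to run the NN with no persistent storage, delegating all responsibility for persisting the namespace state to the Backup Node.
To do this, start the NameNode with the -importCheckpoint option.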

Terminology

  • Role of the name-node – defines name-node functionality.
  • Active name-node (NN) – a name-node in “active” role.
    This is the main (traditional) name-node, unique in the cluster.
  • Checkpoint node (CN) – a name-node in “checkpoint” role.
    This node performs only checkpoints. It does not keep an up-to-date namespace
    state.
  • Backup node (BN) – a name-node in “backup” role.
    Includes all the checkpoint responsibilities, plus it maintains an up-to-date
    namespace state, which is always in sync with the active node.
  • Standby node (SN) – a name-node in “standby” state.
    Standby is a backup node, which is able to take over the active role if the current
    active fails.
  • Image – latest checkpoint of the namespace; corresponds to “fsimage” file.
  • Journal – a collection of journal records (edits) reflecting modifications to the
    namespace since the latest checkpoint; corresponds to “edits” file.
  • Image store – a storage resource, which contains namespace image state.
  • Journal store – a storage resource, which contains namespace journal.
  • Journal Spool – a temporary storage on BN that spools journal records until they
    can be picked up and applied to the namespace.
  • Checkpoint Time – the latest time the image was saved; defines the age of the
    namespace state.

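Original article. When reposting, please credit:

Reposted from: OopsOutOfMemory (盛利)'s Blog, author: OopsOutOfMemory

Link to this article: http://blog.csdn.net/oopsoom/article/details/47278399

Note: This article is licensed under Attribution-NonCommercial-NoDerivs 2.5 China Mainland (CC BY-NC-ND 2.5 CN). You are welcome to repost, share, and comment, but please keep the author attribution and the article link. For commercial use or licensing questions, please contact me.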
