Hadoop Distributed File System ~~~ learning notes

[Image: Hadoop.png]

The image comes from a Google Images search for "HDFS.png".

An HDFS cluster has two types of nodes operating in a master-worker pattern: a NameNode (the master) and a number of DataNodes (the workers).

1. NameNode

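The NameNode maintains the namespace tree and the mapping of file blocks to DataNodes. The namespace is stored persistently on the local disk in the form of two files: the namespace image and the edit log. The block-to-DataNode mapping is not persisted; the NameNode rebuilds it from the block reports that the DataNodes send.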
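As a rough illustration of how a namespace image and an edit log fit together, the toy sketch below rebuilds an in-memory namespace by replaying logged operations on top of a checkpoint. The operation names (`create`, `add_block`, `delete`), the dict layout and the function names are all made up for this example; real HDFS uses its own on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to the namespace (path -> list of block IDs)."""
    op = edit["op"]
    if op == "create":
        namespace[edit["path"]] = []
    elif op == "add_block":
        namespace[edit["path"]].append(edit["block_id"])
    elif op == "delete":
        namespace.pop(edit["path"], None)
    else:
        raise ValueError(f"unknown op: {op}")


def load_namespace(image, edit_log):
    """Start from the checkpointed image, then replay every edit in order."""
    namespace = {path: list(blocks) for path, blocks in image.items()}
    for edit in edit_log:
        apply_edit(namespace, edit)
    return namespace


# Hypothetical checkpoint and the edits logged after it was taken.
image = {"/a.txt": [1, 2]}
edit_log = [
    {"op": "create", "path": "/b.txt"},
    {"op": "add_block", "path": "/b.txt", "block_id": 3},
    {"op": "delete", "path": "/a.txt"},
]

namespace = load_namespace(image, edit_log)
print(namespace)  # {'/b.txt': [3]}
```

The point of the split is that the image is a compact snapshot, while the edit log is cheap to append to; neither one alone describes the current namespace.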

2. DataNode

During startup, each DataNode connects to the NameNode and performs a handshake, which verifies the namespace ID and the software version of the DataNode. If either one does not match that of the NameNode, the DataNode automatically shuts down. After a successful handshake, the DataNode registers with the NameNode. Each DataNode persistently stores a unique storage ID. The storage ID is assigned when the DataNode registers with the NameNode for the first time and never changes after that.
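The startup sequence can be sketched roughly as follows. This is a simplified model, not the HDFS implementation: `NodeInfo`, `HandshakeError`, `handshake`, `register` and the `DS-` prefixed ID format are assumptions chosen for illustration.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeInfo:
    namespace_id: int
    software_version: str
    storage_id: Optional[str] = None  # only meaningful for a DataNode


class HandshakeError(Exception):
    """Raised when a DataNode must shut down instead of joining."""


def handshake(namenode: NodeInfo, datanode: NodeInfo) -> None:
    # A mismatch in either field is enough to reject the DataNode.
    if datanode.namespace_id != namenode.namespace_id:
        raise HandshakeError("namespace ID mismatch")
    if datanode.software_version != namenode.software_version:
        raise HandshakeError("software version mismatch")


def register(datanode: NodeInfo) -> str:
    # The storage ID is assigned on first registration and then kept.
    if datanode.storage_id is None:
        datanode.storage_id = f"DS-{uuid.uuid4()}"
    return datanode.storage_id


namenode = NodeInfo(namespace_id=42, software_version="1.0")
datanode = NodeInfo(namespace_id=42, software_version="1.0")

handshake(namenode, datanode)
first_id = register(datanode)
second_id = register(datanode)  # e.g. after a restart: same ID comes back
```

Here `first_id` and `second_id` are the same string, which mirrors the rule that a storage ID does not change once assigned.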
A DataNode tells the NameNode which block replicas it holds by sending a block report. The first block report is sent immediately after the DataNode registers. Subsequent block reports are sent every hour, so that the NameNode has an up-to-date view of where block replicas are located in the cluster.
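One way to picture what the NameNode does with block reports is a map from block ID to the set of DataNodes holding a replica. The sketch below is a simplified model under that assumption; `BlockMap` and `process_block_report` are hypothetical names, and each report is treated as the full list of blocks on that DataNode.

```python
from collections import defaultdict


class BlockMap:
    def __init__(self):
        # block ID -> set of storage IDs that reported a replica
        self.locations = defaultdict(set)

    def process_block_report(self, storage_id, reported_blocks):
        reported = set(reported_blocks)
        # Forget replicas this DataNode held before but no longer reports.
        for block_id, holders in list(self.locations.items()):
            if storage_id in holders and block_id not in reported:
                holders.discard(storage_id)
                if not holders:
                    del self.locations[block_id]
        # Record every replica named in the new report.
        for block_id in reported:
            self.locations[block_id].add(storage_id)


block_map = BlockMap()
block_map.process_block_report("DS-1", [1, 2])  # first report after registration
block_map.process_block_report("DS-2", [2, 3])
block_map.process_block_report("DS-1", [2])     # a later report: block 1 is gone
```

After these three reports, block 2 is known on both DataNodes, block 3 only on `DS-2`, and block 1 has no known replica.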
During normal operation, each DataNode sends a heartbeat to the NameNode every three seconds to confirm that the DataNode is operating and that its block replicas are available. If the NameNode does not receive a heartbeat from a DataNode for ten minutes, it considers the DataNode to be out of service and the block replicas hosted by that DataNode to be unavailable. The NameNode then schedules the creation of new replicas of those blocks on other DataNodes.
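The timeout logic can be sketched as a small bookkeeping class. This is a minimal model, not how the NameNode is actually written: the names `HeartbeatMonitor`, `dead_nodes` and `blocks_to_re_replicate` are invented here, and timestamps are passed in explicitly (in seconds) so the example does not depend on a real clock.

```python
HEARTBEAT_INTERVAL_S = 3      # how often a DataNode reports in
DEAD_NODE_TIMEOUT_S = 10 * 60  # silence after which a DataNode is written off


class HeartbeatMonitor:
    def __init__(self, timeout_s=DEAD_NODE_TIMEOUT_S):
        self.timeout_s = timeout_s
        self.last_seen = {}  # storage ID -> time of the latest heartbeat

    def record_heartbeat(self, storage_id, now):
        self.last_seen[storage_id] = now

    def dead_nodes(self, now):
        # A node is dead once it has been silent for the whole timeout.
        return sorted(
            sid for sid, seen in self.last_seen.items()
            if now - seen >= self.timeout_s
        )


def blocks_to_re_replicate(block_locations, dead):
    """Blocks with at least one replica on a dead DataNode."""
    dead = set(dead)
    return sorted(b for b, holders in block_locations.items() if holders & dead)


monitor = HeartbeatMonitor()
monitor.record_heartbeat("DS-1", now=0)
monitor.record_heartbeat("DS-2", now=0)
monitor.record_heartbeat("DS-1", now=597)  # DS-1 keeps reporting, DS-2 goes quiet

dead = monitor.dead_nodes(now=600)
lost = blocks_to_re_replicate({1: {"DS-1", "DS-2"}, 2: {"DS-2"}}, dead)
print(dead, lost)  # ['DS-2'] [1, 2]
```

In this run, block 1 still has a live replica on `DS-1` but has dropped below its previous replica count, while block 2 has no live replica left in the map at all.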
