Hadoop has traditionally been used for batch processing of data at large scale. Batch processing applications care more about raw sequential throughput than low latency, so the existing HDFS model, in which all attached storages are assumed to be spinning disks, has worked well.
There is increasing interest in using Hadoop for interactive query processing, e.g. via Hive. Another class of applications makes use of random IO patterns, e.g. HBase. Both classes of applications benefit from lower-latency storage media. Perhaps the most interesting of the lower-latency storage media are Solid State Disks (SSDs).
The high cost per gigabyte of SSD storage relative to spinning disks makes it prohibitive to build a large-scale cluster with pure SSD storage. It is desirable to have a variety of storage types and let each application choose the one that best fits its performance or cost requirements. Administrators will need mechanisms to manage a fair distribution of scarce storage resources across users. These are the scenarios we aim to address by adding support for Heterogeneous Storages in HDFS.
Let’s take a quick sidebar to review the performance characteristics of a few common storage types. If you are familiar with this topic you may skip to the next section.
Storages can be chiefly evaluated on three classes of performance metrics:
1. Sequential read/write throughput.
2. Random IO operations per second (IOPS).
3. Data durability.
The following table summarizes the characteristics of a few common storage types based on the above metrics.
| Storage Type | Throughput | Random IOPS | Data Durability | Typical Cost |
|---|---|---|---|---|
| HDD | High | Low | Moderate; failures can occur at any time | 4c/GB |
| SSD | High | High | Moderate; failures can occur at any time | 50c/GB for internal SATA SSD; roughly 10x or more the cost of HDD |
| NAS (Network Attached Storage) | High | Varies | May employ RAID for high durability | Varies based on features; typically falls between HDD and SSD |
| RAM | Very high | Very high | No durability; data is lost on process restart | >$10/GB; roughly 100x or more the cost of HDD |
We approached the design with the following goals:
The NameNode and HDFS clients have historically viewed each DataNode as a single storage unit. The NameNode has not been aware of the number of storage volumes on a given DataNode and their individual storage types and capacities.
DataNodes communicate their storage state to the NameNode through the following types of messages:
1. Storage reports, which summarize capacity and usage.
2. Block reports, which enumerate the block replicas present on the storage.
Currently each DataNode sends a single storage report and a single block report containing aggregate information about all attached storages.
With Heterogeneous Storage we have changed this picture so that the DataNode exposes the types and usage statistics for each individual storage to the NameNode. This is a fundamental change to the internals of HDFS and allows the NameNode to choose not just a target DataNode when placing replicas, but also the specific storage type on each target DataNode.
Separating the DataNode storages in this manner will also allow the DataNode to scale to larger capacities, since the smaller per-storage block reports can be processed faster by the NameNode.
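To make the per-storage view concrete, the sketch below shows the kind of information a DataNode could report for each individual storage once storages are exposed separately. The class and field names are illustrative assumptions, not the actual HDFS protocol types.

```java
// Illustrative sketch only: a simplified per-storage report. The real HDFS
// wire types differ; this just captures the idea of reporting each attached
// storage with its type and usage statistics instead of one aggregate number.
public class PerStorageReport {
    enum StorageType { DISK, SSD }   // media type distinguished per volume

    final String storageId;          // unique ID of the volume on the DataNode
    final StorageType type;          // media type of this volume
    final long capacityBytes;        // total capacity of the volume
    final long dfsUsedBytes;         // space consumed by block replicas
    final long remainingBytes;       // space still available for new replicas

    PerStorageReport(String storageId, StorageType type,
                     long capacityBytes, long dfsUsedBytes, long remainingBytes) {
        this.storageId = storageId;
        this.type = type;
        this.capacityBytes = capacityBytes;
        this.dfsUsedBytes = dfsUsedBytes;
        this.remainingBytes = remainingBytes;
    }

    public static void main(String[] args) {
        // A DataNode with one spinning disk and one SSD would now send one
        // report per storage rather than a single aggregate report.
        PerStorageReport disk = new PerStorageReport("DS-1", StorageType.DISK,
                4L << 40, 1L << 40, 3L << 40);
        PerStorageReport ssd = new PerStorageReport("DS-2", StorageType.SSD,
                512L << 30, 100L << 30, 412L << 30);
        System.out.println(disk.storageId + " " + disk.type + " remaining=" + disk.remainingBytes);
        System.out.println(ssd.storageId + " " + ssd.type + " remaining=" + ssd.remainingBytes);
    }
}
```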
We plan to introduce the idea of Storage Preferences for files. A Storage Preference is a hint to HDFS specifying how the application would like block replicas for the given file to be placed. Initially a Storage Preference will include: (a) the desired number of file replicas (also called the replication factor); and (b) the target storage type for the replicas.
HDFS will attempt to satisfy the Storage Preference based on the following factors:
1. The availability of space on the requested storage type.
2. The availability of quota for the requested storage type.
If the target storage type, e.g. SSD, is not available, then HDFS will attempt to place replicas on the fallback storage medium (hard disk drives).
An application can optionally specify a Storage Preference when creating a file, or it may choose to modify the Storage Preference on an existing file.
The following FileSystem API changes will be exposed to allow applications to manipulate Storage Preferences:
1. FileSystem#create will optionally accept a Storage Preference for the new file.
2. FileSystem#setStoragePreference will change the Storage Preference of a single file, replacing any existing Storage Preference on the file.
3. FileSystem#getStoragePreference will query the Storage Preference for a given file.
Changing the Storage Preference of an existing file will initiate the migration of all existing file blocks to the new target storage. The call will return success or failure depending on quota availability (more on quotas in the next section). The actual migration may take a long time depending on the size of the file. An application can query the current distribution of the file's block replicas using DFSClient#getBlockLocations.
The API will be documented in detail on HDFS-5682.
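As a rough sketch of how an application might use these calls, the example below strings the three operations together. The Storage Preference parameters and the setStoragePreference/getStoragePreference signatures are assumptions pending the final API in HDFS-5682, so those calls are shown as comments rather than as the released interface; only the existing FileSystem calls are exercised.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StoragePreferenceSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/hot/events.log");

        // 1. Create the file with a Storage Preference (hypothetical overload):
        //      fs.create(file, /* replication */ (short) 3, /* type */ "SSD");
        // 2. Change the preference of an existing file; existing block replicas
        //    would then be migrated to the new target storage over time:
        //      fs.setStoragePreference(file, "SSD");
        // 3. Query the preference currently attached to the file:
        //      Object pref = fs.getStoragePreference(file);

        // What exists today: block locations can be inspected to see where
        // the replicas of a file currently live.
        if (fs.exists(file)) {
            System.out.println(java.util.Arrays.toString(
                    fs.getFileBlockLocations(fs.getFileStatus(file), 0, Long.MAX_VALUE)));
        }
    }
}
```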
Quota is a hard limit on the disk space that may be consumed by all the files under a given directory tree. Quotas can be set by the administrator to restrict the space consumption of specific users by applying limits on their home directory or on specific directories shared by users. Disk space quota is deducted based on the number of replicas. Thus if a 1GB file is configured to have three block replicas, the total quota consumed by the file will be 3GB.
Disk space quota is assumed to be unlimited when not configured for a given directory. Quotas are checked recursively starting at a given directory and walking up to the root. The effective quota of any directory is the minimum of (Directory quota, Parent quota, Grandparent quota, … , Root quota). An interesting property of disk space quota is that the administrator can reduce it below the combined disk usage under the directory tree. This leaves the directory in an indefinite Quota Violation state until enough replicas are deleted.
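A minimal worked example of the accounting described above, with made-up quota values, shown as a small Java snippet:

```java
import java.util.Arrays;

public class QuotaExample {
    public static void main(String[] args) {
        // Disk space quota is charged per replica: a 1 GB file with a
        // replication factor of 3 consumes 3 GB of quota.
        long fileSize = 1L << 30;          // 1 GB
        int replicationFactor = 3;
        long spaceCharged = fileSize * replicationFactor;   // 3 GB

        // The effective quota of a directory is the minimum of the quotas
        // configured on the directory and all of its ancestors up to the root.
        // Unconfigured quotas are treated as unlimited.
        long unlimited = Long.MAX_VALUE;
        long[] quotasUpToRoot = {
            10L << 30,    // quota on the directory itself (10 GB, made-up value)
            50L << 30,    // quota on its parent (50 GB, made-up value)
            unlimited     // no quota configured on the root
        };
        long effectiveQuota = Arrays.stream(quotasUpToRoot).min().getAsLong();

        System.out.println("Space charged: " + spaceCharged);       // 3221225472
        System.out.println("Effective quota: " + effectiveQuota);   // 10737418240
    }
}
```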
We will extend the existing Quota scheme to add a per storage-type quota for each directory. For a given directory, if its parent does not specify any per-type quota, then the per-type quota of the directory applies. However, if the parent does specify a per-type quota, then the minimum of the (parent, subdirectory) quotas applies. If the parent explicitly specifies a per-type quota of zero, then the children cannot use any storage of that type. The administrator can use this property to prevent creating files on SSD under /tmp, for example.
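The per-type extension could be resolved with the same minimum rule, as in the sketch below; the helper and its handling of "not configured" are assumptions for illustration, not HDFS code.

```java
import java.util.OptionalLong;

public class TypeQuotaExample {
    // Effective per-type (e.g. SSD) quota for a directory, assuming the
    // minimum-of-(parent, subdirectory) rule described above. An empty
    // OptionalLong stands for "no per-type quota configured".
    static long effectiveTypeQuota(OptionalLong parentQuota, OptionalLong dirQuota) {
        if (!parentQuota.isPresent()) {
            // Parent is silent, so the directory's own per-type quota applies
            // (unlimited if it is not configured either).
            return dirQuota.orElse(Long.MAX_VALUE);
        }
        if (!dirQuota.isPresent()) {
            return parentQuota.getAsLong();
        }
        return Math.min(parentQuota.getAsLong(), dirQuota.getAsLong());
    }

    public static void main(String[] args) {
        // A parent such as /tmp with an SSD quota of zero prevents its children
        // from placing any replicas on SSD, whatever they configure themselves.
        System.out.println(effectiveTypeQuota(OptionalLong.of(0), OptionalLong.of(10L << 30))); // 0
    }
}
```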
Given the scope of the changes we have chosen to implement the feature in two principal phases. The first phase adds support for exposing the DataNode as a collection of storages. This support is currently available in trunk and is planned to be merged into the Apache branch-2 so that it will be available in the Apache Hadoop 2.4 release.
The second phase will add API support for applications to make use of Storage Types and is planned to align with the 2.5 release time frame.