ZFS Features Summary


I summarized these points from Wikipedia and other documents on ZFS. Some concepts are still unclear to me, e.g., RAID-Z.

 

Pool Storage

ZFS is not tied to individual devices and no longer needs a separate volume manager. File systems share a common storage pool, and creating a file system is as easy as creating a new directory. You can efficiently have thousands of file systems, each with its own quotas, reservations, and properties (compression algorithm, checksum algorithm, etc.).

 

Copy-on-Write

The on-disk state is always valid. ZFS is a transactional system: every operation is atomic, meaning it is either committed in full or discarded. Live data is never overwritten in place, so files on disk stay consistent and are not corrupted by a power failure or system crash. This also avoids the double write that journaling requires, so tools such as fsck or scandisk are no longer needed.
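The all-or-nothing behaviour can be pictured with a small toy model. This is only a sketch under my own assumptions, not how ZFS is implemented: the class `CowStore`, its `transaction` method, and the `crash_before_commit` flag are hypothetical names made up for illustration. New data goes to freshly allocated blocks, and a single root pointer switch stands in for the atomic commit.

```python
class CowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = {}      # block id -> bytes (append-only)
        self.next_id = 0
        self.root = {}        # committed state: file name -> block id

    def _alloc(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        return block_id

    def transaction(self, updates, crash_before_commit=False):
        # Write new data to freshly allocated blocks; live blocks are untouched.
        new_root = dict(self.root)
        for name, data in updates.items():
            new_root[name] = self._alloc(data)
        if crash_before_commit:
            return False      # simulated power loss: the old root is still intact
        self.root = new_root  # one pointer switch stands in for the atomic commit
        return True

    def read(self, name):
        return self.blocks[self.root[name]]


store = CowStore()
store.transaction({"a.txt": b"v1", "b.txt": b"v1"})
# The second transaction "crashes" before commit, so neither file changes.
store.transaction({"a.txt": b"v2", "b.txt": b"v2"}, crash_before_commit=True)
print(store.read("a.txt"), store.read("b.txt"))
```

Because the interrupted transaction never replaced the root, a reader sees either the complete old state or the complete new state, never a mixture of the two.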

 

Capacity

ZFS is a 128-bit file system, so it can address 18 billion billion (1.84 × 10^19) times more data than current 64-bit systems.
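The figure follows from simple arithmetic: a 128-bit address space is 2^64 times larger than a 64-bit one. A quick check, where the variable names are mine:

```python
# Ratio between a 128-bit and a 64-bit address space.
addressable_64 = 2 ** 64
addressable_128 = 2 ** 128

ratio = addressable_128 // addressable_64

print(ratio == 2 ** 64)   # the ratio is exactly 2**64
print(f"{ratio:.2e}")     # the same number in scientific notation
```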

 

Self-heal

Current file systems rely on the underlying hardware to detect corrupted data. ZFS instead associates a checksum with each block. If ZFS detects a checksum mismatch on a RAID-Z or mirrored file system, it actively reconstructs the block from the available redundancy and carries on with its job. This brings us to the coolest thing about RAID-Z: self-healing data. In addition to handling whole-disk failure, RAID-Z can also detect and correct silent data corruption. Whenever you read a RAID-Z block, ZFS compares it against its checksum. If the data disks didn't return the right answer, ZFS reads the parity and then does combinatorial reconstruction to figure out which disk returned bad data. It then repairs the damaged disk and returns good data to the application. ZFS also reports the incident through Solaris FMA so that the system administrator knows that one of the disks is silently failing.
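Since RAID-Z is the part I find hardest to picture, here is a toy sketch of the combinatorial reconstruction idea. It assumes a single XOR parity chunk and a SHA-256 checksum over the whole stripe; both are simplifications I chose for illustration, and the function names (`write_stripe`, `read_stripe`) are hypothetical, not ZFS code.

```python
import hashlib
from functools import reduce


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def checksum(data):
    return hashlib.sha256(data).digest()


def write_stripe(chunks):
    """Return (parity, checksum) for a list of equally sized data chunks."""
    parity = reduce(xor_bytes, chunks)
    return parity, checksum(b"".join(chunks))


def read_stripe(chunks, parity, expected):
    """Return (data, repaired_index); repaired_index is None if nothing was wrong."""
    if checksum(b"".join(chunks)) == expected:
        return b"".join(chunks), None
    # Assume each disk in turn returned bad data and rebuild it from parity.
    for bad in range(len(chunks)):
        others = [c for i, c in enumerate(chunks) if i != bad]
        rebuilt = reduce(xor_bytes, others, parity)
        candidate = chunks[:bad] + [rebuilt] + chunks[bad + 1:]
        if checksum(b"".join(candidate)) == expected:
            chunks[bad] = rebuilt   # repair the damaged copy in place
            return b"".join(candidate), bad
    raise IOError("unrecoverable: more damage than single parity can fix")


disks = [b"AAAA", b"BBBB", b"CCCC"]
parity, expected = write_stripe(disks)
disks[1] = b"BxBB"                  # silent corruption on disk 1
data, repaired = read_stripe(disks, parity, expected)
print(data, repaired)
```

The checksum is what makes this work: parity alone can rebuild a chunk, but only the checksum tells you which rebuilt candidate is the correct one.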

 

 

Instantaneous snapshots and clones

This feature makes it possible to have hourly, daily and weekly backups efficiently, as well as experiment with new system configurations without any risks. An advantage of copy-on-write is that when ZFS writes new data, the blocks containing the old data can be retained, allowing a snapshot version of the file system to be maintained. ZFS snapshots are created very quickly, since all the data composing the snapshot is already stored; they are also space efficient, since any unchanged data is shared among the file system and its snapshots. Writeable snapshots ("clones") can also be created, resulting in two independent file systems that share a set of blocks. As changes are made to any of the clone file systems, new data blocks are created to reflect those changes, but any unchanged blocks continue to be shared, no matter how many clones exist.
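The block sharing described above can be sketched with a toy model. This is an illustration under my own assumptions, not ZFS internals: `Dataset`, `snapshot`, and `clone` are hypothetical names, and immutable Python `bytes` objects stand in for on-disk blocks.

```python
class Dataset:
    """Toy file system: a mapping from file name to an immutable block."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def write(self, name, data):
        # Copy-on-write: bind the name to a new block; the old block is untouched.
        self.files[name] = bytes(data)

    def snapshot(self):
        # Copies only the name -> block references, never the block contents.
        return dict(self.files)


def clone(snap):
    # A clone is a writable dataset that starts from a snapshot's references.
    return Dataset(snap)


fs = Dataset()
fs.write("big.bin", b"x" * 1000)
fs.write("notes.txt", b"old")

snap = fs.snapshot()              # cheap: no block data is copied
fs.write("notes.txt", b"new")     # the snapshot still sees the old block

c = clone(snap)
c.write("extra.txt", b"clone only")

print(snap["notes.txt"], fs.files["notes.txt"])
print(c.files["big.bin"] is fs.files["big.bin"])   # unchanged block is shared
```

Taking the snapshot costs only a copy of the references, which is why it is fast, and the unchanged `big.bin` block exists once no matter how many snapshots or clones point at it.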

 

Built-in (optional) compression

In addition to reducing space usage, often by 2-3x on compressible data, compression reduces the amount of I/O by the same factor. For this reason, enabling compression actually makes some workloads go faster.

 

Highly scalable

Unlike the traditional storage stack (file system + volume manager + storage), ZFS is built directly on the storage devices and provides all of these capabilities itself, so it can be more efficient.

 

Dynamic Striping vs. Static Striping

Dynamic striping across all devices maximizes throughput: as additional devices are added to the zpool, the stripe width automatically expands to include them, so all disks in the pool are used and the write load is balanced across them.

Dynamic striping is on by default and automatically includes all devices in a pool in writes simultaneously (the stripe width spans all the available media). This speeds up I/O on systems with multiple paths to storage by load balancing the I/O across all of the paths.

For each virtual device that is added to the pool, ZFS dynamically stripes data across all available devices. The decision about where to place data is done at write time, so no fixed width stripes are created at allocation time.
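A toy allocator makes the "decided at write time" point concrete. The `Pool` class and its least-used placement policy are assumptions of mine for illustration; the real ZFS allocator is more sophisticated than this.

```python
class Pool:
    """Toy pool that chooses a device for each block at write time."""

    def __init__(self, devices):
        self.used = {name: 0 for name in devices}   # device -> blocks written

    def add_device(self, name):
        self.used[name] = 0      # a new device becomes a write target immediately

    def write_block(self):
        # Hypothetical policy: pick the least-used device (first one wins ties).
        target = min(self.used, key=self.used.get)
        self.used[target] += 1
        return target


pool = Pool(["disk0", "disk1"])
first = [pool.write_block() for _ in range(4)]

pool.add_device("disk2")         # no restriping step, no fixed stripe width
later = [pool.write_block() for _ in range(3)]

print(first, later)
```

Nothing about the stripe layout is fixed in advance: the newly added device starts receiving writes on the very next allocation.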

 

Variable block sizes

ZFS uses variable-sized blocks of up to 128 kilobytes. The currently available code allows the administrator to tune the maximum block size used as certain workloads do not perform well with large blocks. Automatic tuning to match workload characteristics is contemplated.

If data compression (LZJB) is enabled, variable block sizes are used. If a block can be compressed to fit into a smaller block size, the smaller size is used on the disk to use less storage and improve IO throughput (though at the cost of increased CPU use for the compression and decompression operations).
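The effect of compression on the on-disk block size can be sketched as follows. This is an approximation under stated assumptions: Python's `zlib` stands in for LZJB (which is not in the standard library), the 512-byte sector size is my choice, and `on_disk_size` is a hypothetical helper, not a ZFS function.

```python
import os
import zlib

SECTOR = 512                # assumed sector size
MAX_BLOCK = 128 * 1024      # maximum block size mentioned above


def on_disk_size(data):
    """Size in bytes the block would occupy after optional compression."""
    if len(data) > MAX_BLOCK:
        raise ValueError("block larger than the maximum block size")
    compressed = zlib.compress(data)          # zlib is a stand-in for LZJB
    best = min(len(compressed), len(data))    # keep raw data if compression does not help
    sectors = -(-best // SECTOR)              # round up to whole sectors
    return sectors * SECTOR


text_block = b"zfs " * 32768                  # 128 KiB of highly repetitive data
random_block = os.urandom(MAX_BLOCK)          # 128 KiB of incompressible data

print(on_disk_size(text_block), on_disk_size(random_block))
```

The repetitive block shrinks to a handful of sectors, while the random block stays at the full 128 KiB because storing the compressed form would not save anything.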

 

Lightweight filesystem creation

In ZFS, filesystem manipulation within a storage pool is easier than volume manipulation within a traditional filesystem; the time and effort required to create or resize a ZFS filesystem is closer to that of making a new directory than it is to volume manipulation in some other systems.

 

Limitations

ZFS is not a native cluster, distributed, or parallel file system; as a local file system, it cannot provide concurrent access from multiple hosts. Sun's Lustre distributed file system will adopt ZFS as back-end storage for both data and metadata in version 3.0, which is scheduled to be released in 2010.

 

 

Links

ZFS: Ten reasons to reformat your hard drives

ZFS on Linux Works

ZFS Filesystem for FUSE/Linux

RAID-Z

Hadoop HDFS + ZFS (RE: Some Doubts of hadoop functionality)
