First, the netCDF file format uses a linear data layout, in which data arrays are either stored in contiguous space in a predefined order or interleaved in a regular pattern. This regular, highly predictable layout enables the PnetCDF I/O implementation simply to pass the data buffer, metadata (file view, MPI datatype, etc.), and other optimization information to MPI-IO, and all parallel I/O operations are carried out in the same manner as when MPI-IO alone is used. Thus there is very little overhead, and PnetCDF performance should be nearly the same as that of MPI-IO if only raw data I/O performance is compared.
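As a concrete illustration, the following is a minimal sketch (not PnetCDF's internal code) of the kind of MPI-IO call sequence that a linear layout makes possible: a subarray datatype describes each process's block of a global 2-D array, a file view is set, and a single collective write is issued. The function name and the gsizes/lsizes/starts parameters are illustrative, not taken from the PnetCDF source.

```c
/* A minimal sketch (not PnetCDF's actual internals) of handing a regular
 * access pattern straight to MPI-IO: each process writes one rectangular
 * block of a global 2-D array of doubles. */
#include <mpi.h>

void write_block(MPI_Comm comm, const char *path, const double *buf,
                 const int gsizes[2], const int lsizes[2], const int starts[2])
{
    MPI_File     fh;
    MPI_Datatype filetype;

    /* Filetype describing this process's block within the global array */
    MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(comm, path, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* The file view tells MPI-IO exactly which bytes this process touches */
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

    /* One collective call; MPI-IO applies its optimizations directly */
    MPI_File_write_all(fh, buf, lsizes[0] * lsizes[1], MPI_DOUBLE,
                       MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
}
```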
Parallel HDF5, on the other hand, uses a tree-like file structure similar to that of the UNIX file system: the data is laid out irregularly, using a superblock, header blocks, data blocks, extended header blocks, and extended data blocks. This is a very flexible scheme and may have advantages for some applications and access patterns. However, the irregular layout can make it difficult to pass user access patterns directly to MPI-IO, especially for variable-sized arrays.
Instead, parallel HDF5 uses dataspaces and hyperslabs to define the data organization and to map and transfer data between the memory space and the file space, performing buffer packing and unpacking recursively. MPI-IO is used underneath, but this additional layer of processing can result in significant performance loss.
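For comparison, the following is a hedged sketch of the parallel HDF5 path just described, writing the same kind of 2-D block through a hyperslab selection and a collective data-transfer property list. The dataset name "data" and the helper function's signature are assumptions made for this example.

```c
/* A sketch of the parallel HDF5 route: select this process's block with a
 * hyperslab and let a collective transfer property list hand the request
 * to MPI-IO underneath. */
#include <hdf5.h>
#include <mpi.h>

void write_block_hdf5(MPI_Comm comm, const char *path, const double *buf,
                      const hsize_t gsizes[2], const hsize_t lsizes[2],
                      const hsize_t starts[2])
{
    /* File access property list: create the file with the MPI-IO driver */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    hid_t file = H5Fcreate(path, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Dataspaces describe the file layout and the in-memory buffer */
    hid_t filespace = H5Screate_simple(2, gsizes, NULL);
    hid_t memspace  = H5Screate_simple(2, lsizes, NULL);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Hyperslab: this process's rectangular block of the dataset */
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, starts, NULL, lsizes, NULL);

    /* Collective data transfer; HDF5 maps and packs the selection internally */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl); H5Dclose(dset);
    H5Sclose(memspace); H5Sclose(filespace);
    H5Pclose(fapl); H5Fclose(file);
}
```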
Second, the PnetCDF implementation keeps the overhead of header I/O as low as possible. A netCDF file has a single header containing all the information necessary for direct access to each data array, and each array is associated with a predefined numerical ID that can be queried efficiently whenever the array needs to be accessed. By maintaining a local copy of the header on each process, our implementation avoids a great deal of interprocess synchronization as well as repeated reads of the file header each time the header information is needed to access a single array. All header information can be accessed directly in local memory, and interprocess synchronization is needed only during the definition of the dataset. Once the dataset has been defined, each array can be identified by its permanent ID and accessed at any time by any process, without any collective open/close operation.
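A minimal sketch of the access pattern this paragraph describes follows; the dimension and variable names are illustrative. Interprocess synchronization is confined to the define phase, after which the variable is addressed purely by its integer ID from the locally cached header.

```c
/* A minimal sketch of PnetCDF's define/data split: the collective define
 * phase builds the single file header, and the data phase then addresses
 * the variable by its integer ID. */
#include <mpi.h>
#include <pnetcdf.h>

void write_var(MPI_Comm comm, const char *path, const float *buf,
               MPI_Offset ny, MPI_Offset nx,
               const MPI_Offset start[2], const MPI_Offset count[2])
{
    int ncid, dimids[2], varid;

    /* Define phase: collective, builds the single file header */
    ncmpi_create(comm, path, NC_CLOBBER, MPI_INFO_NULL, &ncid);
    ncmpi_def_dim(ncid, "y", ny, &dimids[0]);
    ncmpi_def_dim(ncid, "x", nx, &dimids[1]);
    ncmpi_def_var(ncid, "temperature", NC_FLOAT, 2, dimids, &varid);
    ncmpi_enddef(ncid);   /* header is written; a copy is cached locally */

    /* Data phase: the variable is addressed by varid; no per-object
     * collective open/close is required */
    ncmpi_put_vara_float_all(ncid, varid, start, count, buf);

    ncmpi_close(ncid);
}
```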
In HDF5, on the other hand, the header metadata is dispersed in separate header blocks for each object, and in order to operate on an object the library must iterate through the namespace to obtain that object's header information before accessing it. This access method can be inefficient for parallel access, particularly because parallel HDF5 defines the open and close of each object as collective operations, which forces all participating processes to communicate when accessing a single object, in addition to the cost of the file accesses needed to locate and fetch the object's header information. Furthermore, HDF5 metadata is updated during data writes in some cases, so additional synchronization is necessary at write time to keep the processes' views of the file metadata consistent.
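The cost can be pictured with a short sketch, assuming the dataset names are known in advance: every iteration implies collective participation by all processes just to open and close the object.

```c
/* A short sketch of the per-object cost discussed above: in parallel HDF5
 * each object open/close is collective, so touching N datasets implies N
 * rounds of coordinated metadata access. */
#include <hdf5.h>

void touch_datasets(hid_t file, const char *names[], int n)
{
    for (int i = 0; i < n; i++) {
        hid_t dset = H5Dopen2(file, names[i], H5P_DEFAULT);  /* collective */
        /* ... read or write dset ... */
        H5Dclose(dset);                                      /* collective */
    }
}
```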
However, PnetCDF has its own limitations. Unlike HDF5, netCDF does not support a hierarchical, group-based organization of data objects. Because it lays out the data in a linear order, adding a fixed-size array or extending the file header can be very costly once the file has been created and contains data, even though moving the existing data into the extended area is performed in parallel. PnetCDF also does not provide the ability to combine two or more files in memory through software mounting, as HDF5 does. Nor does the netCDF file format support data compression (although compressed writes must be serialized in HDF5, which limits their usefulness). Fortunately, these features can all be obtained through external software such as the netCDF Operators [8], at some cost in the manageability of the files.
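To illustrate where the header-extension cost mentioned above arises, here is a minimal sketch, assuming an existing, populated file opened by all processes: re-entering define mode to add one more fixed-size variable enlarges the header and may force existing data to be moved. The variable name "pressure" and the dimension names are illustrative.

```c
/* A minimal sketch of extending an existing netCDF file: re-entering
 * define mode grows the header, which may require existing data to be
 * relocated when define mode ends. */
#include <mpi.h>
#include <pnetcdf.h>

void add_variable(MPI_Comm comm, const char *path)
{
    int ncid, dimids[2], varid;

    ncmpi_open(comm, path, NC_WRITE, MPI_INFO_NULL, &ncid);
    ncmpi_inq_dimid(ncid, "y", &dimids[0]);
    ncmpi_inq_dimid(ncid, "x", &dimids[1]);

    ncmpi_redef(ncid);                      /* back to define mode */
    ncmpi_def_var(ncid, "pressure", NC_FLOAT, 2, dimids, &varid);
    ncmpi_enddef(ncid);                     /* header grows; data may move */

    ncmpi_close(ncid);
}
```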