HDF5

Hierarchical Data Format version 5 (HDF5) is a set of file formats, libraries, and tools for storing and managing large scientific datasets. Originally developed at the National Center for Supercomputing Applications, it is currently supported by the non-profit HDF Group.

HDF5 is a different product from the earlier software named HDF: it is a complete redesign of both the format and the library, and it adds improved support for parallel I/O. The HDF5 file format is not compatible with HDF 4.x versions.

Note

h5toh4 and h4toh5 converters are available on all NERSC machines.

Using HDF5 at NERSC

Cray provides native HDF5 libraries for each of the three programming environments (PrgEnv modules). The module cray-hdf5 provides a serial HDF5 I/O library:

module load cray-hdf5
ftn my_serial_hdf5_code.f90

while cray-hdf5-parallel provides a parallel HDF5 implementation:

module load cray-hdf5-parallel
ftn my_parallel_hdf5_code.f90

After loading either module, you can keep using the Cray compiler wrappers cc, CC, and ftn to compile HDF5 applications; no additional compiler flags are required.

Note

The netCDF and HDF5 libraries provided by recent versions of the cray-netcdf, cray-netcdf-hdf5parallel, cray-hdf5 and cray-hdf5-parallel modules use file locking. File locking is supported on the CSCRATCH file system, but not on the NGF file systems (CFS, HOME, ...). Before you run a program built with the Cray libraries on a file system that does not support it, disable file locking by running the command:

export HDF5_USE_FILE_LOCKING=FALSE
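
As a minimal sketch of how this setting might look in a shell session or job script, the lines below export the variable and then read it back from a child process. The variable name locking_setting is only illustrative, and the check itself is optional.

```shell
# Disable HDF5 file locking for every command started from this shell
export HDF5_USE_FILE_LOCKING=FALSE

# Optional check: read the value back from a child process, which is
# how an HDF5 program launched from this shell would see it
locking_setting=$(sh -c 'printf "%s" "$HDF5_USE_FILE_LOCKING"')
echo "HDF5_USE_FILE_LOCKING=${locking_setting}"
```

Setting the variable in the same script that launches the program is usually the simplest way to make sure it is in place before the program opens any file.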

Other HDF5 tools at NERSC

NERSC provides several additional tools which allow users to interact with HDF5 data.

H5py

The H5py package is a Pythonic interface to the HDF5 library.

H5py provides an easy-to-use high level interface, which allows an application to store huge amounts of numerical data, and easily manipulate that data from NumPy. H5py uses straightforward Python and NumPy metaphors, like dictionaries and NumPy arrays. For example, you can iterate over datasets in a file, or check the .shape or .dtype attributes of datasets. You don't need to know anything special about HDF5 to get started. H5py rests on an object-oriented Cython wrapping of the HDF5 C API. Almost anything you can do in HDF5 from C, you can do with h5py from Python.

For information about using H5py at NERSC, see the NERSC H5py documentation page.

H5hut

HDF5 Utility Toolkit (H5hut) is a veneer API for HDF5: H5hut files are also valid HDF5 files and are compatible with other HDF5-based interfaces and tools. For example, the h5dump tool that comes standard with HDF5 can export H5hut files to ASCII or XML for additional portability. H5hut also includes tools to convert H5hut data to the Visualization ToolKit (VTK) format and to generate scripts for the Gnuplot data plotting tool.
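
The h5dump export mentioned above can be sketched roughly as follows. The file name particles.h5 and the output file names are hypothetical placeholders. The sketch assumes that h5dump is on your PATH (for example after loading an HDF5 module), and it skips the export if the tool or the file is missing.

```shell
# Hypothetical input file; substitute an H5hut or HDF5 file of your own
input_file="particles.h5"

if command -v h5dump >/dev/null 2>&1 && [ -f "$input_file" ]; then
    # Show only the structure (groups, datasets, types), not the data values
    h5dump -H "$input_file"
    # Save the full ASCII dump to a text file
    h5dump "$input_file" > particles.txt
    # Save an XML representation of the file
    h5dump --xml "$input_file" > particles.xml
    dump_status="done"
else
    # Either h5dump is not available or the input file does not exist
    dump_status="skipped"
fi
echo "h5dump export: ${dump_status}"
```

For large files the full ASCII dump can be far bigger than the original, so it may be worth starting with the header-only form to see what the file contains.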

Using H5hut at NERSC

For serial HDF5 code:

module load cray-hdf5
module load h5hut
cc my_serial_h5hut_code.c

For parallel HDF5 code:

module load cray-hdf5-parallel
module load h5hut-parallel
cc my_parallel_h5hut_code.c

Further information about HDF5