HDF5
What is it?
HDF5 (Hierarchical Data Format, version 5) is a general-purpose library and file format for storing scientific data. HDF5 stores two primary types of objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects within an HDF5 file. Using these two basic objects, you can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids, and you can mix and match them in HDF5 files according to your needs.
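As a quick illustration, here is a minimal sketch using the h5py Python interface (described further below) that creates a file containing one group and one dataset; the file, group and dataset names are arbitrary examples:

import h5py
import numpy as np

# create a file with a group "results" and a 2x3 integer dataset "grid" inside it
with h5py.File("example.h5", "w") as f:
    grp = f.create_group("results")
    grp.create_dataset("grid", data=np.arange(6, dtype="i4").reshape(2, 3))

# read the data back via its path inside the file
with h5py.File("example.h5", "r") as f:
    print(f["results/grid"][:])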
Installation and Use of HDF5 on LRZ platforms
Linux-based HPC Systems
As of April 2022, a new software stack (22.2.1) is available on CoolMUC-2 and SuperMUC-NG. We provide at least one minor version each of HDF5 1.8 and HDF5 1.10; be careful, as these two versions have different file formats/APIs.
You can check the available hdf5 modules yourself via
module avail hdf5
On spack stack 22.2.1 we provide the following modules:
Serial HDF5 | HDF5 MPI parallel (with Intel-MPI)
---|---
hdf5/1.8.22-gcc11 | hdf5/1.8.22-gcc11-impi
hdf5/1.8.22-intel21 | hdf5/1.8.22-intel21-impi
hdf5/1.10.7-gcc11 | hdf5/1.10.7-gcc11-impi
hdf5/1.10.7-intel19 | hdf5/1.10.7-intel21-impi
The suffixes "-gcc11" and "-intel21" represent the used compilers and the corresponding compiler modules should be loaded when using the modules. The suffix "-impi" stands for the MPI parallel version built with the Intel-MPI standard module.
All packages are built with C, C++ and Fortran support. To make use of HDF5, please load the appropriate Environment Module
For the parallel version with the Intel compiler, use e.g.
module load hdf5/1.10.7-intel21-impi
Then, compile your code with
[mpicc|mpicxx|mpif90] -c $HDF5_INC foo.[c|cc|f90]
and link it with
[mpicc|mpicxx|mpif90] -o myprog foo.o <further objects> [$HDF5_F90_SHLIB|$HDF5_CPP_SHLIB] $HDF5_SHLIB
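For example, for a C source file foo.c, these templates expand to:

mpicc -c $HDF5_INC foo.c
mpicc -o myprog foo.o $HDF5_SHLIB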
For a serial version (with the Intel compiler), use e.g.
module load hdf5/1.10.7-intel21
Then, compile your code with
[icc|icpc|ifort] -c $HDF5_INC foo.[c|cc|f90]
and link it with
[icc|icpc|ifort] -o myprog.exe foo.o <further objects> [$HDF5_F90_SHLIB|$HDF5_CPP_SHLIB] $HDF5_SHLIB
The language support libraries $HDF5_F90_SHLIB and $HDF5_CPP_SHLIB are only required if Fortran or C++, respectively, is used for compiling and linking your application.
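For example, a serial Fortran application would be compiled and linked with the Intel compiler as:

ifort -c $HDF5_INC foo.f90
ifort -o myprog.exe foo.o $HDF5_F90_SHLIB $HDF5_SHLIB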
For static linking, use the $HDF5_..._LIB variables instead of $HDF5_..._SHLIB, but this is not recommended.
Utilities
Loading an HDF5 module typically also makes command-line utilities available, e.g. h5copy, h5debug, h5dump, etc. It may be advisable to run these utilities with a serial (as opposed to MPI-parallel) HDF5 version, since a linked-in MPI library may not work properly in purely interactive usage.
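For example, to print only the structure (header information) of a file without its data, one can run (the file name is just a placeholder):

h5dump -H myfile.h5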
h5utils
h5utils (Github) is a set of utilities for the visualisation and conversion of scientific data in HDF5 format. Besides providing a simple tool for batch visualisation as PNG images, h5utils also includes programs to convert HDF5 datasets into the formats required by other free visualization software (e.g. plain text, Vis5d, and VTK).
h5utils is not part of the HDF5 module, nor is it directly available in the LRZ-provided software stack. The recommended way to install this software on SuperMUC-NG, CoolMUC-2 and other LRZ-managed clusters is via user_spack:
module load user_spack
# Install
spack info h5utils
spack install h5utils
# Load to search path
spack load h5utils
# Unload
spack unload h5utils
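Once loaded, a dataset can e.g. be rendered as a PNG image; as a rough sketch (data.h5 is just a placeholder; by default the first dataset in the file is used, see the h5utils documentation for options):

h5topng data.h5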
Documentation
Please refer to the HDF5 Web Site for documentation of the interface.
H5py (Pythonic Interface to HDF5)
There are several options to install h5py on LRZ systems. One option is using "pip" or "Conda" (see here or here for details, or further below). The other option (and probably the preferable one) is the installation via "user_spack" (see also Spack package management tool).
The installation procedure is similar on all systems.
Remark
In order to use h5py MPI-parallel, one needs to build it against an hdf5 that was built with MPI support, and against mpi4py! The compiler and the MPI installation must be consistent!
This is what we focus on in this documentation. Without the MPI requirement, the h5py installation is usually less complex and does not require a build from source.
Spack/User Spack
To create h5py, select an hdf5 module you want to work with. Let us assume you want to use the module hdf5/1.10.11-gcc12-impi on CoolMUC4, which is built with a GCC compiler and Intel MPI.
One needs the hash of the hdf5 Spack installation. It can be obtained using "module show":
cm4login1:~> module show hdf5/1.10.11-gcc12-impi | grep BASE
setenv HDF5_BASE /dss/lrzsys/sys/spack/release/23.1.0/opt/icelake/hdf5/1.10.11-gcc-mlcdtiq
The hash consists of the last seven characters: mlcdtiq
Please note: The hashes of the installations differ on all systems. Using the hash from above for an installation on e.g. SuperMUC-NG will fail!
Next, load the user_spack module to make the spack command-line tool available.
module load user_spack
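As an alternative way to find the installation hash (a hedged suggestion; the output depends on the local Spack configuration), the hashes of the available hdf5 installations can usually also be listed directly with

spack find -l hdf5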
Installation
The installation (which you only need to do once, if it works without problems) is then done as follows. The general build instruction looks like this:
spack install py-h5py%COMPILER ^hdf5/HASH_OF_INSTALLATION
where COMPILER stands for the compiler of the hdf5 module. It can be gcc, intel or oneapi (please check with spack compilers!), and HASH_OF_INSTALLATION is the hdf5 installation hash from above.
In our example, this would be:
spack install py-h5py%gcc ^hdf5/mlcdtiq
Usually, it is also sufficient to simply run
spack install py-h5py ^hdf5/mlcdtiq
Spack can resolve the compiler to use from the hdf5 dependency (one can check this using spack spec -lINt py-h5py ^hdf5/mlcdtiq).
Testing and Using
The easiest way is now to simply load this package.
:~> spack load py-h5py
:~> cat > h5py_test.py << EOT
from mpi4py import MPI
import h5py
rank = MPI.COMM_WORLD.rank  # the process ID (integer 0-3 for a 4-process run)
print("rank:", rank)
f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD)
dset = f.create_dataset('test', (4,), dtype='i')
print("created ds")
dset[rank] = rank
f.close()
EOT
:~> mpiexec -n 4 python h5py_test.py   # check that h5py works in parallel
rank: 0
created ds
rank: 1
created ds
rank: 2
created ds
rank: 3
created ds
On login nodes, it might be required to further tune the MPI environment (if an MPI module is loaded) according to
export I_MPI_HYDRA_BOOTSTRAP=fork I_MPI_FABRICS=shm   # on login nodes; skip if you go to compute nodes
unset I_MPI_HYDRA_IFACE I_MPI_PMI_LIBRARY             # ditto (if set, like on CM4)
before calling mpiexec. Or, with the intel-mpi loaded, go to a compute node (salloc or sbatch). On compute nodes, you SHOULD load the default intel-mpi module as we usually set further environment variables for optimization.
Essentially, usage should generally be as simple as that: load user_spack (and intel-mpi), and then spack load py-h5py.
Module Creation and Usage
One can also create an environment module if desired. If the steps above were successful, one can go on with
spack module tcl refresh -y
The module is then generated in a directory like $HOME/{spack,user_spack}/<spack version>/<architecture>/. (This directory scheme has not fully settled yet.)
Note: The architecture-specific subfolder (e.g. x86_avx2 or icelake) in the module path $HOME/user_spack/23.1.0/modules/icelake/ differs between systems. On SuperMUC-NG, for example, the path would be $HOME/spack/modules/x86_avx512/linux-sles15-skylake_avx512/.
To use the h5py module, one needs to make it available to the module system; the corresponding hdf5 and MPI modules are also required.
module use -p ~/user_spack/23.1.0/modules/icelake/
module load python/3.10.10-extended   # some extended python is necessary for the mpi4py
module load gcc                       # unless done automatically with hdf5
module load intel-mpi                 # unless done automatically with hdf5
module load hdf5/1.10.11-gcc12-impi
module load py-h5py
It is the user's responsibility to load the modules consistently! We recommend the use of module collections here (check module help!).
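As a sketch (assuming the module system in use supports named collections; my_h5py is just an example name):

module save my_h5py      # store the currently loaded set of modules as a collection
module restore my_h5py   # restore that set in a later session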
Using pip
On SuperMUC-NG, there is no internet access. Please have a look here for options.
Using Conda (self-confined)
Using conda/mamba (please note the current regulations here!), installing h5py together with an MPI-enabled HDF5 can be as simple as
micromamba create -n my_h5py h5py h5py=*=*mpich*   # or "conda"
micromamba activate my_h5py
MPICH is not officially supported at LRZ. It is similar to Intel MPI, but does not react to I_MPI_* environment variables! It can still be used on our systems. On a single login or compute node, run
mpiexec -n 4 python h5py_test.py
For more than one node, within a Slurm allocation, please use
mpiexec -launcher slurm -n 4 python h5py_test.py