Moose Framework on HPC Systems

What is Moose Framework?

Moose Framework is an open-source, parallel finite-element framework built on top of Libmesh, which in turn uses PETSc. It is used to numerically solve coupled, multi-physics systems of partial differential equations.

Moose already ships with a number of physics modules, but it can also be extended with custom applications, as is the case for Golem.

Getting Started

Installation

The basic installation steps are as follows (please also consult the Moose Framework documentation page).

> git clone https://github.com/idaholab/moose.git
> cd moose
> git checkout master
> git submodule update --init
> git submodule foreach --recursive git submodule update --init
> ./scripts/update_and_rebuild_petsc.sh                              # building PETSc
> ./scripts/update_and_rebuild_libmesh.sh                            # building Libmesh
> ./scripts/update_and_rebuild_wasp.sh                               # building WASP
> cd test                                                            # building and performing tests
> make -j 6
> ./run_tests
> cd ../modules                                                      # building the Moose modules (apps); alternatively, own apps (e.g. Golem) can be built
> make -j 6

The procedure above assumes a correctly prepared build environment. On the LRZ HPC clusters, such an environment must be arranged manually by the user, and some adaptations are necessary, e.g. in order to use our tuned performance libraries (HDF5, Intel MKL, Intel MPI, etc.). As is usual for PETSc applications, the whole stack down to the Moose app must be built with the same tool chain (same compiler, compiler settings, MPI, ...).
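
Before starting, it can help to verify that the compiler wrappers actually resolve to the intended tool chain, for instance (standard GCC and Intel MPI commands, shown purely as an illustration):

> which gcc mpicc mpicxx mpif90
> gcc --version
> mpicc -show                             # prints the underlying compiler and flags used by the Intel MPI wrapper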

On CoolMUC-4, for instance, the following procedure using GCC and Intel MPI works reasonably well. The Intel compilers should work in principle, too.

CoolMUC-4 Installation Procedure Example using the GCC Compiler and Intel MPI
> module load cmake gcc intel-mkl intel-mpi hdf5/1.10.11-gcc12-impi boost/1.83.0-gcc12-impi libtirpc

> git clone https://github.com/idaholab/moose.git
> cd moose
> git checkout master
> git submodule update --init
> git submodule foreach --recursive git submodule update --init

> export HDF5_DIR=$HDF5_BASE
> export I_MPI_HYDRA_BOOTSTRAP=fork                                           # the PETSc configure step runs small MPI test programs
> export I_MPI_FABRICS=shm
> export MOOSE_JOBS=20                                                        # number of parallel build jobs; speeds up the build

# building PETSc
> ./scripts/update_and_rebuild_petsc.sh --help                                # this script passes all cmd parameters also to PETSc configure script
> ./scripts/update_and_rebuild_petsc.sh --with-blaslapack-dir=$MKL_BASE \
           --with-cc=$(which mpicc) --with-cxx=$(which mpicxx) --with-fc=$(which mpif90) --with-mpi-f90=$(which mpif90) --with-mpiexec=$(which mpiexec) \
           COPTFLAGS='-g -O3 -march=native' CXXOPTFLAGS='-g -O3 -march=native' FOPTFLAGS='-g -O3 -march=native' \
           --with-mpi-include=$MPI_BASE/include --with-mpi-lib=$MPI_BASE/lib/release/libmpi.a --with-64-bit-indices=true
# instead of -march=native, -march=x86-64-v4 could be used

# building Libmesh
> export CC=mpicc CXX=mpicxx FC=mpif90 F90=mpif90 F77=mpif77
> export CFLAGS="-O3 -march=native" CXXFLAGS="-O3 -march=native" FCFLAGS="-O3 -march=native" FFLAGS="-O3 -march=native"

> ./scripts/update_and_rebuild_libmesh.sh

# This may fail with "configure: error: *** XDR was not found, but --enable-xdr-required was specified."
# In that case, load the libtirpc module as above and retry the step above with
> ./scripts/update_and_rebuild_libmesh.sh --with-xdr-include=$LIBTIRPC_BASE/include/tirpc --with-xdr-libdir=$LIBTIRPC_BASE/lib --with-xdr-libname=tirpc
# Alternatively, you can remove the --enable-xdr-required flag from scripts/configure_libmesh.sh,
# or change it to --disable-xdr if you are sure that you will not need XDR support (the build might then still fail).
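
# One possible shortcut for the alternative above is to edit the wrapper script directly
# (this assumes the flag appears literally in scripts/configure_libmesh.sh; check with grep first):
> grep -n "enable-xdr-required" scripts/configure_libmesh.sh
> sed -i 's/--enable-xdr-required/--disable-xdr/' scripts/configure_libmesh.sh
> ./scripts/update_and_rebuild_libmesh.sh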

# building WASP
> ./scripts/update_and_rebuild_wasp.sh

# building and running tests
> cd test
> module load python/3.10.10-extended                                          # the system provided Python might not suffice
> make -j 6
> unset I_MPI_PMI_LIBRARY                                                      # interferes with mpiexec on the login nodes
> ./run_tests -j 4

A few (around 10) tests may fail, and perhaps around 100 are skipped. As long as the affected functionality is not essential for your workflow, you can live with that.
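
To inspect failures more closely, the test harness can rerun subsets of tests. The options below are taken from the run_tests help of recent Moose versions; please verify them with --help, as option names may differ between versions:

> ./run_tests --help                      # list all test-harness options
> ./run_tests -j 4 --re=kernels           # run only tests whose name matches a regular expression
> ./run_tests --failed-tests              # rerun only the tests that failed in the previous run
> ./run_tests -j 4 --verbose              # show the full output of each test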

The hardware-specific GCC optimization flags (-march) must be changed when using a different architecture. Please consult the GCC documentation on that!
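
Note that -march=native resolves to the architecture of the node you compile on. A quick way to inspect it (standard GCC and Linux commands):

> gcc -Q --help=target -march=native | grep -- '-march='      # architecture that -march=native expands to on this node
> lscpu | grep 'Model name'                                   # CPU model of the current node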

For building Moose apps, the build environment must be restored (compilers, tool chain, libraries, ...; consider using module collections (see module help), which can also be placed in a user-defined module file), and MOOSE_DIR must be set to the Moose top-level directory. For instance, building the Moose modules (apps) separately (although MOOSE_DIR is not strictly necessary in this case) might work as follows.

> module load cmake gcc intel-mkl intel-mpi hdf5/1.10.11-gcc12-impi boost/1.83.0-gcc12-impi libtirpc
> export HDF5_DIR=$HDF5_BASE                                         # probably not necessary anymore; HDF5 is linked in PETSc/Libmesh
> export CC=mpicc CXX=mpicxx FC=mpif90 F90=mpif90 F77=mpif77
> export CFLAGS="-O3 -march=native" CXXFLAGS="-O3 -march=native" FCFLAGS="-O3 -march=native" FFLAGS="-O3 -march=native"
> cd moose
> export MOOSE_DIR=$PWD
> cd modules
> module load python/3.10.10-extended
> make -j 10                                  # takes a while
> unset I_MPI_PMI_LIBRARY                     # interferes with mpiexec on the login nodes
> ./run_tests -j 4                            # takes even longer

A few tests may fail again, and some are skipped. Please check whether that is critical for your workflows.
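
To avoid reloading the build environment by hand in every session, the loaded modules can be stored in a collection, analogous to the run-time collection shown in the Usage section below (the collection name moose_build is just an example):

> module load cmake gcc intel-mkl intel-mpi hdf5/1.10.11-gcc12-impi boost/1.83.0-gcc12-impi libtirpc python/3.10.10-extended
> module save moose_build                 # store the currently loaded modules as a collection
> module restore moose_build              # restore it in a later session
> module savelist                         # list existing collections (if supported by the module system)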

Finally, there are also examples in moose/examples. They are a good starting point to learn the Moose workflows and serve as a reference on how to set up solvers and cases.
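
Own apps are typically built in the same way: restore the build environment, set MOOSE_DIR, and run make inside the app directory. The repository URL and app name below are placeholders; please consult the documentation of the respective app (e.g. Golem):

> module restore moose_build                        # or load the modules listed above
> export MOOSE_DIR=/path/to/moose                   # Moose top-level directory from the installation above
> git clone https://github.com/<org>/<my-app>.git   # placeholder URL; use your app's repository
> cd <my-app>
> make -j 10                                        # builds e.g. <my-app>-opt
> ./run_tests -j 4                                  # most apps ship their own test suite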

Usage

For running the application, only the run-time libraries are necessary (Boost may only be needed at compile time, but loading it does no harm; the gcc module may likewise not be relevant).

> module load gcc intel-mkl intel-mpi hdf5/1.10.11-gcc12-impi boost/1.83.0-gcc12-impi libtirpc
> module save moose_runtime                      # create a module collection; for later use: module restore moose_runtime
> mpiexec <mpi-options> ./my-moose-app-opt <options>
# for instance, in moose/examples/ex01_inputfile
> make
> mpiexec -n 2 ./ex01-opt --n-threads=4 -i diffusion_pathological.i

Framework Information:
MOOSE Version:           git commit e2ec6f19cf on 2025-01-30
LibMesh Version:         6ef7d4395794104f48dae1fd48e64077207188e8
PETSc Version:           3.22.1
SLEPc Version:           3.22.1
Current Time:            Fri Jan 31 22:02:55 2025
Executable Timestamp:    Fri Jan 31 21:27:42 2025
...
Parallelism:
  Num Processors:          2
  Num Threads:             4
...

The framework includes most libraries in the executable app's RPATH. However, the Intel MKL/MPI and HDF5 modules also provide run-time optimization settings via environment variables, so loading these modules is recommended.
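
Whether the libraries are actually found at run time can be checked with standard tools, e.g. for the ex01-opt executable built above:

> module restore moose_runtime
> ldd ./ex01-opt | grep -iE 'mkl|hdf5|mpi'      # the listed libraries should resolve to paths
> ldd ./ex01-opt | grep 'not found'             # should print nothing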

Moose applications provide the option --help. Use it to learn about run-time adaptations and monitoring capabilities. Being PETSc applications, Moose apps also accept the PETSc run-time command-line options (see the sketch after the job script below). A Slurm job script can be kept rather short, e.g.:

moose.slurm
#!/bin/bash
#SBATCH -o myjob.%j.%N.out
#SBATCH -D .
#SBATCH -J Test
#SBATCH --clusters=cm4                      # SRPs, 112 CPU cores: 2 sockets, 56 CPUs per socket
#SBATCH --partition=cm4_tiny
#SBATCH --get-user-env
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2                 # 2 ranks (1 rank per socket), and 
#SBATCH --cpus-per-task=56                  # 56 threads per rank
#SBATCH --hint=nomultithread                # don't use hardware threading
#SBATCH --mail-type=none                    # if set differently, provide a valid email address
#SBATCH --export=NONE                       # mandatory!
#SBATCH --time=2:00:00

# module restore moose_runtime              # if you created a module collection; (must be done before slurm_setup)

module load slurm_setup

# or, if no module collection is used ...
module load gcc intel-mkl intel-mpi hdf5/1.10.11-gcc12-impi boost/1.83.0-gcc12-impi libtirpc

export OMP_PLACES=cores OMP_PROC_BIND=close                                 # GOMP thread placement *)
mpiexec ./my-moose-app-opt --n-threads=$SLURM_CPUS_PER_TASK -i Test_Case.i

*) In this hybrid MPI+OpenMP mode, the communication between the threads within an MPI rank goes via shared memory, which is usually faster than MPI within a NUMA domain.
    Care must be taken that the threads run on different CPU cores; the user is responsible for the correct settings.
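
As mentioned above, PETSc run-time options can simply be appended to the Moose app's command line. The switches below are standard PETSc monitoring/logging options and are meant as an illustration; which ones are useful depends on your solver setup:

> mpiexec -n 2 ./my-moose-app-opt -i Test_Case.i -log_view                    # PETSc performance summary at the end of the run
> mpiexec -n 2 ./my-moose-app-opt -i Test_Case.i -ksp_monitor -snes_monitor   # print linear/nonlinear residuals per iteration
> mpiexec -n 2 ./my-moose-app-opt -i Test_Case.i -options_left                # report PETSc options that were set but never used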

HPC-relevant Topics

Advice

  • Create two module collections: one for build time and one for run time.
  • Compile the whole chain (PETSc-Libmesh-Moose app) with AVX support for the respective hardware. Computational frameworks like PETSc usually benefit from this.
  • Moose apps are MPI programs and are usually started via mpiexec, srun --mpi=pmi2, or the like. In addition, some applications also provide the option --n-threads=<# threads per rank>. Hybrid MPI/thread execution is recommended on the LRZ clusters for efficient use of the NUMA nodes (see the Usage section above).
  • Measure performance: This is especially important when you start with a new case! Start with a few time steps. Check the correct placement of MPI ranks/threads on the CPU cores. Try to assess the run-time and memory requirements (if memory becomes a bottleneck on the nodes, consider using distributed meshes). Perform a scaling test with the case at hand in order to assess the potential for accelerating your computations; a parallel efficiency of 70% or more is acceptable (see the sketch after this list; please also mind the Slurm queue limits).
    Assessing the total run time of a simulation case might prove difficult because of the adaptive time-step integration. However, a run can be continued from a certain time step, so stopping and restarting is a way to successively extend the simulated time span (see the checkpoint/recover sketch after this list).
  • Pre/Post Processing: Most file formats used in Libmesh/Moose can be analysed with ParaView.
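
Regarding the placement and scaling checks above, the following is one possible way to proceed; the environment variables are standard OpenMP (GCC 9+) and Intel MPI switches, and the timings are purely hypothetical numbers:

> export OMP_DISPLAY_AFFINITY=TRUE        # each OpenMP thread reports its CPU binding at startup
> export I_MPI_DEBUG=4                    # Intel MPI prints its rank-to-core pinning
> mpiexec ./my-moose-app-opt --n-threads=$SLURM_CPUS_PER_TASK -i Test_Case.i

# Hypothetical strong-scaling result of a short test run (few time steps):
#   1 node: 1200 s,   4 nodes: 430 s
# Speedup             S = 1200 s / 430 s ≈ 2.8
# Parallel efficiency E = S / 4 ≈ 0.70   -> just about acceptable; even more nodes would likely not pay off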
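
For the stop-and-restart approach, Moose provides checkpoint output and a --recover option. The snippet below is a sketch; please verify the block and option names against the Moose version you have installed:

# in the input file (Test_Case.i), enable checkpoint files:
[Outputs]
  checkpoint = true
[]

# run until the wall-time limit is reached, then continue from the latest checkpoint:
> mpiexec ./my-moose-app-opt -i Test_Case.i
> mpiexec ./my-moose-app-opt -i Test_Case.i --recover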

Why don't you provide Moose as a centrally installed Module at LRZ?

  1. So far there are only a few users, with diverging requirements.
  2. Moose and its apps are under rapid, ongoing development. When users settle on fixed versions, or a larger community starts using Moose-based apps, we can revise this decision.
  3. Education: Moose is a framework meant to support the development of apps. On the one hand, such a framework is often not easy to provide as a central module. On the other hand, users should get to know the complete setup of their tools (including the build of PETSc and Libmesh; the Moose developers have already simplified this considerably), which is best achieved by performing the installation themselves. For support requests, please contact our Service Desk.
  4. But for OpenFOAM there are central modules, and OpenFOAM is also a framework. That is true. However, OpenFOAM has a much larger community of users who use the software as is (without development), well-established environment management for build and run time, a well-settled release cycle and versioning strategy, and industrial support. We are not discriminating against Moose here; we simply have to make reasonable decisions given the limited manpower available for software support.