Moose Framework on HPC Systems

What is Moose Framework?

Moose Framework is an open-source, parallel finite element framework built on top of Libmesh, which in turn relies on PETSc for its solvers. It is used to numerically solve multi-physics systems of partial differential equations.

Moose already ships with a number of physics modules, but it can also be extended with user-developed applications, as is the case for Golem.

Getting Started

Installation

The basic installation steps are as follows (please also consult the Moose Framework documentation page).

> git clone https://github.com/idaholab/moose.git
> cd moose
> git checkout master
> git submodule update --init
> git submodule foreach --recursive git submodule update --init
> ./scripts/update_and_rebuild_petsc.sh                              # building PETSc
> ./scripts/update_and_rebuild_libmesh.sh                            # building Libmesh
> cd test                                                            # building and performing tests
> make -j 6
> ./run_tests
> cd ../modules                                                      # building the Moose modules (apps); your own apps (e.g. Golem) can be built instead
> make -j 6
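
Building an own Moose-based app such as Golem follows the same pattern: the app is checked out next to the Moose tree, MOOSE_DIR points to the Moose top-level directory, and make is run in the app's directory. A minimal sketch, with the repository URL left as a placeholder (consult the Golem project page for the actual location):

> cd ..                                                              # leave the moose directory
> git clone <golem-repository-url> golem                             # placeholder; see the Golem project page
> cd golem
> export MOOSE_DIR=$PWD/../moose                                     # point to the Moose checkout from above
> make -j 6
> ./run_tests                                                        # if the app provides its own test suite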

The procedure above already assumes a correct build environment. On the LRZ HPC clusters, such an environment must first be set up, and some adaptations are necessary, e.g. in order to use our tuned performance libraries (HDF5, Intel MKL, Intel MPI, etc.). As is usual for PETSc applications, the whole pipeline down to Moose must be built with the same toolchain (same compiler, compiler settings, MPI, ...).
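
A quick way to verify which compiler the MPI wrappers will actually invoke (assuming Intel MPI wrappers, which understand the -show option) is:

> which mpicc mpicxx mpif90
> mpicc -show                                                        # prints the underlying compiler and its flags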

On CoolMUC-2, for instance, the following procedure works reasonably well.

CoolMUC-2 Installation Procedure Example
> module rm intel-mpi/2019-intel intel-mkl/2020 intel
> module load cmake gcc/8 intel-mkl/2020-gcc11 intel-mpi/2019-gcc hdf5/1.10.7-gcc11-impi boost/1.77.0-gcc11

> git clone https://github.com/idaholab/moose.git
> cd moose
> git checkout master
> git submodule update --init
> git submodule foreach --recursive git submodule update --init

> export HDF5_DIR=$HDF5_BASE 
> export I_MPI_HYDRA_BOOTSTRAP=fork                                           # needed because PETSc's configure tests MPI functionality
> export I_MPI_FABRICS=shm

# building PETSc
> ./scripts/update_and_rebuild_petsc.sh --help                                # this script passes all cmd parameters also to PETSc configure script
> ./scripts/update_and_rebuild_petsc.sh --with-blaslapack-dir=$MKL_BASE \
           --with-cc=$(which mpicc) --with-cxx=$(which mpicxx) --with-fc=$(which mpif90) --with-mpi-f90=$(which mpif90) --with-mpiexec=$(which mpiexec) \
           COPTFLAGS='-g -O3 -march=haswell' CXXOPTFLAGS='-g -O3 -march=haswell' FOPTFLAGS='-g -O3 -march=haswell' \
           --with-mpi-include=$MPI_BASE/include --with-mpi-lib=$MPI_BASE/lib/release/libmpi.a --with-64-bit-indices=true

# building Libmesh
> export CC=mpicc CXX=mpicxx FC=mpif90 F90=mpif90 F77=mpif77
> export CFLAGS="-O3 -march=haswell" CXXFLAGS="-O3 -march=haswell" FCFLAGS="-O3 -march=haswell" FFLAGS="-O3 -march=haswell"

> ./scripts/update_and_rebuild_libmesh.sh

# building and running tests
> cd test
> make -j 6
> module load python/3.8.11-extended                                          # the system-provided Python might not suffice
> ./run_tests -j 4

A few tests (around 10) may fail, and some 100 may be skipped. As long as these are not essential for your workflow, you can live with that.
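
The test harness also offers options to narrow down the test selection, which helps when investigating individual failures; ./run_tests --help lists them. As a sketch (the regular expression is only an illustration):

> ./run_tests --help
> ./run_tests -j 4 --re=kernels                                      # run only tests whose names match the regex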

The hardware-specific GCC optimization flags (here -march=haswell) must be changed when using a different compiler or different hardware.
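
One way to determine a suitable -march value is to ask GCC what -march=native resolves to on the target hardware (run this on a compute node of the target cluster, since login nodes may have different CPUs):

> gcc -march=native -Q --help=target | grep -- '-march='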

For building Moose apps, the build environment must be restored (compilers, toolchain, libraries ... consider module collections (see module help); the settings can also be placed in a user-defined module file), and MOOSE_DIR must be set to the Moose top-level directory. For instance, building the Moose modules (apps) separately (although MOOSE_DIR is not strictly necessary in this case) might work as follows.

> module rm intel-mpi/2019-intel intel-mkl/2020 intel
> module load cmake gcc/8 intel-mkl/2020-gcc11 intel-mpi/2019-gcc hdf5/1.10.7-gcc11-impi boost/1.77.0-gcc11
> export HDF5_DIR=$HDF5_BASE                                         # probably not necessary anymore; HDF5 is linked in PETSc/Libmesh
> export CC=mpicc CXX=mpicxx FC=mpif90 F90=mpif90 F77=mpif77
> export CFLAGS="-O3 -march=haswell" CXXFLAGS="-O3 -march=haswell" FCFLAGS="-O3 -march=haswell" FFLAGS="-O3 -march=haswell"
> cd moose
> export MOOSE_DIR=$PWD
> cd modules
> make -j 10                                  # takes a while
> module load python/3.8.11-extended
> ./run_tests -j 4                            # takes even longer

A few tests may fail again, and some are skipped. Please check whether that is critical for your workflows.
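
As mentioned above, the build environment can be stored as a module collection, so that it can be restored with a single command in later sessions (the collection name moose_build is arbitrary):

> module save moose_build                      # snapshot of the currently loaded modules
> module restore moose_build                   # restore it later, before rebuilding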

Finally, there are also examples under moose/examples. They are a good starting point for learning the Moose workflows and serve as a reference for how to set up solvers and cases.
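
The examples are built and run in the same way as the modules. A minimal sketch, assuming the directory layout of a current Moose checkout (directory, executable and input-file names vary between examples; list moose/examples to see what is actually available):

> cd $MOOSE_DIR/examples/ex01_inputfile                              # first example; name assumed
> make -j 4                                                          # builds the example app, e.g. ex01-opt
> ./ex01-opt -i ex01.i                                               # run it with the provided input file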

Usage

For running the application, only the run-time libraries are necessary (Boost may only be used as a compile-time library, but loading it does no harm; gcc is probably also not needed).

> module rm intel-mpi/2019-intel intel-mkl/2020 intel
> module load gcc/8 intel-mkl/2020-gcc11 intel-mpi/2019-gcc hdf5/1.10.7-gcc11-impi boost/1.77.0-gcc11
> module save moose_runtime                      # create a module collection; for later use: module restore moose_runtime
> ./my-moose-app-opt <options>

The framework embeds most library paths in the executable app's RPATH. However, the Intel MKL, Intel MPI and HDF5 modules also provide run-time optimization settings via environment variables, so loading these modules is recommended.
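
Whether all shared libraries of an app resolve correctly (via the RPATH or via the loaded modules) can be checked with standard tools, e.g.:

> readelf -d ./my-moose-app-opt | grep -iE 'rpath|runpath'           # show the embedded search paths
> ldd ./my-moose-app-opt | grep 'not found'                          # should print nothing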

Moose applications provide the option --help. Use it to learn about run-time adaptations and monitoring capabilities. Being PETSc applications, Moose apps also accept the PETSc run-time command-line parameters. A Slurm job script can be kept rather short, e.g.

moose.slurm
#!/bin/bash
#SBATCH -o myjob.%j.%N.out
#SBATCH -D .
#SBATCH -J Test
#SBATCH --clusters=cm2_tiny                 # Haswell: 2 sockets, each with 2 NUMA domains of 7 CPU cores
#SBATCH --partition=cm2_tiny
#SBATCH --get-user-env
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=7
#SBATCH --mail-type=none                    # if set differently, provide a valid email address
#SBATCH --export=NONE                       # mandatory!
#SBATCH --time=2:00:00

# module restore moose_runtime              # if you created a module collection; (must be done before slurm_setup)

module load slurm_setup

# or, if no module collection is used ...
module rm intel-mpi/2019-intel intel-mkl/2020 intel
module load gcc/8 intel-mkl/2020-gcc11 intel-mpi/2019-gcc hdf5/1.10.7-gcc11-impi boost/1.77.0-gcc11

mpiexec ./my-moose-app-opt --n-threads=$SLURM_CPUS_PER_TASK -i Test_Case.i
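
The job is then submitted and monitored with the usual Slurm commands (the cluster name must match the one given in the script):

> sbatch moose.slurm
> squeue --clusters=cm2_tiny -u $USER                                # check the job state
> tail -f myjob.<jobid>.<node>.out                                   # follow the solver output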

HPC-relevant Topics

Recommendations

  • Create two module collections - one for build-time, and one for run-time.
  • Compile the whole chain (PETSc-Libmesh-Moose app) with AVX support for the respective hardware. Computational frameworks like PETSc usually benefit from this.
  • Moose apps are MPI programs and are usually started via mpiexec, srun --mpi=pmi2, or the like. Some applications additionally offer the option --n-threads=<# threads per rank>. Hybrid MPI/thread execution is recommended on the LRZ clusters for efficient use of the NUMA nodes (see the Usage section above).
  • Measure performance: Especially when you start with a new case, this is important! Start with a few time steps. Check the correct placement of MPI ranks/threads on the CPU cores (a small placement-check sketch follows this list). Try to assess the run-time and memory requirements (if memory becomes a bottleneck on the nodes, consider using distributed meshes). Perform a scaling test with the case at hand in order to assess the potential for accelerating your computations. (A parallel efficiency of 70% or more is OK. Please also keep the Slurm queue limits in mind!)
    Assessing the total runtime of a simulation case may prove difficult because of the adaptive time-step integration. However, starting from a given time step is possible, so stopping and restarting is a way to successively extend the simulation's total integration time.
  • Pre/Post Processing: Most file formats used in Libmesh/Moose can be analysed with ParaView.
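
To check the rank/thread placement mentioned in the list above, Intel MPI can report its pinning decisions at start-up. A minimal sketch, to be placed in the job script before the solver call (I_MPI_DEBUG=4 is sufficient to show the pin map):

export I_MPI_DEBUG=4                        # print the rank-to-core pinning at MPI_Init
mpiexec ./my-moose-app-opt --n-threads=$SLURM_CPUS_PER_TASK -i Test_Case.i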

Why don't you provide Moose as a centrally installed Module at LRZ?

  1. So far there are only a few users, and their requirements diverge.
  2. Moose and its apps are under rapid, ongoing development. Once users settle on a fixed version, or a larger community starts using Moose-based apps, we can revise this decision.
  3. Education: Moose is a framework meant to support the development of apps. Such a framework is, on the one hand, often not easy to provide as a central module. On the other hand, users should get to know the complete setup of their tools (including the build of PETSc and Libmesh ... the Moose developers have already simplified this considerably), and this is best done by performing the installation themselves. For support requests, please contact our Service Desk.
  4. But for OpenFOAM there are central modules, and OpenFOAM is also a framework. That is true. However, OpenFOAM has a much larger community that uses the software as is (no development), well-settled environment management for build and run-time, a well-settled release cycle and versioning strategy, and industrial support. We are not discriminating against Moose here; we simply have to make reasonable decisions given the limited manpower available for software support.