Energy Aware Runtime


Energy Aware Runtime (EAR) is a system-level tool used on SuperMUC-NG to optimise energy consumption.

It was created in the context of the Barcelona Supercomputing Centre (BSC)/Lenovo cooperation project.

Details: user_guide_sng.pdf

How it works

EAR regularly monitors the runtime behaviour of a job, taking instruction throughput, memory access behaviour, and power consumption into account. From this, it derives the best frequency setting according to the configured policy.

For MPI jobs (Intel MPI or Open MPI), EAR can hook into MPI functions to detect iterative computational phases of an application, allowing it to change the frequency immediately when a phase with already known behaviour is entered. In this mode, EAR monitors its own overhead; if that overhead becomes too high, it switches back to a mode that uses time-based behaviour monitoring. The latter is the default if MPI is not used.

Default EAR Configuration on SuperMUC-NG

By default, EAR's policy targets higher performance by using higher frequencies. The frequency drops to the base level of 2.3 GHz when higher frequencies do not yield a performance increase, which is usually the case for memory-bound codes. This policy is called "min_time" in EAR terms.

Controlling EAR behaviour

EAR can render profiling or benchmark measurements difficult and unstable. In such cases, users can enforce a fixed base frequency of 2.3 GHz by switching EAR off, i.e., by putting the following line into the job script:

#SBATCH --ear=off
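
For example, a minimal job script sketch for a measurement run (job name, node count, time limit, and binary are hypothetical placeholders):

#!/bin/bash
#SBATCH --job-name=bench           # placeholder
#SBATCH --nodes=4                  # placeholder
#SBATCH --time=00:30:00            # placeholder
#SBATCH --ear=off                  # enforce the fixed 2.3 GHz base frequency

mpiexec ./my_benchmark             # placeholder binary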

Attention: switching EAR off for regular runs is not recommended, as it will likely slow down your jobs because higher CPU clock frequencies are not used!


It is also possible to pass the above switch as a command-line argument to salloc.
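
For example (node count and partition are hypothetical placeholders):

salloc --ear=off --nodes=1 --partition=test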

Gathering detailed data

To collect all the data that EAR measures for large applications, you can have it stored in the EAR database by setting the following environment variable:

export SLURM_EARL_REPORT_LOOPS=1
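
A sketch of where to set this in a job script (node count and binary are hypothetical placeholders):

#!/bin/bash
#SBATCH --nodes=200                 # placeholder; intended for large runs
export SLURM_EARL_REPORT_LOOPS=1    # store EAR loop data in the database
mpiexec ./my_app                    # placeholder binary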

To access this data, please request it from LRZ via the Service Desk. Note that the data is kept for 60 days only. Remember that performance data is also available at hpcreport.lrz.de; while HPC Report offers a wider variety of metrics, EAR's data is more frequent and detailed.

For small applications (100 nodes or less), you can use a batch script option to store everything in files:

#SBATCH --ear-user-db=<filename>
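
A minimal sketch of a small job using this option (node count, filename, and binary are hypothetical placeholders):

#!/bin/bash
#SBATCH --nodes=16                       # placeholder; the option is meant for jobs of 100 nodes or less
#SBATCH --ear-user-db=ear_metrics        # hypothetical filename for the metric files
mpiexec ./my_app                         # placeholder binary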



Troubleshooting

Getting general EAR debug information

If your application fails for no apparent reason, even though you have not made any changes to the code, consider enabling EAR's debug information.

This information will be saved in the error file:

...
#SBATCH --error=<desired error file> 
#SBATCH --ear-verbose=1
...

Crash right after application startup in Python-based codes

The cause of this problem is that the MPI symbols are not recognized. Therefore, please specify whether you are using an Intel MPI or an Open MPI version with one of these exports, respectively:

export SLURM_EAR_LOAD_MPI_VERSION="intel"

export SLURM_EAR_LOAD_MPI_VERSION="open mpi"

Note: when combining a Python MPI application and a regular MPI application (i.e., C/C++/Fortran without Python) in the same batch script, please unset this variable for the regular MPI application while using EAR, otherwise your application may crash:

unset SLURM_EAR_LOAD_MPI_VERSION

Another option is to switch EAR off for the entire batch script (see "Controlling EAR behaviour" above).
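
A sketch of such a combined batch script (node count, script, and binary names are hypothetical placeholders):

#!/bin/bash
#SBATCH --nodes=2                            # placeholder

# Python MPI step: tell EAR which MPI flavour the interpreter loads
export SLURM_EAR_LOAD_MPI_VERSION="intel"
mpiexec python my_script.py                  # placeholder Python MPI script

# regular C/C++/Fortran MPI step: unset the variable again
unset SLURM_EAR_LOAD_MPI_VERSION
mpiexec ./my_solver                          # placeholder binary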

Crash on scripts using anaconda / miniconda

Since the Intel conda channel provides MPICH, it is necessary to disable EAR completely when running jobs with an Anaconda or Miniconda setup. In this case, please set EAR to off as described above.

Trouble with shared libraries

With the current setup, there may be trouble when switching modules; an error message like

/usr/bin/tclsh: error while loading shared libraries: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory

appears. To work around this, there are two options:

  1. switch EAR off as described above.
  2. temporarily unset the LD_PRELOAD variable before making changes to the environment, and set it back to its original value just before running mpiexec (see the sketch below).
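
A minimal sketch of option 2 (module and binary names are hypothetical placeholders):

# save the preload libraries EAR has set, then clear the variable
EAR_PRELOAD_SAVE="$LD_PRELOAD"
unset LD_PRELOAD

module switch mpi.intel mpi.intel/2019       # placeholder environment change

# restore the original value just before launching
export LD_PRELOAD="$EAR_PRELOAD_SAVE"
mpiexec ./my_app                             # placeholder binary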

Changing CPU affinity within your application

Generally speaking, applications use default settings and thus a default CPU affinity on the nodes where they run. However, if your application changes the CPU affinity at runtime, EAR cannot recognize this; checking for such changes at every iteration would cause too much overhead. Thus, you can either switch EAR off (see above) or set the affinity for the entire node, on all nodes, with this environment variable:

export SLURM_EARL_NO_AFFINITY_MASK=1
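
A sketch of where to set this in a job script (node count and binary are hypothetical placeholders):

#!/bin/bash
#SBATCH --nodes=2                        # placeholder
export SLURM_EARL_NO_AFFINITY_MASK=1     # EAR no longer relies on the per-process affinity mask
mpiexec ./my_pinning_app                 # placeholder binary that re-pins its threads at runtime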


Further Information

EAR is developed by Lenovo under an open-source licence. Please contact LRZ if you are interested in collaborating on the energy efficiency of HPC systems.