Energy Aware Runtime (EAR) is a system-level tool used on SuperMUC-NG to optimise energy consumption.

It has been created in the context of the Barcelona Supercomputing Centre (BSC)/Lenovo Cooperation project.

Details: User Guide

How it works

EAR regularly monitors the runtime behaviour of a job, taking instruction throughput, memory access behaviour, and power consumption into account. From these measurements, it derives the best frequency setting according to a configured policy.

For MPI jobs (Intel MPI or OpenMPI), EAR can hook into MPI functions to detect iterative computational phases of an application, allowing it to change the frequency immediately when a phase with already known behaviour is entered. In this mode, EAR monitors its own overhead. If the overhead is too high, it falls back to a mode that uses time-based behaviour monitoring. The time-based mode is also the default if MPI is not used.

Default EAR Configuration On SuperMUC-NG

By default, the EAR policy targets high performance, but reduces the frequency as long as performance remains acceptably high (this policy is called "MIN_TIME_TO_SOLUTION" in EAR terms).

Controlling EAR behaviour

When OpenMPI is used (instead of Intel MPI) and EAR should be active, please add the following switch to the job submission:

#SBATCH --ear-mpi-dist=openmpi

EAR can make profiling or benchmark measurements difficult and unstable. In such cases, users can switch EAR off, which enforces a fixed base frequency of 2.3 GHz, by putting the following line in the job script:

#SBATCH --ear=off

Attention: switching EAR off for regular runs is not recommended, as it will probably slow down your jobs because higher CPU clock frequencies are not used!


It is also possible to use the above switches as command line arguments on salloc.
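
As a rough sketch, interactive allocations using these switches might look as follows. The node count and time limit are placeholder values chosen for illustration only, and site-specific options such as partition or account are omitted:

```shell
# Interactive allocation with EAR switched off (e.g. for benchmarking).
# --nodes and --time values are illustrative placeholders.
salloc --nodes=1 --time=00:30:00 --ear=off

# Interactive allocation for an OpenMPI job with EAR active.
salloc --nodes=1 --time=00:30:00 --ear-mpi-dist=openmpi
```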

Troubleshooting

Trouble with shared libraries

With the current setup, switching modules may fail with an error message like

/usr/bin/tclsh: error while loading shared libraries: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory

To work around this, there are two options:

  1. switch EAR off as described above.
  2. temporarily unset the LD_PRELOAD variable before making changes to the environment, and set it back to its original value just before running mpiexec.
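
The second option could be sketched as a small shell helper. This is only an illustration, not an official LRZ or EAR utility: the function name without_preload and its internal variable names are hypothetical, and it assumes the command being wrapped does not itself need to modify LD_PRELOAD.

```shell
# Hypothetical helper: run a command with LD_PRELOAD temporarily unset,
# then restore the original value. The command runs in the current shell,
# so environment changes made by shell functions such as `module` persist.
without_preload() {
    _wp_was_set="${LD_PRELOAD+yes}"   # "yes" if LD_PRELOAD was set at all
    _wp_saved="${LD_PRELOAD-}"        # original value (empty if unset)
    unset LD_PRELOAD

    "$@"
    _wp_status=$?

    # Restore the original value only if there was one.
    if [ -n "$_wp_was_set" ]; then
        LD_PRELOAD="$_wp_saved"
        export LD_PRELOAD
    fi
    return "$_wp_status"
}
```

In a job script, the helper might be called as without_preload module switch <old_module> <new_module> (placeholder module names) before the mpiexec line, so that LD_PRELOAD has its original value again when mpiexec starts.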


Further Information

EAR is developed by Lenovo under an Open-Source licence. Please contact LRZ if you are interested in collaboration on energy efficiency of HPC systems.

