lrztools and lrzlib on SuperMUC-NG

The tools and the library provide small helper functions.

LRZ Tools

To get access to the LRZ Tools, load the module:

module load lrztools

The following commands are available; each is listed with its purpose, details and usage.

Information

budget_and_quota
Displays the CPU time budget and the file system quotas.

full_quota
Displays the used resources and quotas of all directories that are accessible to the user.

sw-info

Displays information about the installed software, particularly how the SPACK packages have been compiled.

Usage: sw-info [-lfisaS] [name|/HASH|module]
-l: list available software
-f: full info, including (implicit) spack software
-i: info
-s: display SPACK specs for building the software
-S: display SPACK specs only
-a: display all available information

Examples:

module load spack
sw-info
sw-info -S
sw-info cmake
sw-info -i cmake
sw-info -is cmake
sw-info -isf cmake
sw-info -a cmake
sw-info -f libelf
sw-info -S libelf
sw-info -S /wkue45z
sw-info fftw/mpi/3.3

NODEMEM.mpi
Reports the available memory of the nodes in a batch job.

Placement of processes and threads

cpuset_query
CPUSET_QUERY

Returns information about the topology, the number of cores, and the CPUs on which a process can run.

get_cpuset_mask
CPUSET_MASK
Returns a string of 0's and 1's indicating on which (logical) CPUs a process can run.

get_cpuset_mask_2
Returns the host, the CPU ID and the mask.

where_do_i_run
Returns the CPU ID on which the command was run.

placementtest-mpi.intel
placementtest-omp
Report how processes and threads are placed on nodes and CPUs.
Example:
mpiexec -n 5 placementtest-mpi.intel

mask2hex
Converts a binary mask to hex: mask2hex 111111110000000011111111 → FF00FF.
The result can be used for processor lists.

Performance tools

gprof-wrapper
For Intel MPI: mpiexec gprof-wrapper ./a.out. Output is written to gmon.out.mpi.intel.*

Batch jobs

sq 

SLURM queue and partition status

sq [-aCrvx] [-c list] [-S sortkey1,sortkey2,...] [[-F] Filter] 
-a: all clusters (default: SuperMUC-NG)
-A: show account instead of user
-c: cluster1,cluster2,... (see above)
-C: show name of user's batch script
-D: show dependencies
-e: extra output field (squeue --Format=...)
-r: show each array job separately
-x: extended summary per user and per cluster
-p: extended partition status
-P: very extended partition status
-X: only partition status
-S: sort columns (default: STATUS,NODES)
sortkeys(list): JOBID,STATUS,USER,GROUP,ACCOUNT,NODES,MEMORY,TIME_LIMIT,PRIORITY,TIME_USED,START_TIME

Workflow

pexec

Parallel execution of a list of serial tasks; to be used together with Intel MPI. The file cmdfile contains the serial commands to be executed.
pexec does load balancing.

cat cmdfile
./mytask <input.$SUBJOB >out.$SUBJOB 2>err.$SUBJOB
./mytask <input.$SUBJOB >out.$SUBJOB 2>err.$SUBJOB
./mytask <input.$SUBJOB >out.$SUBJOB 2>err.$SUBJOB
./another_task <inputx >outputx
mpiexec -n 64 pexec cmdfile
prsync

Generate and execute commands for parallel rsync on many nodes

# generate the commands, make the directory structure, rsync the data
prsync -f $SCRATCH/mydata -t $WORK/RESULTS/Experiment1 # sequential
source $HOME/.lrz_parallel_rsync/MKDIR # sequential
# best to execute on several nodes, not just on one
mpiexec -n 64 $HOME/.lrz_parallel_rsync/RSYNCS # parallel
msrsync

Multi-stream rsync on one node

#use 48 tasks on one node
msrsync -p 48 $SCRATCH/mydata $WORK/RESULTS/Experiment1

Programming Environment

I_MPI
Displays a sorted list of the current settings of the Intel MPI environment.

Backup and Archive

pdsmc
A helper tool for enabling parallel tape retrievals.

LRZ Library

module load lrztools

The library contains useful subroutines and functions. Compile and link with:

  • Fortran: mpif90 -nofor-main ... -I $LRZ_INCLUDE ... $LRZLIB_LIB
  • C/C++: mpicc ... -I $LRZ_INCLUDE ... $LRZLIB_LIB
For each function or subroutine, the C and Fortran interfaces are listed, followed by the purpose, details and usage.

int getpid(void)
INTEGER GETPID
Returns the process ID.

int gettid(void)
INTEGER GETTID
Returns the thread ID.

int where_do_i_run(void);
INTEGER WHERE_DO_I_RUN()
Returns the physical CPU ID where the task/thread is running
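
A minimal C sketch of these three calls (the prototypes are declared by hand here; a header under $LRZ_INCLUDE should provide them, and the OpenMP region is only used to show per-thread values):

#include <stdio.h>

int getpid(void);          /* process ID (also available via <unistd.h>) */
int gettid(void);          /* lrzlib: thread ID */
int where_do_i_run(void);  /* lrzlib: physical CPU ID of the calling thread */

int main(void)
{
    /* Each OpenMP thread reports where it currently runs. */
    #pragma omp parallel
    {
        printf("pid=%d tid=%d cpu=%d\n", getpid(), gettid(), where_do_i_run());
    }
    return 0;
}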

double dwalltime()
double dcputime()
REAL(KIND=8) DWALLTIME()
REAL(KIND=8) DCPUTIME()
Return the wall-clock time/CPU time spent between the first and the current call to the routine.
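
A minimal C sketch of the timers, assuming the returned values are seconds and declaring the prototypes by hand:

#include <stdio.h>

double dwalltime(void);  /* lrzlib: wall-clock time since the first call */
double dcputime(void);   /* lrzlib: CPU time since the first call */

int main(void)
{
    double sum = 0.0;

    dwalltime();  /* first call sets the reference point */
    dcputime();

    for (long i = 0; i < 100000000L; i++)  /* some work to be timed */
        sum += (double)i;

    printf("wall = %.3f, cpu = %.3f (sum = %e)\n", dwalltime(), dcputime(), sum);
    return 0;
}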

MEMUSAGE(avail, used, free, buffers, cached)
integer(kind=8):: avail, used, free, buffers, cached

void memusage(size_t *avail, size_t *used, size_t *free, size_t *buffers, size_t *cached)

Returns, in kB:
  • Total available memory
  • Used memory
  • Free memory
  • Memory used for buffers (raw disk blocks)
  • Memory used for file caching by the file systems

If your code is written in C++, you have to wrap the memusage declaration in an extern "C" block. Add the following declaration after your includes and before the main function:

extern "C" { void memusage(size_t *, size_t *, size_t *, size_t *, size_t *); }

void place_task_(int cpu[],int *n);
INTEGER CPU(N)
PLACE_TASK(CPU,N)
Sets the affinity mask so that the current task will run only on the physical CPUs contained in the array CPU.
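
A minimal C sketch (the CPU IDs are placeholders; note that n is passed by reference, matching the Fortran-style interface):

void place_task_(int cpu[], int *n);  /* lrzlib: pin the calling task */

int main(void)
{
    int cpus[4] = {0, 1, 2, 3};  /* placeholder physical CPU IDs */
    int n = 4;

    place_task_(cpus, &n);       /* the task now runs only on CPUs 0-3 */
    /* ... computation ... */
    return 0;
}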

void place_all_tasks(int *debug)

LOGICAL DEBUG
PLACE_ALL_TASKS(DEBUG)
Places the tasks and threads on particular CPUs, either by a default algorithm or by using the environment variable CPU_MAPPING. Example:
CPU_MAPPING=0,2,4,8,10,12
OMP_NUM_THREADS=3
MP_NODES=8
mpiexec -n 16 ./a.out
If DEBUG is TRUE or 1, information about the placement is printed.

void placementinfo()
PLACEMENTINFO()

Outputs information about the placement of tasks and threads.

Programs written in C must be linked with the (Intel) Fortran compiler:
mpif90 -nofor-main -qopenmp -I $LRZ_INCLUDE ... main.c $LRZLIB_LIB
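
A minimal MPI sketch in C tying the placement routines together (prototypes declared by hand, compiled and linked as shown above):

#include <mpi.h>

void place_all_tasks(int *debug);  /* lrzlib: pin tasks and threads */
void placementinfo(void);          /* lrzlib: report the resulting placement */

int main(int argc, char **argv)
{
    int debug = 1;  /* 1: print what was placed where */

    MPI_Init(&argc, &argv);

    place_all_tasks(&debug);  /* honours CPU_MAPPING if it is set */
    placementinfo();

    /* ... computation ... */

    MPI_Finalize();
    return 0;
}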