lrztools and lrzlib on SuperMUC-NG

The tools and the library provide small helper functions.

LRZ Tools

To get access to the LRZ tools, issue the following command:

module load lrztools

The following commands are available (command, followed by purpose, details, and usage):

Information

lrz-budget_and_quota

Displays the CPU time budget and the file system quotas.

lrz-cpufreq

Shows the distribution of CPU frequencies across the node. No arguments needed.

lrz-full_quota

Displays the used resources and quotas of all directories accessible to the user.

lrz-one_per_node -m

Reports the available memory of the nodes in a batch job.

lrz-one_per_node -s

Reports the memory statistics of the nodes in a batch job.

lrz-ps

Shows selected columns of the ps command, reflecting memory, CPU usage, and pinning of processes and threads.

lrz-snapshot

Shows the available snapshots of a file or directory. Note that this only works on files and directories inside your HOME.

Utilities external to LRZ

gdu, ncdu 

gdu and ncdu are similar to the `du` command in Linux, but provide interactive, ncurses-style interfaces.

cpuinfo

Shows the grouping of packages, CPUs, and cores, and which caches they share.

ugrep

Similar to grep, with added functionality.

time

Reports the time spent in user and in system mode. Use \time (bypassing the shell built-in) to see further metrics.

Placement of processes and threads

lrz-where_do_i_run

Returns the CPU ID on which the command was run.

placementtest-mpi.intel
placementtest-omp

Return information on how processes and threads are placed on nodes and CPUs. Example:

mpiexec -n 5 placementtest-mpi.intel -o 2

lrz-mask2hex

Converts a binary mask to hex, e.g. lrz-mask2hex 111111110000000011111111 → FF00FF. Can be used for processor lists.

Performance tools

gprof-wrapper

For Intel MPI: mpiexec gprof-wrapper ./a.out. Output is written to gmon.out.mpi.intel.*.

Batch jobs

lrz-sq 

SLURM queue and partition status

lrz-sq [-aCrvx] [-c list] [-S sortkey1,sortkey2,...] [[-F] Filter] 
-a: all clusters (default: SuperMUC-NG)
-A: show account instead of user
-c: cluster1,cluster2,... (see above)
-C: show name of user's batch script
-D: show dependencies
-e: extra output field (squeue --Format=...)
-r: show each array job separately
-x: extended summary per user and per cluster
-p: extended partition status
-P: very extended partition status
-X: only partition status
-S: sort columns (default: STATUS,NODES)
sortkeys(list): JOBID,STATUS,USER,GROUP,ACCOUNT,NODES,MEMORY,TIME_LIMIT,PRIORITY,TIME_USED,START_TIME

lrz-sq-run

Shows selected columns of your jobs submitted to the queue.

Workflow

pexec

Parallel execution of a list of serial tasks; use together with Intel MPI. cmdfile contains the serial commands to be executed, one per line. pexec performs load balancing.

cat cmdfile
./mytask <input.$SUBJOB >out.$SUBJOB 2>err.$SUBJOB
./mytask <input.$SUBJOB >out.$SUBJOB 2>err.$SUBJOB
./mytask <input.$SUBJOB >out.$SUBJOB 2>err.$SUBJOB
./another_task <inputx >outputx
mpiexec -n 64 pexec cmdfile

prsync

Generate and execute commands for parallel rsync on many nodes

# generate the commands, make the directory structure, rsync the data
prsync -f $SCRATCH/mydata -t $WORK/RESULTS/Experiment1 # sequential
source $HOME/.lrz_parallel_rsync/MKDIR # sequential
# best to execute on several nodes, not just on one
mpiexec -n 64 $HOME/.lrz_parallel_rsync/RSYNCS # parallel

msrsync

Multi-stream rsync on one node

# use 48 tasks on one node
msrsync -p 48 $SCRATCH/mydata $WORK/RESULTS/Experiment1

Programming Environment

lrz-parallel_env

Displays a sorted list of the current settings of the Intel MPI environment.

LRZ Library

module load lrztools

The library contains useful subroutines and functions. Compile with:

  • Fortran: mpif90 -nofor-main ... -I $LRZ_INCLUDE ... $LRZLIB_LIB
  • C/C++: mpicc ... -I $LRZ_INCLUDE ... $LRZLIB_LIB
Functions and subroutines (callable from C and Fortran), with purpose, details, and usage:

int lrz_getpid();

Returns the process ID

int lrz_gettid();

Returns the thread ID

int lrz_where_do_i_run();

Returns the physical CPU ID where the task/thread is running
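
For illustration, a minimal C sketch that prints these three IDs from each OpenMP thread. Since the name of the library's header file is not stated here, the documented prototypes are declared directly; check $LRZ_INCLUDE for the actual header, and link as shown at the end of this section.

#include <stdio.h>
#include <omp.h>

/* Prototypes as documented above; the actual LRZ header may differ. */
int lrz_getpid();
int lrz_gettid();
int lrz_where_do_i_run();

int main(void) {
    #pragma omp parallel
    {
        /* Each thread reports its process ID, thread ID, and current CPU. */
        printf("pid=%d tid=%d cpu=%d\n",
               lrz_getpid(), lrz_gettid(), lrz_where_do_i_run());
    }
    return 0;
}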

double lrz_dwalltime();

double lrz_dcputime();

Return the wall-clock time and the CPU time, respectively, elapsed between the first and the current call to the routine.
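
A minimal sketch of the intended use, assuming the documented prototypes: the first call starts the clocks, subsequent calls return the elapsed time.

#include <stdio.h>

double lrz_dwalltime();  /* prototypes as documented above */
double lrz_dcputime();

int main(void) {
    lrz_dwalltime();  /* first call sets the reference point */
    lrz_dcputime();

    double s = 0.0;
    for (long i = 1; i <= 100000000L; i++)  /* some work to time */
        s += 1.0 / (double)i;

    printf("s=%f wall=%.3fs cpu=%.3fs\n", s, lrz_dwalltime(), lrz_dcputime());
    return 0;
}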

void lrz_memusage(int *avail, int *used, int *free,
                  int *buffers, int *cached);


Returns, in kB:
  • total available memory
  • used memory
  • free memory
  • memory used for buffers (raw disk blocks)
  • memory used for file caching by the file systems

If your code is written in C++, you have to wrap the function declaration in an extern "C" block. After the includes and before the main function, add the following declaration (matching the signature documented above):

extern "C"   {  void lrz_memusage(int *, int *, int *, int *, int *);    }
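
A minimal C sketch of a call, assuming the int* signature documented above; all five values are reported in kB.

#include <stdio.h>

void lrz_memusage(int *avail, int *used, int *free,
                  int *buffers, int *cached);

int main(void) {
    int avail, used, free_kb, buffers, cached;
    lrz_memusage(&avail, &used, &free_kb, &buffers, &cached);
    printf("avail=%d used=%d free=%d buffers=%d cached=%d (kB)\n",
           avail, used, free_kb, buffers, cached);
    return 0;
}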

void lrz_place_task(int cpu[], int *n);

Sets the affinity mask so that the current task runs on the physical CPUs contained in the array cpu (of length *n).
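
A minimal sketch, assuming the documented signature: pin the current task to physical CPUs 0 and 2, then verify with lrz_where_do_i_run.

#include <stdio.h>

void lrz_place_task(int cpu[], int *n);
int lrz_where_do_i_run();

int main(void) {
    int cpus[] = {0, 2};       /* physical CPU IDs to run on */
    int n = 2;                 /* number of entries in cpus */
    lrz_place_task(cpus, &n);
    printf("now running on CPU %d\n", lrz_where_do_i_run());
    return 0;
}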

void lrz_place_all_tasks(int *idbg);

Places tasks and threads on particular CPUs, either by a default algorithm or according to the environment variable CPU_MAPPING. Example:

CPU_MAPPING=0,2,4,8,10,12
OMP_NUM_THREADS=3
MP_NODES=8
mpiexec -n 16 ./a.out

If idbg is TRUE or 1, information about the placement is printed.

void lrz_place_info();

Outputs information about the placement of tasks and threads.
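
The two placement routines are typically combined. A minimal sketch, assuming the documented signatures; in an MPI code these calls would presumably go right after MPI_Init:

void lrz_place_all_tasks(int *idbg);
void lrz_place_info();

int main(void) {
    int idbg = 1;                /* 1/TRUE: print placement diagnostics */
    lrz_place_all_tasks(&idbg);  /* honours CPU_MAPPING if set */
    lrz_place_info();            /* report how tasks/threads map to CPUs */
    return 0;
}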

Programs written in C must be linked with the (Intel) Fortran compiler:
mpif90 -nofor-main -qopenmp -I $LRZ_INCLUDE ... main.c $LRZLIB_LIB