Example parallel job scripts on the Linux-Cluster

Introductory remarks

The job scripts for the SLURM partitions are provided as templates that you can adapt to your own settings. Please take the following points into account:

  • Some entries are placeholders, which you must replace with correct, user-specific settings. In particular, path specifications must be adapted: always specify the appropriate directories instead of the placeholder names with the three periods used in the examples below!

  • For recommendations on how to do large-scale I/O please refer to the description of the file systems available on the cluster. It is recommended to keep executables within your HOME file system, in particular for parallel jobs. The example jobs reflect this, assuming that files are opened with relative path names from within the executed program.

  • Because you usually have to work with the environment modules package in your batch script, the module command must be available there; if it is not initialized automatically, source the file /etc/profile.d/modules.sh at the beginning of your script, as sketched below.
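
If the module command is not available within the batch environment, it can be initialized as in the following sketch (a fragment only, to be placed after the #SBATCH header; the loaded module is just the example used throughout this page):

source /etc/profile.d/modules.sh   # initialize the environment modules package
module load slurm_setup            # LRZ setup module used in the example scripts below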



Shared Memory jobs

This job type uses a single shared memory node of the designated SLURM partition. Parallelization can be achieved either via (POSIX) thread programming or directive-based OpenMP programming.

In the following, example scripts for starting an OpenMP program are provided. Please note that these scripts are usually not useful for MPI applications; scripts for such programs are given in subsequent sections.

On the CoolMUC-2 cluster

#!/bin/bash
#SBATCH -J job_name
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=cm2_tiny 
#SBATCH --partition=cm2_tiny
#SBATCH --nodes=1-1
#SBATCH --cpus-per-task=28 
# 56 is the maximum reasonable value for CoolMUC-2 
#SBATCH --export=NONE
#SBATCH --time=08:00:00 
module load slurm_setup
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./my_openmp_program.exe 
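
Assuming the script above is saved as, e.g., my_openmp_job.sh (the file name is arbitrary), it can be submitted and monitored with standard SLURM commands; sbatch reads the cluster and partition from the #SBATCH directives, while squeue needs the cluster given explicitly:

sbatch my_openmp_job.sh
squeue --clusters=cm2_tiny --user=$USER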

On the CoolMUC-3 cluster

#!/bin/bash
#SBATCH -J job_name
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=mpp3
#SBATCH --nodes=1-1
#SBATCH --cpus-per-task=64 
# 256 is the maximum reasonable value for CoolMUC-3 
#SBATCH --export=NONE
#SBATCH --time=08:00:00 
module load slurm_setup
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./my_openmp_program.exe 

MPI jobs

For MPI documentation please consult the MPI page on the LRZ web server. On current cluster systems, Intel MPI is used as the default environment.

MPI jobs either use MPI exclusively for parallelization ("MPP-style") or combine MPI with OpenMP ("hybrid").
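
Before submitting, you can check on a login node which MPI environment is currently active (a sketch; the exact module name varies with the installed release):

module list 2>&1 | grep -i mpi   # show the currently loaded MPI module
which mpiexec                    # confirm which launcher the job script will use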

On the CoolMUC-2 cluster

CoolMUC-2 MPP-style job

#!/bin/bash
#SBATCH -J job_name
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=cm2
#SBATCH --partition=cm2_std 
#SBATCH --qos=cm2_std 
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=28 
#SBATCH --export=NONE
#SBATCH --time=08:00:00
 
module load slurm_setup

mpiexec -n $SLURM_NTASKS ./my_mpi_program.exe

The example will start 224 MPI tasks distributed over 8 nodes.

CoolMUC-2 hybrid MPI+OpenMP job

#!/bin/bash
#SBATCH -J job_name
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=cm2
#SBATCH --partition=cm2_std 
#SBATCH --qos=cm2_std 
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=4
#SBATCH --export=NONE
#SBATCH --time=08:00:00
 
module load slurm_setup
export OMP_NUM_THREADS=7
mpiexec -n $SLURM_NTASKS ./my_hybrid_program.exe
  • The example will start 32 MPI tasks with 7 threads each. Each node has 28 cores, so 4 tasks are started per host.
  • Exploitation of hyperthreads via OpenMP could be achieved by setting OMP_NUM_THREADS to 14 (2 hyperthreads per physical core); see the fragment below.
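
As a sketch of the second bullet, only the thread count changes compared to the hybrid script above (a fragment, not a complete job script):

# 4 MPI tasks per node x 14 OpenMP threads = 56 hardware threads per node,
# i.e. 2 hyperthreads per physical core on CoolMUC-2.
export OMP_NUM_THREADS=14
mpiexec -n $SLURM_NTASKS ./my_hybrid_program.exe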

CoolMUC-2 MPP-style TINY job

Important:

  1. only one or two nodes can be used


#!/bin/bash
#SBATCH -J job_name
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=cm2_tiny
#SBATCH --partition=cm2_tiny
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=28 
#SBATCH --export=NONE
#SBATCH --time=08:00:00
 
module load slurm_setup

mpiexec -n $SLURM_NTASKS ./my_mpi_program.exe

CoolMUC-2 MPP-style LARGE job

Important:

  1. the --qos=cm2_large flag must be added!
  2. more than 24 nodes (up to 64) can be used


#!/bin/bash
#SBATCH -J job_name
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=cm2
#SBATCH --partition=cm2_large
#SBATCH --qos=cm2_large
#SBATCH --nodes=32
#SBATCH --ntasks-per-node=28 
#SBATCH --export=NONE
#SBATCH --time=08:00:00
 
module load slurm_setup

mpiexec -n $SLURM_NTASKS ./my_mpi_program.exe

Notes

  • A setup as for the hybrid job can also serve to provide more memory per MPI task without using OpenMP (e.g., by setting OMP_NUM_THREADS=1). Note that this will leave cores unused! A sketch of such a setup follows this list.
  • Very small jobs (1-2 nodes) must use cm2_tiny instead of cm2_std; very large jobs (25-64 nodes) must use cm2_large.
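
The following is a sketch of such a memory-oriented setup, based on the hybrid script above but without OpenMP parallelism (the node and task counts are examples and should be adapted):

#!/bin/bash
#SBATCH -J job_name
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=cm2
#SBATCH --partition=cm2_std
#SBATCH --qos=cm2_std
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=4
#SBATCH --export=NONE
#SBATCH --time=08:00:00

module load slurm_setup
# 4 instead of 28 tasks per node: each task gets about 7 times the memory,
# but 24 of the 28 cores per node remain idle.
export OMP_NUM_THREADS=1
mpiexec -n $SLURM_NTASKS ./my_mpi_program.exe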

On the CoolMUC-3 cluster

CoolMUC-3 MPP-style job (physical cores only)

#!/bin/bash
#SBATCH -J job_name
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=mpp3 
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=64 
#SBATCH --export=NONE
#SBATCH --time=08:00:00
 
module load slurm_setup

mpiexec -n $SLURM_NTASKS ./my_mpi_program.exe

The example will start 512 MPI tasks.

CoolMUC-3 hybrid job (physical cores only)

#!/bin/bash
#SBATCH -J job_name
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=mpp3 
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=16 
#SBATCH --constraint=quad,cache
#SBATCH --export=NONE
#SBATCH --time=08:00:00
module load slurm_setup
export OMP_NUM_THREADS=4
mpiexec -n $SLURM_NTASKS ./my_hybrid_program.exe

The example will start 128 MPI tasks with 4 threads each.

CoolMUC-3 hybrid job (using hyperthreads)

#!/bin/bash
#SBATCH -J job_name
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=mpp3 
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=16 
#SBATCH --constraint=quad,cache
#SBATCH --export=NONE
#SBATCH --time=08:00:00
module load slurm_setup
export OMP_NUM_THREADS=16
mpiexec -n $SLURM_NTASKS ./my_hybrid_program.exe

The example will start 128 MPI tasks with 16 threads each; 4 hyperthreads per core are used.

Notes

  • Starting more than 64 MPI tasks per KNL node is likely to cause startup failures.
  • The --constraint option supplied in some of the scripts above is only a suggestion; see the KNL features documentation for more details. The features advertised by the nodes can be listed as sketched below.
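
To see which feature combinations the KNL nodes advertise for the --constraint option, the node features can be queried with standard SLURM tools (a sketch; the output format depends on the installation):

sinfo --clusters=mpp3 --format="%n %f"   # node name and its available features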

General comments

  • For some software packages it is also possible to use SLURM's own srun command; however, this does not work reliably in all situations for programs compiled against Intel MPI.
  • It is also possible to use the --ntasks keyword in combination with --cpus-per-task to configure parallel jobs; this specification replaces the --nodes/--ntasks-per-node combination used in the scripts above (see the sketch after this list).
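
As a sketch, the hybrid CoolMUC-2 job above could request its resources like this instead of the --nodes/--ntasks-per-node pair (a fragment of the #SBATCH header only):

#SBATCH --ntasks=32
#SBATCH --cpus-per-task=7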



Special job configurations

Job Farming (starting multiple serial jobs on a shared memory system)

Please use this with care! If the serial jobs are imbalanced with respect to run time, this usage pattern can waste CPU resources. At LRZ's discretion, unbalanced jobs may be removed forcibly. The example job script illustrates how to start up multiple serial jobs within a shared memory parallel SLURM script. Note that the various subdirectories subdir_1, ..., subdir_28 must exist and contain the needed input data.

Multi-Serial Example using a single node
#!/bin/bash
#SBATCH -J job_name
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=cm2_tiny
#SBATCH --partition=cm2_tiny
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=28
# one task slot per physical core, matching the 28 subdirectories used below
#SBATCH --export=NONE
#SBATCH --time=08:00:00 
module load slurm_setup

MYPROG=path_to_my_exe/my_serial_program.exe

# Start as many background serial jobs as there are cores available on the node
for ((i=1; i<=$SLURM_NTASKS; i++)); do 
  cd subdir_${i} 
  $MYPROG & 
  cd .. 
done 
wait # for completion of background tasks
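
If each serial run should write its messages to a separate file, the loop body can be extended, for example as follows (a sketch; the log file name task.log is arbitrary):

  cd subdir_${i}
  # keep each run's output within its own subdirectory
  $MYPROG > task.log 2>&1 &
  cd ..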

For more complex setups, please read the detailed job farming document (it is located in the SuperMUC-NG section, but for the most part it applies to the Cluster environment as well).