SLURM Workload Manager

List of SLURM commands

The batch system at LRZ is the open-source workload manager SLURM (Simple Linux Utility for Resource Management). You must submit a job script to SLURM, which will find and allocate the resources required for your job (e.g. the compute nodes to run your job on). The following table provides an overview of SLURM commands used to submit and manage jobs; on the system, a man page is also supplied for each command.

Command     Functionality
sbatch      submit a job script
salloc      create an interactive SLURM shell
srun        execute argument command on the resources assigned to a job. Note: must be executed inside an active job (script or interactive environment); in most cases, using mpiexec (which in turn uses srun for startup) is the preferred alternative.
squeue      print table of submitted jobs and their state. Note: non-privileged users can only see their own jobs.
sinfo       provide overview of cluster status
scontrol    query and modify SLURM state
sview       GUI for viewing and managing resources and jobs (see below)

SLURM is deployed on both the SuperMUC-NG system and the Linux Clusters. For most commands on the Linux Clusters, it is necessary to supply a -M or --clusters= option that specifies the (sub)cluster to which the command applies.
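
For illustration, assuming a hypothetical cluster name mycluster (the real names are listed by "sinfo --clusters=all") and a job script myjob.cmd, the cluster option is simply added to the usual invocations:

sinfo -M mycluster                  # status of the nodes and partitions of this cluster
sbatch -M mycluster ./myjob.cmd     # submit the job script to this cluster
squeue -M mycluster                 # show your jobs queued or running on this cluster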

sbatch Command / #SBATCH option

Batch job options and resources can be given as command line switches to sbatch (in which case they override script-provided values), or they can be embedded into a SLURM job script as a comment line of the form

#SBATCH <option>=<value>
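
For example, if a job script myjob.cmd (the name is chosen here only for illustration) contains the line

#SBATCH --time=0:30:00

then submitting it via

sbatch --time=1:00:00 ./myjob.cmd

imposes a run time limit of one hour, since the command line switch overrides the value embedded in the script.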

Examples / How to set up jobs

General options (Shell, Execution Path, Mail, Output etc.)

Note: environment variables are not expanded within the #SBATCH option lines of a SLURM job script; specifying them there is not supported.

SLURM option

Functionality

Remarks

-D <directory>

Start job in specified directory.

If this is not specified, the working directory at submission time will be the starting directory of the job. If the specified directory does not exist, the job will not run.

--mail-user=<mail_addr>

User's e-mail address.

Obligatory so LRZ can contact you in case of problems. Batch requests which do not specify an e-mail address will not be processed.

--mail-type=[BEGIN|END|FAIL|REQUEUE|ALL|NONE]

Batch system sends e-mail when [starting|ending|aborting|requeuing] job.

If ALL is specified, each state change will produce a mail. If NONE is specified, no mails will be sent out.

-J <req_name>

Name of batch request.

Default is the name of the script, or "sbatch" if the script is read from standard input.

Please do not use more than 10 characters here!

-o <filename>
-e <filename>

write standard output to specified file.
write standard error to specified file.

LRZ recommends specifying the full path name. Default value is slurm-%j.out, where %j stands for the job ID. Apart from %j, the output specification can also contain %N, which is replaced by the master node name the job is initiated on. 

You can also specify an absolute or relative directory in the filename. However, the directory must exist, otherwise the job will not run.

--export=NONE
or
--export=ALL
or
--export=var1=val1,var2=val2,...

exports the designated environment variables into the environment of the job script

Please use with care: It is recommended to specify NONE and load all needed environment modules in the script. Conversely, if you use "ALL", it is probably a good idea not to load the module environment via /etc/profile.d/modules.sh in the script section.
Specifying ALL makes it nearly impossible to debug errors in your script, because LRZ cannot reproduce your environment at the time of job submission.

If only the name of a variable is specified, its existing value is used in the SLURM script at run time. For more than one variable, a comma-separated list must be specified. Example:
--export=PATH,MYVAR=xxyy,LD_LIBRARY_PATH
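
Putting these options together, the header of a job script might look roughly as follows; all names, paths and the e-mail address are placeholders, and the directories given with -D and -o/-e must already exist:

#!/bin/bash
#SBATCH -J mytest                         # job name (placeholder, at most 10 characters)
#SBATCH -D /path/to/workdir               # placeholder start directory
#SBATCH -o /path/to/workdir/job.%j.out    # standard output (%j is replaced by the job ID)
#SBATCH -e /path/to/workdir/job.%j.err    # standard error
#SBATCH --mail-user=user@example.org      # placeholder address
#SBATCH --mail-type=END
#SBATCH --export=NONE                     # start the job with a clean environment
source /etc/profile.d/modules.sh          # set up the module environment (see the --export remark above)
module load mymodule                      # placeholder: load all modules the application needs
./my_program                              # placeholder for the actual application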

Job Control, Limits and Resource Requirements

SLURM option

Functionality

Remarks

--hold

batch request is kept in User Hold (meaning it is queued but not initiated, because its priority is set to zero)

A held job can be released using scontrol to reset its priority (e.g. "scontrol update jobid=<id> priority=1").

--time=x:y:z

specify limit for job run time as x hours y minutes z seconds. All three fields must be provided.

This is recommended if you want an improved chance of putting a short or medium-length parallel job through the system using the backfill mechanism of the SLURM scheduler. The maximum run time value imposed by LRZ cannot be exceeded using this option.

--mem=<memory in MByte>

explicit memory requirement on a per-node basis

The total memory available for the job will be the number of nodes multiplied by the --mem value.

For serial jobs, please always specify --mem. Reason: If you only need a small amount of memory, the total job throughput of the cluster may be considerably improved.

--nodes=<number>[-<number>]

range for number of nodes to be used

Only for parallel jobs. A range can be specified.
--ntasks=<number>

number of (MPI) tasks to be started

By default, the number of allocated nodes is equal to the number of tasks divided by the number of physical cores in one node. Only for parallel jobs.

--ntasks-per-core=<number>

number of MPI tasks assigned to a physical core

This can be used to overcommit resources. Specifying this will normally only be useful on a system with hyperthreaded cores and sufficient memory per core. Should only be specified if --ntasks is used.

--ntasks-per-node=<number>

number of MPI tasks assigned to a node

This can be used to overcommit resources. Specifying this will normally only be useful on a system with hyperthreaded cores and sufficient memory per core. Should only be specified if --ntasks is used.

--cpus-per-task=<number>

assign a number of virtual CPUs to each started task

This can be specified if either the per-task memory requirement is very large, or if a hybrid (e.g., MPI+OpenMP) program should be run; in the latter case, the number specified should correspond to OMP_NUM_THREADS (see the example script after this table). When hyperthreading is enabled on the node, use --cpus-per-task=1 if you cannot take advantage of hyperthreading.

--overcommit

Overcommit resources.

--requeue / --no-requeue

Identifies the ability of a job to be rerun or not in case of a node failure. If the switch --no-requeue is set, the job will not be rerun.

Default is --requeue

--clusters=[all | cluster_name]

specify a cluster to inspect or submit to. You can also use the -M option (with the same modifiers) instead.

Please use this option in scripted batch jobs. A full list of available clusters is provided via

sinfo --clusters=all

or

sinfo -M all

--partition=<partition_name>

Select the SLURM partition in which the job shall be executed.

Each SLURM cluster contains one or more partitions (with possibly different resource settings) from which a suitable one should be selected by name.

--dependency=<dependency_list>

Defer start of job until dependency conditions are fulfilled. The dependency list may be one (or more, via a comma separated list) of the following:

after:job_id[:job_id ...]  (start up once specified jobs have started execution)

afterany:job_id[:job_id ...] (start up once specified jobs have terminated)

afternotok:job_id[:job_id ...] (start up once specified jobs have terminated with a failure)

afterok:job_id[:job_id ...] (start up once specified jobs have terminated successfully)

singleton  (start up once any previously submitted job with the same name and job user has terminated)

Example: the command

sbatch --dependency=afterok:3712,3722 ./myjob.cmd

will put a job into the queue that will only start if the jobs with ID 3712 and 3722 in the same SLURM cluster have successfully completed. These jobs may even belong to a different user.

--cpu-freq=<value>

Specify the core frequency with which a job should execute. The form the value takes is

p1[-p2[:p3]]

Please consult the sbatch man page for details.

Example: issuing the command

sbatch --cpu-freq=1100 ./myjob.cmd

will execute the job at 1.1 GHz core frequency if this is supported.

Note: unsupported settings will generate an error message, but the job will execute anyway at some other setting.

-C <feature_list> (or --constraint=<feature_list>)

issue a feature request for the nodes of the job

Available features depend on the cluster used. A description is supplied below. Only one -C option should be supplied, but a comma-separated list of entries can be given as an argument.
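
As an illustration of how the resource options above combine, a hybrid MPI+OpenMP job script might be set up roughly as follows; the cluster, partition, module and program names as well as the node and task counts are placeholders that must be adapted to the target system:

#!/bin/bash
#SBATCH -J hybrid_test                    # placeholder job name
#SBATCH -o ./job.%j.out                   # output in the submission directory
#SBATCH --mail-user=user@example.org      # placeholder address
#SBATCH --mail-type=END
#SBATCH --export=NONE
#SBATCH --clusters=mycluster              # placeholder cluster name; needed on the Linux Clusters
#SBATCH --partition=mypartition           # placeholder partition name
#SBATCH --nodes=4                         # example: 4 nodes
#SBATCH --ntasks=8                        # 8 MPI tasks in total, i.e. 2 per node
#SBATCH --cpus-per-task=8                 # 8 virtual CPUs per task for the OpenMP threads
#SBATCH --time=2:00:00                    # run time limit of 2 hours
source /etc/profile.d/modules.sh
module load mymodule                      # placeholder
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # thread count should correspond to --cpus-per-task
mpiexec ./my_hybrid_program               # mpiexec (which uses srun for startup) is preferred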

Environment Variables

The environment at submission time is exported into the job for interactive jobs, and also into scripted jobs if the --export=ALL option is specified.

The following additional, SLURM-specific variables are available inside a job:

SLURM_JOBID               A unique job identifier assigned by SLURM.
SLURM_JOB_NAME            The job name, as set with -J.
SLURM_JOB_NODELIST        String containing a coded version of the list of nodes assigned to the job.
SLURM_JOB_NUM_NODES       The number of compute nodes assigned to the parallel job.
SLURM_NTASKS_PER_NODE     The number of tasks to start per node.
SLURM_NTASKS              The total number of tasks available for the job.

This list is quite incomplete. Please consult the SLURM man pages for more information.
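
Within the script section, these variables can be used, for example, to report the allocation and to parameterize the application startup (my_program again being a placeholder):

echo "Job $SLURM_JOBID uses $SLURM_JOB_NUM_NODES node(s): $SLURM_JOB_NODELIST"
mpiexec -n $SLURM_NTASKS ./my_program     # start one MPI process per allocated task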

SLURM Exit Codes

Slurm displays a job's exit code in the output of the scontrol show job and the sview utility. Slurm displays job step exit codes in the output of the scontrol show step and the sview utility.

When a signal was responsible for a job or step's termination, the signal number will be displayed after the exit code, delineated by a colon (:).

  • For sbatch jobs, the exit code that is captured is the exit code of the batch script.
  • For salloc jobs, the exit code will be the return value of the exit call that terminates the salloc session.
  • For srun, the exit code will be the return value of the command that srun executes.
  • Exit codes triggered by the user application depend on the specific compiler used; see, e.g., the List of Intel Fortran Run-Time Error Messages.

Details about: SLURM exit codes.
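
For example, the exit status of a recently finished job can be inspected as follows, 12345 being a placeholder job ID; a reported value of ExitCode=0:0 means that both the exit code and the signal number were zero:

scontrol show job 12345 | grep ExitCode

On the Linux Clusters, the -M option may need to be added as usual.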

A GUI for job management

The command sview is available to inspect and modify jobs via a graphical user interface:

[Screenshots of the sview GUI]

  • To identify your jobs among the many in the list, select either the "specific user's jobs" or the "job ID" item from the menu "Actions → Search".
  • By right-clicking on one of your jobs and selecting "Edit job" in the context menu, you can obtain a window which allows you to modify the job settings. Please be careful about committing your changes.
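
Much of this functionality is also available on the command line via scontrol, for example (12345 again being a placeholder job ID):

scontrol show job 12345                          # display all settings of the job
scontrol update jobid=12345 TimeLimit=1:00:00    # example modification: reduce the run time limit

As with the GUI, please apply such modifications with care; note that a job's time limit can typically only be reduced, not increased, by non-privileged users.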