Resource limits for serial jobs on the Linux Cluster

This subdocument describes the constraints under which serial jobs execute on the cluster systems: maximum run times, maximum memory, and other SLURM-imposed parameters.

Resource Limits

The following is an overview of the resource limits imposed for the various classes of jobs. These comprise run time limits, limits on core counts for shared-memory jobs, and memory limits. Please consult the SLURM specifications subdocument for further details, in particular on how to correctly specify memory requirements.

  • The designation "shared memory" assumes that the cores assigned by SLURM will be used by threads; typically a command like export OMP_NUM_THREADS=<number> should be issued to achieve this (see the example script after this list).
  • If a job does not appear to use its resources properly, it may be deleted at the discretion of LRZ staff or of the surveillance system.
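
As an illustration, here is a minimal sketch of a shared-memory batch script for the teramem_inter partition described in the table below; the program name my_threaded_app and the requested core, memory, and time values are placeholders, not LRZ defaults.

    #!/bin/bash
    #SBATCH --job-name=shm_job            # keep the job name short (at most 10 characters)
    #SBATCH --clusters=inter
    #SBATCH --partition=teramem_inter
    #SBATCH --cpus-per-task=16            # number of cores to be used by threads (placeholder)
    #SBATCH --mem=200G                    # explicit memory request (placeholder value)
    #SBATCH --time=08:00:00               # must stay within the partition run time limit

    # Let the threaded program use exactly the cores assigned by SLURM
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    ./my_threaded_app                     # placeholder for your shared-memory program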

Cluster / Partition

Architecture

Core counts and remarks

Run time limit (hours)

Memory limit (GByte)

--clusters=serial

--partition=serial_std

28-way Haswell-EP node

1 core (effectively more if large memory is specified)

96

2 GByte (for 1 core)

--clusters=inter

--partition=teramem_inter

192-way HP DL580 shared memory node

up to 96 logical cores can be specified. Generally, a memory specification should be provided as well using the --mem submission option.

96 (default 8)

6000 GByte

--clusters=inter

--partition=cm4_inter_large_mem

80-way Ice Lake node

1 core (effectively more if large memory is specified)

96

6 GByte (for 1 core)

--clusters=serial

--partition=serial_long

28-way Haswell-EP node1 core (effectively more if large memory is specified)

480

2 GByte (for 1 core)
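
As the table notes, a serial job that requests more memory than the per-core default effectively occupies additional cores on the node. A minimal sketch of a serial_std job with such a larger memory request follows; the program name, memory value, and run time are placeholders.

    #!/bin/bash
    #SBATCH --job-name=serjob             # short job name (at most 10 characters)
    #SBATCH --clusters=serial
    #SBATCH --partition=serial_std
    #SBATCH --ntasks=1                    # a single serial task
    #SBATCH --mem=8G                      # placeholder; exceeds the 2 GByte per-core default
    #SBATCH --time=24:00:00               # must stay below the 96 hour limit

    ./my_serial_program                   # placeholder for your serial program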

Resource Limits on housed clusters

The clusters and partitions described in this section are only available to institutes that have a housing contract with LRZ.

Cluster / Partition | Architecture | Core counts and remarks | Run time limit (hours) | Memory limit (GByte)
--- | --- | --- | --- | ---
--clusters=tum_geodesy --partition=tum_geodesy_std | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified). Access is restricted to users from the TUM geodesy chairs. | 240 | 2 GByte (for 1 core)
--clusters=lcg --partition=lcg_serial | 28-way Haswell-EP node or 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted to users from LMU high energy physics. | 96 | 64-180 GByte (complete node)
--clusters=htso --partition=htso_std | 80-way Ice Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 168 | 9 GByte (for 1 core)
--clusters=hlai --partition=hlai_std | 80-way Ice Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 168 | 6 GByte (for 1 core)
--clusters=httc --partition=httc_std | 80-way Ice Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 960 | 3 GByte (for 1 core)
--clusters=httc --partition=httc_high_mem | 80-way Ice Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 960 | 3 GByte (for 1 core)
--clusters=biohpc_gen --partition=biohpc_gen_highmem | 40-way Skylake node | 1 core (effectively more if large memory is specified). Access is restricted. | 504 | 4-40 GByte (for 1 core)
--clusters=biohpc_gen --partition=biohpc_gen_production | 40-way Skylake node | 1 core (effectively more if large memory is specified). Access is restricted. | 336 | 4-40 GByte (for 1 core)
--clusters=biohpc_gen --partition=biohpc_gen_normal | 40-way Skylake node | 1 core (effectively more if large memory is specified). Access is restricted. | 48 | 4-40 GByte (for 1 core)
--clusters=biohpc_gen --partition=biohpc_gen_inter | 40-way Skylake node | 1 core (effectively more if large memory is specified). Access is restricted. | 12 | 4-40 GByte (for 1 core)
--clusters=htce --partition=htce_short | 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 5 | 9 GByte (for 1 core)
--clusters=htce --partition=htce_long | 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 336 | 9-19 GByte (for 1 core)
--clusters=htce --partition=htce_all | 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 72 | 9-19 GByte (for 1 core)
--clusters=htce --partition=htce_special | 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 120 | 9 GByte (for 1 core)
--clusters=c2pap --partition=c2pap_serial | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified). Access is restricted. | 48 | 2 GByte (for 1 core)
--clusters=c2pap --partition=c2pap_preempt | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified). Access is restricted. | 48 | 2 GByte (for 1 core)
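
The limits configured for a given cluster and partition can also be queried directly with the standard SLURM tools; the cluster and partition names below are examples, and the exact output depends on the site configuration.

    # Show the configured time limit and node list of a partition
    sinfo --clusters=biohpc_gen --partition=biohpc_gen_normal

    # Show the full partition configuration, including MaxTime and per-node memory
    scontrol --clusters=biohpc_gen show partition biohpc_gen_normal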

Policies for interactive jobs

  • Serial interactive program runs are not started via SLURM. Such runs should be kept short; anything running longer than 30 minutes should be submitted as a scripted batch job.
  • Submission of serial jobs is supported on the login nodes lxlogin5, lxlogin6 and lxlogin7 (see the submission example below).
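
For example, a job script (the file name job.slurm is a placeholder) can be submitted from one of these login nodes and monitored with the standard SLURM commands; the cluster and partition must match one of the tables above.

    # Submit the script to the serial cluster
    sbatch --clusters=serial --partition=serial_std job.slurm

    # List your own jobs on that cluster
    squeue --clusters=serial --user=$USER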

Policies for queued batch jobs

General restrictions

  • The job name should not exceed 10 characters. If no job name is specified, SLURM uses the script name instead, so please do not use excessively long script names.
  • Do not use the xargs command to generate command-line arguments at submission time. Instead, generate any necessary arguments inside your script, as shown in the sketch below.
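
A minimal sketch illustrating both points, with a compliant job name and the arguments generated inside the script; the program and input files are placeholders.

    #!/bin/bash
    #SBATCH --job-name=proc_set           # at most 10 characters
    #SBATCH --clusters=serial
    #SBATCH --partition=serial_std
    #SBATCH --time=04:00:00               # placeholder run time

    # Generate the argument list inside the script rather than via xargs at submission time
    ARGS=$(ls input_*.dat)                # placeholder input files
    ./my_program $ARGS                    # placeholder program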

Jobs in Hold

  • Jobs in user hold will be removed at the LRZ administrators' discretion if older than 8 weeks.

Job Submissions

  • There is a maximum number of jobs that a user can submit to a serial queue. This limit may change over time, depending on the cluster load.
  • Submission of large numbers of jobs (>100, including array jobs) with very short run times (< 1 min) is considered a misuse of resources. It wastes computational resources and, if mail notifications are used, disrupts the notification system. Users who submit such jobs will be banned from further use of the batch system. Bundle the individual tasks into one much bigger job instead (see the sketch below)!
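
A sketch of such bundling, assuming many independent, short-running tasks; the task program, input and output files, and loop count are placeholders.

    #!/bin/bash
    #SBATCH --job-name=bundle             # short job name
    #SBATCH --clusters=serial
    #SBATCH --partition=serial_std
    #SBATCH --time=06:00:00               # covers the accumulated run time of all tasks

    # Run the short tasks sequentially inside a single job
    # instead of submitting hundreds of sub-minute jobs.
    for i in $(seq 1 200); do
        ./short_task input_${i}.dat > result_${i}.out   # placeholder program and files
    done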

Memory use

  • Jobs that exceed the physical memory available on the selected node(s) will be removed, either by SLURM itself, by the OOM ("out of memory") killer in the operating system kernel, or at LRZ's discretion, since such usage typically has a negative impact on system stability. The memory footprint of a completed job can be checked as sketched below.
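
To stay within the limits, request memory explicitly at submission time and check what a job actually used after it has finished. The commands below are standard SLURM tools; the job ID, memory value, and script name are placeholders, and the available accounting fields depend on the site configuration.

    # Request memory explicitly at submission time
    sbatch --clusters=serial --partition=serial_std --mem=4G job.slurm

    # After completion, inspect the maximum resident memory actually used
    sacct --clusters=serial -j 1234567 --format=JobID,JobName,MaxRSS,Elapsed,State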