Resource limits for serial jobs on the Linux Cluster
This subdocument describes the constraints under which serial jobs execute on the cluster systems: maximum run times, maximum memory, and other SLURM-imposed parameters.
Resource Limits
The following is an overview of the resource limits imposed for various classes of jobs. These comprise run time limits, limits on core counts for shared-memory jobs, and memory limits. Please consult the SLURM specifications subdocument for more details, in particular how to correctly specify memory requirements.
- The designation "shared memory" assumes that the cores assigned by SLURM will be used by threads; typically, a command like export OMP_NUM_THREADS=<number> should be issued to achieve this (see the example script after the table below).
- If a job does not appear to use its assigned resources properly, it may be deleted at the discretion of LRZ staff or the automated surveillance system.
Cluster / Partition | Architecture | Core counts and remarks | Run time limit (hours) | Memory limit (GByte) |
---|---|---|---|---|
--clusters=serial --partition=serial_std | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified) | 96 | 2 GByte (for 1 core) |
--clusters=inter --partition=teramem_inter | 192-way HP DL580 shared memory node | up to 96 logical cores can be specified. Generally, a memory specification should be provided as well using the --mem submission option. | 96 (default 8) | 6000 GByte |
--clusters=inter --partition=cm4_inter_large_mem | 80-way Ice Lake node | 1 core (effectively more if large memory is specified) | 96 | 6 GByte (for 1 core) |
--clusters=serial --partition=serial_long | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified) | 480 | 2 GByte (for 1 core) |
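As an illustration of the limits in the table above, here is a minimal sketch of a batch script for the serial_std partition. The program name and the requested values are placeholders to be adapted; in particular, the 8 GByte memory request is only an example of a value above the 2 GByte per-core default, which effectively reserves additional cores, so the thread count is chosen to match.

```bash
#!/bin/bash
#SBATCH --job-name=myserial      # keep the job name short (see the policies below)
#SBATCH --clusters=serial
#SBATCH --partition=serial_std
#SBATCH --ntasks=1               # a single task: serial or shared-memory job
#SBATCH --mem=8G                 # example value; more than 2 GByte effectively reserves extra cores
#SBATCH --time=24:00:00          # wall clock request, must stay below the 96 hour limit

# For shared-memory (threaded) programs, set the thread count to the number
# of cores the job can effectively use (illustrative value).
export OMP_NUM_THREADS=4

# Placeholder executable; replace with your own program.
./my_program
```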
Resource Limits on housed clusters
The clusters and partitions described in this section are only available to institutes that have a housing contract with LRZ.
Cluster/Partition | Architecture | Core counts and remarks | Run time limit (hours) | Memory limit (GByte) |
---|---|---|---|---|
--clusters=tum_geodesy --partition=tum_geodesy_std | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified). Access is restricted to users from the TUM geodesy chairs. | 240 | 2 GByte (for 1 core) |
--clusters=lcg --partition=lcg_serial | 28-way Haswell-EP node / 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted to users from LMU high energy physics. | 96 | 64-180 GByte (complete node) |
--clusters=htso --partition=htso_std | 80-way Ice Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 168 | 9 GByte (for 1 core) |
--clusters=hlai --partition=hlai_std | 80-way Ice Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 168 | 6 GByte (for 1 core) |
--clusters=httc --partition=httc_std | 80-way Ice Lake node | 1 core (effectively more if large memory is specified). Access is limited. | 960 | 3 GByte (for 1 core) |
--clusters=httc --partition=httc_high_mem | 80-way Ice Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 960 | 3 GByte (for 1 core) |
--clusters=biohpc_gen --partition=biohpc_gen_highmem | 40-way Skylake node | 1 CPU (effectively more if large memory is specified). Access is limited. | 504 | 4-40 GByte (for 1 CPU) |
--clusters=biohpc_gen --partition=biohpc_gen_production | 40-way Skylake node | 1 CPU (effectively more if large memory is specified). Access is limited. | 336 | 4-40 GByte (for 1 CPU) |
--clusters=biohpc_gen --partition=biohpc_gen_normal | 40-way Skylake node | 1 CPU (effectively more if large memory is specified). Access is limited. | 48 | 4-40 GByte (for 1 CPU) |
--clusters=biohpc_gen --partition=biohpc_gen_inter | 40-way Skylake node | 1 CPU (effectively more if large memory is specified). Access is restricted. | 12 | 4-40 GByte (for 1 CPU) |
--clusters=htce --partition=htce_short | 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 5 | 9 GByte (for 1 core) |
--clusters=htce --partition=htce_long | 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 336 | 9-19 GByte (for 1 core) |
--clusters=htce --partition=htce_all | 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 72 | 9-19 GByte (for 1 core) |
--clusters=htce --partition=htce_special | 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 120 | 9 GByte (for 1 core) |
--clusters=c2pap --partition=c2pap_serial | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified). Access is restricted. | 48 | 2 GByte (for 1 core) |
--clusters=c2pap --partition=c2pap_preempt | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified). Access is restricted. | 48 | 2 GByte (for 1 core) |
Policies for interactive jobs
- Serial interactive program runs are not started via SLURM. Such runs should be kept short; anything running longer than 30 minutes should be submitted as a scripted batch job.
- Submission of serial jobs is supported on the login nodes lxlogin5, lxlogin6 and lxlogin7.
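For illustration, submission and monitoring from one of these login nodes would look roughly as follows; the script name and job ID are placeholders.

```bash
# On lxlogin5, lxlogin6 or lxlogin7: submit the batch script
sbatch myjob.sh

# Check the state of your jobs on the serial cluster
squeue -M serial -u $USER

# Cancel a job if necessary (job ID taken from the sbatch/squeue output)
scancel -M serial <jobid>
```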
Policies for queued batch jobs
General restrictions
- The job name should not exceed 10 characters. If no job name is specified, the script name is used instead, so please do not use excessively long script names.
- Do not use the xargs command to generate command-line arguments at submission time. Instead, generate any necessary arguments inside your script (a sketch follows below).
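The following sketch illustrates one way to comply with both restrictions: a short explicit job name, and command-line arguments that are generated inside the script rather than constructed with xargs at submission time. Directory and program names are placeholders.

```bash
#!/bin/bash
#SBATCH --job-name=convert       # at most 10 characters
#SBATCH --clusters=serial
#SBATCH --partition=serial_std
#SBATCH --time=02:00:00

# Generate the required command line arguments here, inside the script,
# rather than with xargs at submission time (paths are placeholders).
INPUT_DIR=$HOME/project/input
OUTPUT_DIR=$HOME/project/output
mkdir -p "$OUTPUT_DIR"

./my_program --input-dir "$INPUT_DIR" --output-dir "$OUTPUT_DIR"
```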
Jobs in Hold
- Jobs in user hold will be removed at the LRZ administrators' discretion if older than 8 weeks.
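For reference, a pending job is put into and released from user hold with scontrol; the job ID is a placeholder.

```bash
# Put a pending job into user hold (job ID is a placeholder)
scontrol -M serial hold <jobid>

# Release it again before it falls under the 8-week removal policy
scontrol -M serial release <jobid>
```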
Job Submissions
- There is a maximum number of jobs that a user can submit to a serial queue. This limit may change over time, depending on the cluster load.
- Submission of large numbers of jobs (>100, including array jobs) with very short run times (< 1 min) is considered a misuse of resources. It wastes computational resources and, if mail notifications are used, disrupts the notification system. Users who submit such jobs will be banned from further use of the batch system. Please bundle the individual jobs into a single, much larger one (see the sketch below).
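As a sketch of such bundling (file, program and task-list names are placeholders): instead of submitting hundreds of jobs that each run for less than a minute, a single job can work through the complete task list.

```bash
#!/bin/bash
#SBATCH --job-name=bundle        # one job instead of hundreds of very short ones
#SBATCH --clusters=serial
#SBATCH --partition=serial_std
#SBATCH --time=10:00:00          # generous enough for all bundled tasks together

# tasklist.txt (placeholder name) contains one input file name per line.
# All short tasks are processed inside this single job.
while read -r infile; do
    ./my_program "$infile"
done < tasklist.txt
```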
Memory use
- Jobs exceeding the physical memory available on the selected node(s) will be removed, either by SLURM itself, by the OOM ("out of memory") killer in the operating system kernel, or at LRZ's discretion, since such usage typically has a negative impact on system stability.