Resource limits for serial jobs on the Linux Cluster
This subdocument describes the constraints under which serial jobs execute on the cluster systems: maximum run times, maximum memory, and other SLURM-imposed parameters.
Resource Limits
The following is an overview of the resource limits imposed for various classes of jobs. These comprise run time limits, limits on core counts for shared-memory jobs, and memory limits. Please consult the SLURM specifications subdocument for more details, in particular how to correctly specify memory requirements.
- The designation "shared memory" assumes that the cores assigned by SLURM will be used by threads; typically, a command like export OMP_NUM_THREADS=<number> should be issued to achieve this (see the example script after the table below).
- If a job does not appear to use its assigned resources properly, it may be deleted at the discretion of LRZ staff or the automated surveillance system.
Cluster / Partition | Architecture | Core counts and remarks | Run time limit (hours) | Memory limit (GByte) |
---|---|---|---|---|
--clusters=serial --partition=serial_std | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified) | 96 | 2 GByte (for 1 core) |
--clusters=inter --partition=teramem_inter | 192-way HP DL580 shared memory node | up to 96 logical cores can be specified. Generally, a memory specification should be provided as well using the --mem submission option. | 96 (default 8) | 6000 GByte |
--clusters=inter --partition=cm4_inter_large_mem | 80-way Ice Lake node | 1 core (effectively more if large memory is specified) | 96 | 6 GByte (for 1 core) |
--clusters=serial --partition=serial_long | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified) | 480 | 2 GByte (for 1 core) |
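As an illustration of the limits in the table above, here is a minimal sketch of a batch script for the serial_std partition. The program name and the requested values are placeholders to be adapted; in particular, the 8 GByte memory request is only an example of a value above the 2 GByte per-core default, which effectively reserves additional cores, so the thread count is chosen to match.

```bash
#!/bin/bash
#SBATCH --job-name=myserial      # keep the job name short (see the policies below)
#SBATCH --clusters=serial
#SBATCH --partition=serial_std
#SBATCH --ntasks=1               # a single task: serial or shared-memory job
#SBATCH --mem=8G                 # example value; more than 2 GByte effectively reserves extra cores
#SBATCH --time=24:00:00          # wall clock request, must stay below the 96 hour limit

# For shared-memory (threaded) programs, set the thread count to the number
# of cores the job can effectively use (illustrative value).
export OMP_NUM_THREADS=4

# Placeholder executable; replace with your own program.
./my_program
```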
Resource Limits on housed clusters
The clusters and partitions described in this section are only available to institutes that have a housing contract with LRZ.
Cluster/Partition | Architecture | Core counts and remarks | Run time limit (hours) | Memory limit (GByte) |
---|---|---|---|---|
--clusters=tum_geodesy --partition=tum_geodesy_std | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified). Access is restricted to users from the TUM geodesy chairs. | 240 | 2 GByte (for 1 core) |
--clusters=lcg --partition=lcg_serial | 28-way Haswell-EP node / 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted to users from LMU high energy physics. | 96 | 64-180 GByte (complete node) |
--clusters=htso --partition=htso_std | 80-way Ice Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 168 | 9 GByte (for 1 core) |
--clusters=hlai --partition=hlai_std | 80-way Ice Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 168 | 6 GByte (for 1 core) |
--clusters=httc --partition=httc_std | 80-way Ice Lake node | 1 core (effectively more if large memory is specified). Access is limited. | 960 | 3 GByte (for 1 core) |
--clusters=httc --partition=httc_high_mem | 80-way Ice Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 960 | 3 GByte (for 1 core) |
--clusters=biohpc_gen --partition=biohpc_gen_highmem | 40-way Skylake node | 1 CPU (effectively more if large memory is specified). Access is limited. | 504 | 4-40 GByte (for 1 CPU) |
--clusters=biohpc_gen --partition=biohpc_gen_production | 40-way Skylake node | 1 CPU (effectively more if large memory is specified). Access is limited. | 336 | 4-40 GByte (for 1 CPU) |
--clusters=biohpc_gen --partition=biohpc_gen_normal | 40-way Skylake node | 1 CPU (effectively more if large memory is specified). Access is limited. | 48 | 4-40 GByte (for 1 CPU) |
--clusters=biohpc_gen --partition=biohpc_gen_inter | 40-way Skylake node | 1 CPU (effectively more if large memory is specified). Access is restricted. | 12 | 4-40 GByte (for 1 CPU) |
--clusters=htce --partition=htce_short | 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 5 | 9 GByte (for 1 core) |
--clusters=htce --partition=htce_long | 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 336 | 9-19 GByte (for 1 core) |
--clusters=htce --partition=htce_all | 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 72 | 9-19 GByte (for 1 core) |
--clusters=htce --partition=htce_special | 40-way Cascade Lake node | 1 core (effectively more if large memory is specified). Access is restricted. | 120 | 9 GByte (for 1 core) |
--clusters=c2pap --partition=c2pap_serial | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified). Access is restricted. | 48 | 2 GByte (for 1 core) |
--clusters=c2pap --partition=c2pap_preempt | 28-way Haswell-EP node | 1 core (effectively more if large memory is specified). Access is restricted. | 48 | 2 GByte (for 1 core) |
Policies for interactive jobs
- Serial interactive program runs are not started via SLURM. Such runs should be kept short; anything running longer than 30 minutes should be submitted as a scripted batch job.
- Submission of serial jobs is supported on the login nodes lxlogin5, lxlogin6 and lxlogin7.
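For illustration, submission and monitoring from one of these login nodes would look roughly as follows; the script name and job ID are placeholders.

```bash
# On lxlogin5, lxlogin6 or lxlogin7: submit the batch script
sbatch myjob.sh

# Check the state of your jobs on the serial cluster
squeue -M serial -u $USER

# Cancel a job if necessary (job ID taken from the sbatch/squeue output)
scancel -M serial <jobid>
```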
Policies for queued batch jobs
General restrictions
- The job name should not exceed 10 characters. If no job name is specified, the script name is used instead, so please do not use excessively long script names.
- Do not use the xargs command to generate command-line arguments at submission time. Instead, generate any necessary arguments inside your script (a sketch follows below).
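The following sketch illustrates one way to comply with both restrictions: a short explicit job name, and command-line arguments that are generated inside the script rather than constructed with xargs at submission time. Directory and program names are placeholders.

```bash
#!/bin/bash
#SBATCH --job-name=convert       # at most 10 characters
#SBATCH --clusters=serial
#SBATCH --partition=serial_std
#SBATCH --time=02:00:00

# Generate the required command line arguments here, inside the script,
# rather than with xargs at submission time (paths are placeholders).
INPUT_DIR=$HOME/project/input
OUTPUT_DIR=$HOME/project/output
mkdir -p "$OUTPUT_DIR"

./my_program --input-dir "$INPUT_DIR" --output-dir "$OUTPUT_DIR"
```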
Jobs in Hold
- Jobs in user hold will be removed at the LRZ administrators' discretion if older than 8 weeks.
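For reference, a pending job is put into and released from user hold with scontrol; the job ID is a placeholder.

```bash
# Put a pending job into user hold (job ID is a placeholder)
scontrol -M serial hold <jobid>

# Release it again before it falls under the 8-week removal policy
scontrol -M serial release <jobid>
```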
Job Submissions
- There is a maximum number of jobs that a user can submit to a serial queue. This limit may change over time, depending on the cluster load.
- Submission of large numbers of jobs (>100, including array jobs) with very short run times (< 1 min) is considered a misuse of resources. It wastes computational resources and, if mail notifications are used, disrupts the notification system. Users who submit such jobs will be banned from further use of the batch system. Please bundle the individual jobs into a single, much larger one (see the sketch below).
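As a sketch of such bundling (file, program and task-list names are placeholders): instead of submitting hundreds of jobs that each run for less than a minute, a single job can work through the complete task list.

```bash
#!/bin/bash
#SBATCH --job-name=bundle        # one job instead of hundreds of very short ones
#SBATCH --clusters=serial
#SBATCH --partition=serial_std
#SBATCH --time=10:00:00          # generous enough for all bundled tasks together

# tasklist.txt (placeholder name) contains one input file name per line.
# All short tasks are processed inside this single job.
while read -r infile; do
    ./my_program "$infile"
done < tasklist.txt
```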
Memory use
- Jobs exceeding the physical memory available on the selected node(s) will be removed, either by SLURM itself, by the OOM ("out of memory") killer in the operating system kernel, or at LRZ's discretion, since such usage typically has a negative impact on system stability.