Resource limits for parallel jobs on the Linux Cluster

This subdocument describes the constraints under which parallel jobs execute on the cluster systems: maximum run times, maximum memory, and other SLURM-imposed parameters.

Resource limits for interactive jobs

Notes:

  • Please do not use resources in this partition to run regular production jobs! This partition is meant for testing!
  • A given user account cannot run more than one job at a time.

Partition | Core counts and remarks | Run time limit (hours) | Memory limit (GByte)
interactive nodes on CooLMUC-2 | Maximum number of nodes in a job: 4 | 2 (default is 15 minutes) | 56 per node
interactive nodes on CooLMUC-3 | Maximum number of nodes in a job: 3 | 2 (default is 15 minutes) | ~90 DDR per node, plus 16 HBM per node

Resource limits for batch jobs

The following is an overview of the resource limits imposed for the various classes of jobs; these comprise run time limits, limits on the core counts of parallel jobs, and memory limits. Please consult the SLURM specifications subdocument for a more detailed explanation of the parallel environments, in particular of how to correctly specify memory requirements. With respect to run time limits, it is recommended to always specify a target run time via the --time switch; in particular for smaller jobs, this may allow the scheduler to perform backfilling.

  • The designation "shared memory" for parallel jobs assumes that a number of cores assigned by SLURM will be used by threads; typically a command like export OMP_NUM_THREADS=<number> should be issued to achieve this.
  • The designation "distributed memory" for parallel jobs assumes that MPI is used to start one single-threaded MPI task per core assigned by SLURM. In principle it is also possible to run hybrid MPI + threaded programs, in which case the number of cores assigned by the system will be equal to the product (# of MPI tasks) * (# of threads), rounded up if necessary.

CooLMUC-2: 28-way Haswell-EP nodes with Infiniband FDR14 interconnect and 2 hardware threads per physical core (see also the example job scripts)

Job Type | SLURM Cluster | SLURM Partition | Node range | Run time limit (hours) | Memory limit (GByte)
Small distributed memory parallel (MPI) job | --clusters=cm2_tiny | --partition=cm2_tiny | 1-4 | 72 | 56 per node
Standard distributed memory parallel (MPI) job | --clusters=cm2 | --partition=cm2_std | 3-24 | 72 | 56 per node
Large distributed memory parallel (MPI) job | --clusters=cm2 | --partition=cm2_large | 25-64 | 48 | 56 per node
Shared memory parallel job | --clusters=cm2_tiny | --partition=cm2_tiny | 1 | 72 | 56

CooLMUC-3: 64-way Knights Landing 7210F nodes with Intel Omni-Path 100 interconnect and 4 hardware threads per physical core (see also the example job scripts)

Job Type | SLURM Cluster | SLURM Partition | Node range | Run time limit (hours) | Memory limit (GByte)
Distributed memory parallel job | --clusters=mpp3 | --partition=mpp3_batch (optional) | 1-32 | 48 | ~90 DDR per node, plus 16 HBM per node

Teramem: HP DL580 shared memory system (see also the example job scripts)

Job Type | SLURM Cluster | SLURM Partition | Node range | Run time limit (hours) | Memory limit (GByte)
Shared memory thread-parallel job | --clusters=inter | --partition=teramem_inter; also specify the number of cores needed by the executable(s) to be started | 1 (up to 64 logical cores) | 48 (default 8) | ~60 per physical core (each physical core has 2 hyperthreads)
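As a further illustration, a minimal sketch of a batch script for a standard distributed memory job in the cm2_std partition might look as follows. The module setup and the executable name are assumptions/placeholders; please consult the example job scripts referenced above for authoritative templates.

    #!/bin/bash
    #SBATCH --job-name=mpitest               # keep job names short (see the policies below)
    #SBATCH --clusters=cm2
    #SBATCH --partition=cm2_std
    #SBATCH --nodes=4                        # cm2_std accepts 3-24 nodes
    #SBATCH --ntasks-per-node=28             # one MPI task per physical core
    #SBATCH --time=10:00:00                  # well below the 72 h limit; helps backfilling

    module load slurm_setup                  # assumption: standard LRZ environment setup
    mpiexec -n $SLURM_NTASKS ./my_mpi_prog   # placeholder executable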

If a job appears not to use its resources properly, it will be terminated at the discretion of LRZ staff or the automated surveillance system.

Resource limits on housed systems

The clusters and partitions listed in this section are only available for institutes that have a housing contract with LRZ.

Job Type | Architecture | Core counts and remarks | Run time limit (hours) | Memory limit (GByte)
Distributed memory parallel (MPI) jobs | 28-way Haswell-EP nodes with Infiniband FDR14 interconnect | Please specify the cluster --clusters=tum_chem and one of the partitions --partition=[tum_chem_batch, tum_chem_test]. Jobs with up to 392 cores are possible (56 in the test queue). Dedicated to TUM Chemistry. | 384 (test queue: 12) | 2 per task (in MPP mode, using 1 physical core per task)
Distributed memory parallel (MPI) jobs | 28-way Haswell-EP nodes with Infiniband FDR14 interconnect | Please specify the cluster --clusters=hm_mech. Jobs with up to 336 cores are possible (double that number if hyperthreading is exploited). Dedicated to Hochschule München Mechatronics. | 336 | 18 per task (in MPP mode, using 1 physical core per task)
Serial or shared memory jobs | 28-way Haswell-EP nodes with Ethernet interconnect | Please specify the cluster --clusters=tum_geodesy. Dedicated to TUM Geodesy. | 240 | 2 per task / 60 per node
Shared memory parallel job | Intel- or AMD-based shared memory systems | Please specify the cluster --clusters=myri as well as one of the partitions --partition=myri_[p,u]. Dedicated to TUM Mathematics. | 144 | 3.9 per core

Details on Policies

Policies for interactive jobs

Limitations

  • Parallel programs should not be started directly from a login shell. Please always use the salloc command to initialize a time-limited interactive parallel environment. Note that the shell initialized by the salloc command will still run on the login node, but executables started with srun (or mpiexec) will be started on the interactive partition that was assigned; a minimal sketch is given below.
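The node count, time limit, and partition placeholder in the following sketch are illustrative and must be replaced by values valid for the cluster you work on.

    # Request an interactive allocation of 2 nodes for 30 minutes
    # (replace <interactive_partition> with the partition assigned for interactive use)
    salloc --nodes=2 --time=00:30:00 --partition=<interactive_partition>

    # The shell still runs on the login node; start executables on the allocated nodes:
    srun ./my_parallel_prog        # placeholder executable; mpiexec works analogously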

Policies for queued batch jobs

General restrictions

  • The job name should not exceed 10 characters. If no job name is specified, the script name is used as the job name, so please do not use excessively long script names.
  • Do not use the xargs command to generate command line arguments at submission time. Instead, generate any necessary arguments inside your script.

Scheduling

  • For parallel jobs, it is recommended to explicitly specify the run time limit. This may shorten the waiting time, since the job might then be run in backfill mode (in other words, it can use resources that are free while the scheduler tries to fit another large job into the system). Your specification gives the scheduler the information required to organize this; see the example below.
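For example, a suitable run time limit can also be supplied (or overridden) at submission time; the script name below is a placeholder.

    # Request 6 hours instead of the 72 h partition maximum,
    # which makes the job a candidate for backfilling
    sbatch --time=06:00:00 ./myjob.sh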

Jobs in Hold

Jobs in user hold will be removed at the LRZ administrators' discretion if older than 8 weeks.

Job Submissions

  • Submission of large numbers of jobs (>100, including array jobs) with very short run times (< 1 min) is considered a misuse of resources. It causes both a waste of computational resources and, if mail notifications are used, disruption of the notification system. Users who submit such jobs will be banned from further use of the batch system. Please bundle the individual tasks into one much bigger job instead; a sketch is given after the table below.
  • There are maximum numbers of jobs that can be submitted by a user. These limits are different for each cluster and may change over time, depending on the cluster load.


Cluster | Limit on job submission | Limit on running jobs
inter | 2 | 1
rvs | 1 | 1
mpp3 | unlimited | 50
serial | 250 | 100
cm2_tiny | 50 | 10
cm2 | 50 | 4 / 2 *

* 4 jobs on cm2_std, 2 jobs on cm2_large
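A minimal sketch of such a bundled job is shown below; the cluster/partition choice, the file pattern, and the executable are placeholders.

    #!/bin/bash
    #SBATCH --clusters=cm2_tiny
    #SBATCH --partition=cm2_tiny
    #SBATCH --nodes=1
    #SBATCH --time=02:00:00

    # Run the many short tasks sequentially inside a single batch job
    # instead of submitting one sub-minute job per input file
    for input in case_*.dat; do
        ./my_short_task "$input"
    done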

Memory use

  • Jobs exceeding the physical memory available on the selected node(s) will be removed, either by SLURM itself, by the OOM ("out of memory") killer of the operating system kernel, or at LRZ's discretion, since such usage typically has a negative impact on system stability. An example of an explicit memory request is given below.
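Where it is unclear whether a job fits, the memory requirement can be stated explicitly at submission time so that SLURM takes it into account when placing the job; the values below are purely illustrative.

    # Request 50 GByte per node, staying below the 56 GByte physical memory
    # of a CooLMUC-2 node (script name is a placeholder)
    sbatch --mem=50G ./myjob.sh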

Limits on queued jobs

  • In order to prevent monopolization of the clusters by a single user, a limit of 50 queued jobs is imposed on both CooLMUC-2 and CooLMUC-3. These limits may change over time, depending on the cluster load.

Software licenses

  • Many commercial software packages have been licensed for use on the cluster; most of these require so-called floating licenses, of which only a limited number are typically available. Since it is not possible to check whether a license is available before a batch job starts, LRZ cannot guarantee that a batch job requesting such a license will run successfully.