Overview of clusters, limits and job processing


For details, please also read the Linux-Cluster subchapters!

The overview below lists, for each Slurm cluster and partition, the cluster specifications and limits:

Nodes in partition
Node range per job (min - max)
Maximum runtime (hours)
Maximum running (submitted) jobs per user
Memory limit (GByte)

as well as the cluster-/partition-specific Slurm job settings, the typical job type, and common/exemplary Slurm commands for job management via

squeue (show waiting/running jobs),
scancel (abort job),
sacct (show details on waiting, running, finished jobs).

For the job limits, the first value is the maximum number of running jobs per user; the value in parentheses is the maximum number of submitted jobs.

Cluster system: CoolMUC-2 (28-way Haswell-EP nodes with InfiniBand FDR14 interconnect and 2 hardware threads per physical core)



Slurm cluster: cm2
Slurm partition: cm2_large
Nodes in partition: 404 (overlapping partitions)
Node range per job (min - max): 25 - 64
Maximum runtime (hours): 48
Maximum running (submitted) jobs per user: 2 (30)
Memory limit (GByte): 56 per node
Slurm job settings:
--clusters=cm2
--partition=cm2_large
--qos=cm2_large
Job management:
squeue -M cm2 -u $USER
scancel -M cm2 <JOB-ID>
sacct -M cm2 -X -u $USER --starttime=2021-01-01T00:00:01

Slurm cluster: cm2
Slurm partition: cm2_std
Nodes in partition: 404 (overlapping partitions, shared with cm2_large)
Node range per job (min - max): 3 - 24
Maximum runtime (hours): 72
Maximum running (submitted) jobs per user: 4 (50)
Memory limit (GByte): 56 per node
Slurm job settings:
--clusters=cm2
--partition=cm2_std
--qos=cm2_std
Job management: as for cm2_large (squeue/scancel/sacct with -M cm2)

Slurm cluster: cm2_tiny
Slurm partition: cm2_tiny
Nodes in partition: 300
Node range per job (min - max): 1 - 4
Maximum runtime (hours): 72
Maximum running (submitted) jobs per user: 10 (50)
Memory limit (GByte): 56 per node
Slurm job settings:
--clusters=cm2_tiny
Job management:
squeue -M cm2_tiny -u $USER
scancel -M cm2_tiny <JOB-ID>
sacct -M cm2_tiny -X -u $USER --starttime=2021-01-01T00:00:01
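
To show how these settings combine in practice, here is a minimal sketch of a batch script for an MPI job in the cm2_std partition; the job name, node and task counts, runtime, the slurm_setup module and the executable name are illustrative assumptions, not values prescribed on this page:

#!/bin/bash
#SBATCH --job-name=my_mpi_job
#SBATCH --clusters=cm2
#SBATCH --partition=cm2_std
#SBATCH --qos=cm2_std
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=28
#SBATCH --time=24:00:00
module load slurm_setup   # assumption: site-specific setup module, see the cluster documentation
mpiexec -n $SLURM_NTASKS ./my_mpi_program   # placeholder executable

Such a script is submitted with sbatch from a login node and can then be monitored with the squeue/sacct commands listed above.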



Slurm cluster: serial
Slurm partition: serial_std
Nodes in partition: 96 (overlapping partitions)
Node range per job (min - max): 1 - 1
Maximum runtime (hours): 96
Maximum running (submitted) jobs per user: dynamically adjusted depending on workload (250)
Slurm job settings:
--clusters=serial
--partition=serial_std
--mem=<memory_per_node>MB
Note: compute nodes are shared among users! Default memory = memnode / Ncores_node (a worked example follows the serial_long entry).
Job management:
squeue -M serial -u $USER
scancel -M serial <JOB-ID>
sacct -M serial -X -u $USER --starttime=2021-01-01T00:00:01

Slurm cluster: serial
Slurm partition: serial_long
Nodes in partition: 96 (overlapping partitions, shared with serial_std)
Node range per job (min - max): 1 - 1
Maximum runtime (hours): > 72 (currently 480)
Maximum running (submitted) jobs per user: dynamically adjusted depending on workload (250)
Slurm job settings:
--clusters=serial
--partition=serial_long
--mem=<memory_per_node>MB
Note: compute nodes are shared among users!
Job management: as for serial_std (squeue/scancel/sacct with -M serial)
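
As a worked example of the default memory rule, assuming the CoolMUC-2 values from above (56 GByte and 28 physical cores per node): the default is 56/28 = 2 GByte per job. A job that needs more memory must request it explicitly via --mem, as in this minimal sketch (the memory value, runtime and program name are illustrative assumptions):

#!/bin/bash
#SBATCH --clusters=serial
#SBATCH --partition=serial_std
#SBATCH --ntasks=1
#SBATCH --time=12:00:00
#SBATCH --mem=6000M   # request about 6 GByte instead of the roughly 2 GByte default
./my_serial_program   # placeholder executable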

Slurm cluster: inter
Slurm partition: cm2_inter
Nodes in partition: 12
Node range per job (min - max): 1 - 4
Maximum runtime (hours): 2
Maximum running (submitted) jobs per user: 1 (2)
Slurm job settings:
--clusters=inter
--partition=cm2_inter
Typical job type: Do not run production jobs!
Job management:
squeue -M inter -u $USER
scancel -M inter <JOB-ID>
sacct -M inter -X -u $USER --starttime=2021-01-01T00:00:01
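
Since cm2_inter is meant for short test and interactive work rather than production runs, an interactive allocation can be requested with standard Slurm tools. A minimal sketch (node count, runtime and program name are illustrative assumptions; details may differ on this system):

salloc --clusters=inter --partition=cm2_inter --nodes=1 --time=00:30:00
srun ./my_test_program   # placeholder: run inside the granted allocation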

Cluster system: Teramem (HP DL580 shared-memory system, 96 physical cores in total, 2 hyperthreads per physical core)

Slurm cluster: inter
Slurm partition: teramem_inter
Nodes in partition: 1
Node range per job (min - max): 1 - 1 (up to 64 logical cores per job)
Maximum runtime (hours): 240
Maximum running (submitted) jobs per user: 1 (2)
Memory limit (GByte): approx. 60 per physical core
Slurm job settings:
--clusters=inter
--partition=teramem_inter
Job management: as for cm2_inter (squeue/scancel/sacct with -M inter)
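
On the Teramem shared-memory system a job is typically a single process using many threads and a large memory request. A rough sketch within the limits above (thread count, memory value, runtime and program name are illustrative assumptions):

#!/bin/bash
#SBATCH --clusters=inter
#SBATCH --partition=teramem_inter
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32   # at most 64 logical cores per job
#SBATCH --mem=900G           # roughly 60 GByte per physical core; 32 logical cores = 16 physical cores here
#SBATCH --time=48:00:00
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_shared_memory_program   # placeholder executable
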
Cluster system: CoolMUC-3 (64-way Knights Landing 7210F nodes with Intel Omni-Path 100 interconnect and 4 hardware threads per physical core)

Slurm cluster: inter
Slurm partition: mpp3_inter
Nodes in partition: 3
Node range per job (min - max): 1 - 3
Maximum runtime (hours): 2
Maximum running (submitted) jobs per user: 1 (2)
Memory limit (GByte): approx. 90 DDR plus 16 HBM per node
Slurm job settings:
--clusters=inter
--partition=mpp3_inter
Typical job type: Do not run production jobs!
Job management: as for cm2_inter (squeue/scancel/sacct with -M inter)

Slurm cluster: mpp3
Slurm partition: mpp3_batch
Nodes in partition: 145
Node range per job (min - max): 1 - 32
Maximum runtime (hours): 48
Maximum running (submitted) jobs per user: 50 (dynamically adjusted depending on workload)
Memory limit (GByte): approx. 90 DDR plus 16 HBM per node
Slurm job settings:
--clusters=mpp3
--partition=mpp3_batch
Job management:
squeue -M mpp3 -u $USER
scancel -M mpp3 <JOB-ID>
sacct -M mpp3 -X -u $USER --starttime=2021-01-01T00:00:01

Submit hosts

Submit hosts are usually login nodes that allow you to submit and manage batch jobs.

Cluster segment: Submit hosts
CoolMUC-2: lxlogin1, lxlogin2, lxlogin3, lxlogin4
CoolMUC-3: lxlogin8, lxlogin9
Teramem: lxlogin8
IvyMUC: lxlogin10

Note, however, that cross-submission of jobs to other cluster segments is also possible. The only thing to take care of is that different cluster segments support different instruction sets, so make sure that your software build produces a binary that can execute on the targeted cluster segment.
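
For example, a CoolMUC-3 batch job could be submitted from a CoolMUC-2 login node (e.g. lxlogin1) by addressing the target cluster explicitly, as in this sketch (the script name is a placeholder), provided the binary it launches was built for the Knights Landing instruction set:

sbatch -M mpp3 --partition=mpp3_batch my_knl_job.slurm
squeue -M mpp3 -u $USER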

