- For details, please also read the Linux-Cluster subchapters!
- Status of Linux Cluster: High Performance Computing
Overview of cluster specifications and limits
Slurm cluster | Slurm partition | Nodes | Node range per job | Maximum run time (hours) | Maximum running (submitted) jobs per user | Memory limit (GB) |
---|---|---|---|---|---|---|
Cluster system: CoolMUC-2 (28-way Haswell-EP nodes with Infiniband FDR14 interconnect and 2 hardware threads per physical core) | | | | | | |
cm2 | cm2_large | 404 (overlapping with cm2_std) | 25 - 64 | 48 | 2 (30) | 56 per node |
cm2 | cm2_std | 404 (overlapping with cm2_large) | 3 - 24 | 72 | 4 (50) | 56 per node |
cm2_tiny | cm2_tiny | 300 | 1 - 4 | 72 | 10 (50) | 56 per node |
serial | serial_std | 96 (overlapping with serial_long) | 1 - 1 | 96 | dynamically adjusted (250) | 56 per node |
serial | serial_long | 96 (overlapping with serial_std) | 1 - 1 | > 72 (currently 480) | dynamically adjusted (250) | 56 per node |
inter | cm2_inter | 12 | 1 - 12 | 2 | 1 (2) | 56 per node |
inter | cm2_inter_large_mem | 6 | 1 - 6 | 96 | 1 (2) | 120 per node |
Cluster system: Teramem (HP DL580 shared memory system with 96 physical cores in total, 2 hyperthreads per physical core) | | | | | | |
inter | teramem_inter | 1 | 1 - 1 (up to 64 logical cores) | 240 | 1 (2) | approx. 60 per physical core |
Cluster system: CoolMUC-3 (64-way Knights Landing 7210F nodes with Intel Omni-Path 100 interconnect and 4 hardware threads per physical core) | | | | | | |
mpp3 | mpp3_batch | 145 | 1 - 32 | 48 | 50 (dynamically adjusted) | approx. 90 DDR plus 16 HBM per node |
inter | mpp3_inter | 3 | 1 - 3 | 2 | 1 (2) | approx. 90 DDR plus 16 HBM per node |
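To illustrate how these limits translate into batch job settings, here is a minimal job script sketch for the cm2_std partition. It is only a sketch: the job name, node and task counts, and executable are placeholders, and the slurm_setup module line is an assumption that may differ on your system; consult the parallel-job documentation linked below for authoritative templates.

```bash
#!/bin/bash
#SBATCH -J example_job               # placeholder job name
#SBATCH --clusters=cm2               # Slurm cluster (see table above)
#SBATCH --partition=cm2_std          # partition with a node range of 3 - 24 nodes per job
#SBATCH --nodes=4                    # within the 3 - 24 node range of cm2_std
#SBATCH --ntasks-per-node=28         # one MPI task per physical core on CoolMUC-2
#SBATCH --time=24:00:00              # must stay below the 72-hour limit of cm2_std

module load slurm_setup              # assumption: site-specific setup module, adjust as needed
mpiexec -n $SLURM_NTASKS ./my_parallel_program   # placeholder executable
```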
Overview of job processing
Slurm partition | Cluster- / partition-specific Slurm options | Typical job type / remarks | Common/exemplary Slurm commands for job management, e.g. squeue (show waiting/running jobs) |
---|---|---|---|
cm2_large | --clusters=cm2 | | squeue -M cm2 -u $USER |
cm2_std | --clusters=cm2 | | squeue -M cm2 -u $USER |
cm2_tiny | --clusters=cm2_tiny | | squeue -M cm2_tiny -u $USER |
serial_std | --clusters=serial | Shared use of compute nodes among users! | squeue -M serial -u $USER |
serial_long | --clusters=serial | | squeue -M serial -u $USER |
cm2_inter | --clusters=inter | Do not run production jobs! | squeue -M inter -u $USER |
cm2_inter_large_mem | --clusters=inter | | squeue -M inter -u $USER |
teramem_inter | --clusters=inter | | squeue -M inter -u $USER |
mpp3_inter | --clusters=inter | Do not run production jobs! | squeue -M inter -u $USER |
mpp3_batch | --clusters=mpp3 | | squeue -M mpp3 -u $USER |
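As a quick reference for the commands in the last column, the following sketch shows typical job management calls on a multi-cluster Slurm installation such as this one; the job script name and job ID are placeholders.

```bash
# Submit a job script to a specific Slurm cluster
sbatch --clusters=cm2_tiny my_job.slurm

# Show your own waiting and running jobs on that cluster
squeue -M cm2_tiny -u $USER

# Cancel a job; the -M flag selects the Slurm cluster the job was submitted to
scancel -M cm2_tiny <jobid>
```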
Submit hosts
Submit hosts are usually login nodes from which batch jobs can be submitted and managed.
Cluster segment | Submit hosts | Remarks |
---|---|---|
CoolMUC-2 | lxlogin1, lxlogin2, lxlogin3, lxlogin4 | |
CoolMUC-3 | lxlogin8, lxlogin9 | lxlogin9 is accessible from lxlogin8 via ssh mpp3-login9. Since lxlogin9 has the KNL architecture, it can be used to build software for CoolMUC-3. |
Teramem | lxlogin8 | |
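A brief sketch of reaching the submit hosts; the lrz.de domain suffix and the user name are assumptions here and may need adjusting to the current host naming.

```bash
# Log in to a CoolMUC-2 submit host (domain suffix assumed)
ssh your_userid@lxlogin1.lrz.de

# Reach lxlogin9 by hopping via lxlogin8, as noted in the table above
ssh your_userid@lxlogin8.lrz.de
ssh mpp3-login9
```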
However, cross-submission of jobs to other cluster segments is also possible. The only thing to take care of is that the cluster segments support different instruction sets, so make sure that your software build produces a binary that can execute on the targeted cluster segment (see the sketch below).
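For example, a cross-submission from a CoolMUC-2 login node to CoolMUC-3 could look like the sketch below. The compiler invocation is only an illustrative assumption (KNL executes MIC-AVX512 code, while the Haswell nodes need AVX2), and the file names are placeholders.

```bash
# Build the binary for the KNL instruction set of CoolMUC-3
# (e.g. on lxlogin8/lxlogin9; -xMIC-AVX512 targets Knights Landing,
#  whereas -xCORE-AVX2 would target the Haswell nodes of CoolMUC-2)
icc -O2 -xMIC-AVX512 -o my_program_knl my_program.c

# Cross-submit the job to the mpp3 cluster; this works from any submit host
sbatch --clusters=mpp3 --partition=mpp3_batch my_knl_job.slurm

# Monitor it on the target cluster
squeue -M mpp3 -u $USER
```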
Documentation of SLURM
- SLURM Workload Manager (commands and links to examples).
- Available SLURM clusters and features
- Guidelines for resource selection
- Running parallel jobs on the Linux-Cluster
- Running serial jobs on the Linux-Cluster