Decommissioned CoolMUC-2
A new cluster segment, CoolMUC-2, with the same processor and cooling technology as SuperMUC Phase 2, was installed and commissioned in two stages. The first part of the application (state-funded large-scale equipment pursuant to Art. 143c of the German constitution) had already been approved in 2014. Installation of the first subsystem began in December 2014, and user operation commenced in May 2015. Besides additional compute nodes, the second installation phase (large research equipment pursuant to Art. 91b GG) also included six adsorption chillers, installed in collaboration with Sortech AG, which use the waste heat of the compute nodes to generate cooling capacity. Improved technology and control engineering enable reliable year-round cooling of the remaining air-cooled components of SuperMUC Phase 2. In its final expansion stage, CoolMUC-2 ranks 261st on the November 2015 edition of the TOP500 list. In addition, the provision of a GPFS-based high-performance file system as scratch storage eliminated a growing bottleneck in the processing of I/O-heavy compute jobs. CoolMUC-2 replaces the previous CoolMUC-1 cluster, which is scheduled to be decommissioned in 2016. The Nehalem-based SGI ICE cluster had already been taken out of user operation in the summer of 2015, followed in the fall by the decommissioning of the existing serial cluster; accordingly, a subset of nodes was moved over to CoolMUC-2.
The first steps toward integrating the future big data infrastructure with the HPC systems were taken by connecting the CoolMUC-2 login nodes to the Data Science Storage (DSS). Besides productive computing operation, the CoolMUC-2 system also served as a research platform for innovative, energy-efficient cooling concepts. In addition to the hot-water cooling that had been established at the LRZ for years, it featured six adsorption chillers from Sortech. These made it possible to generate cooling from the waste heat of the compute nodes with little electrical energy; the cooling capacity was used for the storage system of SuperMUC Phase 2. The technology proved to be very reliable and efficient: in 2016, an average of 120 kW of waste heat at 45 °C was used to generate approximately 50 kW of cooling at 21 °C. The coefficient of performance of the overall system was 12, i.e. only 1 kW of electrical energy had to be expended for every 12 kW of cooling capacity delivered. This made the adsorption chillers about three times more efficient than conventional compressor-based chillers.
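As a rough cross-check of the figures above, the coefficient of performance relates the cooling capacity delivered to the electrical power consumed; with the quoted COP of 12 and roughly 50 kW of cooling, the electrical input works out to only a few kW:

```math
\mathrm{COP} = \frac{\dot{Q}_{\mathrm{cooling}}}{P_{\mathrm{electric}}} = 12
\qquad\Longrightarrow\qquad
P_{\mathrm{electric}} \approx \frac{50\,\mathrm{kW}}{12} \approx 4.2\,\mathrm{kW}
```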
After nine years of operation, the CoolMUC-2 system was switched off on Friday, 13 December 2024.
CoolMUC-2: System Overview
| Hardware | |
|---|---|
| Number of nodes | 812 |
| Cores per node | 28 |
| Hyperthreads per core | 2 |
| Core nominal frequency | 2.6 GHz |
| Memory (DDR4) per node | 64 GB (bandwidth 120 GB/s, STREAM) |
| Bandwidth to interconnect per node | 13.64 GB/s (1 link) |
| Bisection bandwidth of interconnect (per island) | 3.5 TB/s |
| Latency of interconnect | 2.3 µs |
| Peak performance of system | 1400 TFlop/s |

| Infrastructure | |
|---|---|
| Electric power of fully loaded system | 290 kVA |
| Percentage of waste heat to warm water | 97% |
| Inlet temperature range for water cooling | 30 … 50 °C |
| Temperature difference between outlet and inlet | 4 … 6 °C |

| Software (OS and development environment) | |
|---|---|
| Operating system | SLES15 SP1 Linux |
| MPI | Intel MPI 2019, alternatively OpenMPI |
| Compilers | Intel icc, icpc, ifort 2019 |
| Performance libraries | MKL, TBB, IPP |
| Tools for performance and correctness analysis | Intel Cluster Tools |
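As a brief illustration of the development environment listed above, the following is a minimal sketch of compiling and running an MPI program with the Intel toolchain; the module names are assumptions and may differ from the ones actually installed, so check `module avail` on the system.

```bash
# Minimal sketch: build and run an MPI code with the Intel toolchain listed above.
# The module names below are assumptions -- verify with `module avail`.
module load intel intel-mpi                  # hypothetical module names
mpiicc -O2 -xHost hello_mpi.c -o hello_mpi   # Intel MPI compiler wrapper around icc
mpiexec -n 28 ./hello_mpi                    # one MPI rank per physical core of a node
```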
Overview of cluster specifications and limits
| Slurm cluster | Slurm partition | Nodes | Node range per job | Maximum runtime (hours) | Maximum running (submitted) jobs per user | Memory limit (GB) |
|---|---|---|---|---|---|---|
| Cluster system: CoolMUC-2 (28-way Haswell-EP nodes with InfiniBand FDR14 interconnect and 2 hardware threads per physical core) | | | | | | |
| cm2 | cm2_large | 404 (overlapping partitions) | 25 - 64 | 48 | 2 (30) | 56 per node |
| cm2 | cm2_std | 404 (overlapping partitions) | 3 - 24 | 72 | 4 (50) | 56 per node |
| cm2_tiny | cm2_tiny | 288 | 1 - 4 | 72 | 10 (50) | |
| serial | serial_std | 96 (overlapping partitions) | 1 - 1 | 96 | dynamically adjusted (250) | |
| serial | serial_long | 96 (overlapping partitions) | 1 - 1 | > 72 (currently 480) | dynamically adjusted (250) | |
| inter | cm2_inter | 12 | 1 - 12 | 2 | 1 (2) | |
| inter | cm2_inter_large_mem | 6 | 1 - 6 | 96 | 1 (2) | 120 per node |
| Cluster system: HPDA LRZ Cluster (80-way Ice Lake nodes, 2 hardware threads per physical core) | | | | | | |
| inter | cm4_inter_large_mem | 9 | 1 - 1 | 96 | 1 (2) | 1000 per node |
| Cluster system: Teramem (single-node shared-memory system, 4 x Intel Xeon Platinum 8360HL, 96 physical cores in total, 2 hyperthreads per physical core, 6 TB memory) | | | | | | |
| inter | teramem_inter | 1 | 1 - 1 (up to 64 logical cores) | 240 | 1 (2) | approx. 60 per core |
| Cluster system: CoolMUC-3 (64-way Knights Landing 7210F nodes with Intel Omni-Path 100 interconnect and 4 hardware threads per physical core) | | | | | | |
| mpp3 | mpp3_batch | 145 | 1 - 32 | 48 | 50 (dynamically adjusted) | approx. 90 DDR plus 16 HBM per node |
| inter | mpp3_inter | 3 | 1 - 3 | 2 | 1 (2) | |
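To make the limits above concrete, the following is a minimal sketch of a batch script for the cm2_std partition (3 - 24 nodes per job, 72 hours maximum runtime, 28 physical cores per node); everything beyond the cluster, partition, and limit values taken from the table is an illustrative assumption, in particular the executable name and the setup module.

```bash
#!/bin/bash
#SBATCH --job-name=cm2_std_example     # illustrative job name
#SBATCH --clusters=cm2                 # Slurm cluster, as listed in the table above
#SBATCH --partition=cm2_std            # partition with a 3 - 24 node range per job
#SBATCH --nodes=4                      # stays within the allowed node range
#SBATCH --ntasks-per-node=28           # one MPI rank per physical core
#SBATCH --time=24:00:00                # well below the 72 h partition maximum
#SBATCH --export=NONE                  # start with a clean environment

module load slurm_setup                # assumed LRZ-specific setup module; verify locally
mpiexec -n 112 ./my_app                # 4 nodes x 28 ranks; my_app is a placeholder executable
```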
Overview of job processing
| Slurm partition | Cluster-/partition-specific Slurm job settings | Typical job type | Recommended login nodes | Common/exemplary Slurm command for job management: squeue (show own waiting/running jobs) |
|---|---|---|---|---|
| cm2_large | --clusters=cm2 | | lxlogin1, lxlogin2, lxlogin3, lxlogin4 | squeue -M cm2 -u $USER |
| cm2_std | --clusters=cm2 | | lxlogin1 - lxlogin4 | squeue -M cm2 -u $USER |
| cm2_tiny | --clusters=cm2_tiny | | lxlogin1 - lxlogin4 | squeue -M cm2_tiny -u $USER |
| serial_std | --clusters=serial | Shared use of compute nodes among users! | lxlogin1 - lxlogin4 | squeue -M serial -u $USER |
| serial_long | --clusters=serial | Shared use of compute nodes among users! | lxlogin1 - lxlogin4 | squeue -M serial -u $USER |
| cm2_inter | --clusters=inter | Do not run production jobs! | lxlogin1 - lxlogin4 | squeue -M inter -u $USER |
| cm2_inter_large_mem | --clusters=inter | | lxlogin1 - lxlogin4 | squeue -M inter -u $USER |
| cm4_inter_large_mem | --clusters=inter | | lxlogin5 | squeue -M inter -u $USER |
| teramem_inter | --clusters=inter | | lxlogin[1...4], lxlogin8 | squeue -M inter -u $USER |
| mpp3_inter | --clusters=inter | Do not run production jobs! | lxlogin8 | squeue -M inter -u $USER |
| mpp3_batch | --clusters=mpp3 | | lxlogin8 | squeue -M mpp3 -u $USER |
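Following the table above, jobs are addressed to a specific Slurm cluster with the --clusters (-M) option, and the same option is used when monitoring or cancelling them; a brief usage sketch (the job script name and job ID are placeholders):

```bash
sbatch --clusters=cm2_tiny my_job.slurm   # submit to the cm2_tiny cluster (placeholder script name)
squeue -M cm2_tiny -u $USER               # show own waiting/running jobs, as in the table above
scancel -M cm2_tiny 123456                # cancel a job on that cluster (job ID taken from squeue)
```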