CoolMUC-4
This page contains preliminary information which might be subject to change!
Start of User Operation
CoolMUC-4 has been put into operation. Please check the timeline for the detailed steps:
https://status.lrz.de/issues/linux-cluster/20241024_cm4_announcement/
Access
Two login nodes are available. They are reached through a load balancer named cool.hpc.lrz.de, which automatically assigns each session to the least loaded Linux Cluster login node.
Access to the CoolMUC-4 login nodes is granted via:
ssh -Y cool.hpc.lrz.de -l xxyyyzz
For details, please refer to Access and Login to the Linux-Cluster.
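For convenience, the connection settings can also be stored in the local SSH configuration. The following is a minimal sketch: the alias name cm4 is purely illustrative and xxyyyzz again stands for your own user ID; only the host name cool.hpc.lrz.de is taken from this page.

```
# ~/.ssh/config -- optional host alias for the CoolMUC-4 load balancer (illustrative)
Host cm4
    HostName cool.hpc.lrz.de
    User xxyyyzz            # replace with your LRZ user ID
    ForwardX11 yes          # together with the next line, equivalent to ssh -Y
    ForwardX11Trusted yes
```

With such an entry in place, "ssh cm4" is sufficient to log in.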
New System, New (Old) Rules
Basic Rules of Job Processing on the Linux Cluster
- Choose the compute resources of the Linux Cluster with great care! Although the CoolMUC-4 nodes have many more CPU cores than the old CoolMUC-2 system, the total number of nodes has been reduced significantly!
- Select the cluster segments which fit your needs! Documentation and guidelines will be updated!
- Do not waste or misuse resources! For example, if you intend to run a job on a single CPU core only, then the "serial" cluster segment is your choice. Running such a job on the "cm4_tiny" partition would block an entire compute node: 111 out of 112 available cores would do nothing!
- Important Note: Any job requesting fewer than 56 CPU cores should be directed to the serial queue. The cm4_tiny/std queues are reserved for consistently and efficiently parallelized workloads (see the job script sketch after this list).
- Linux Cluster system administrators are monitoring the CoolMUC-4 system! If resources are used inappropriately, we will get in touch with the affected users for further consultation on the issue.
- Please do not run compute jobs on the login nodes! Read more here: Usage Policy on Login Nodes
- In cases of recurring misuse, access to the HPC system might be blocked!
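To illustrate the rule above, here is a minimal batch script sketch for a fully parallel job on one CoolMUC-4 node. The partition name cm4_tiny is taken from this page, but all other settings (job name, time limit, the executable ./my_parallel_program, and possible additional options such as --clusters) are assumptions that must be checked against the Job Processing documentation linked below.

```
#!/bin/bash
#SBATCH --job-name=cm4_example      # illustrative job name
#SBATCH --partition=cm4_tiny        # partition mentioned above; verify in the queue documentation
#SBATCH --nodes=1                   # one full CoolMUC-4 node
#SBATCH --ntasks-per-node=112       # use all 112 cores of a Sapphire Rapids node
#SBATCH --time=01:00:00             # adjust to your needs

# Intel environment (no longer loaded by default, see the module section below)
module load intel intel-mpi intel-mkl

# placeholder executable -- a job in this segment should really use all requested cores
mpiexec -n $SLURM_NTASKS ./my_parallel_program
```

A job that needs only a single core should instead go to the "serial" segment, so that no full node is blocked.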
Hardware Architecture
Login Nodes
Architecture | Number of cores | Memory | Operating system |
---|---|---|---|
Intel(R) Xeon(R) Platinum 8380 CPU (Ice Lake) | 80 | 1 TB | SLES 15 SP6 |
Compute Nodes
Architecture | Number of nodes | Number of cores per node | Total number of cores | Memory per node | Operating system | Local temporary file system (attached to the node) | Temporary file system (across all nodes) | Remarks |
---|---|---|---|---|---|---|---|---|
Intel(R) Xeon(R) Platinum 8380 CPU (Ice Lake) | 6 | 80 | 480 | 1 TB | SLES 15 SP6 | 1.7 TB via "/tmp" (SSD) | $SCRATCH_DSS | New CoolMUC-4 |
Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) | 106 | 112 | 11872 | 512 GB | | | | New CoolMUC-4 |
Intel(R) Xeon(R) Platinum 8360HL CPU (Cooper Lake) | 1 | 96 | 96 | 6 TB | | | | Already existing Large Memory Teramem System |
Default File Systems
On login nodes and compute nodes, users have access to:
- DSS HOME ($HOME): Users of CoolMUC-2/-3 will keep their Home directory. No need to transfer data to the new system!
- Temporary file system ($SCRATCH_DSS): This is the same temporary file system as it was previously used on CoolMUC-2/-3.
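As a small usage sketch for these file systems (the directory names below are purely illustrative; only the $HOME and $SCRATCH_DSS variables are taken from this page), large temporary job data belongs on the scratch file system rather than in the home directory:

```
# create a job-specific working directory on the temporary file system (illustrative layout)
mkdir -p "$SCRATCH_DSS/my_project/run_001"
cd "$SCRATCH_DSS/my_project/run_001"

# keep scripts, source code and final results in $HOME
cp "$HOME/my_project/input.dat" .
```

Since $SCRATCH_DSS is a temporary file system, results that need to be kept should be copied back to $HOME or another permanent location at the end of a job.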
Queues and Job Processing
Please refer to Job Processing on the Linux-Cluster.
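Jobs on the Linux Cluster are managed via SLURM (see the SLURM scripts mentioned below). As a generic sketch of the basic workflow (the script name job.sh and the job ID are placeholders, and additional cluster- or partition-specific options may be required, see the linked page):

```
sbatch job.sh        # submit the batch script
squeue -u $USER      # show the status of your own jobs
scancel 1234567      # cancel a job by its job ID (1234567 is a placeholder)
```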
Software and Module Environment
The current default software stack is named spack/23.1.0. Users can switch to other available software stacks as required (e.g. spack/22.2.1).
It is important to note that on CoolMUC-4 the Intel compiler, MPI and MKL modules are no longer loaded by default. Users can activate the Intel environment by inserting the following commands into their corresponding SLURM scripts:
module load intel
module load intel-mpi
module load intel-mkl
Other versions of the Intel software are available in the spack stack and can be loaded by users depending on their requirements.
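For example, the module environment can be inspected and adjusted as follows. This is only a sketch: the concrete module names and versions shown depend on the selected spack stack, and the switch command assumes the stacks are provided as modules named spack/<version>, as the stack names above suggest.

```
module avail intel                        # list the Intel modules available in the current spack stack
module load intel intel-mpi intel-mkl     # activate the Intel environment as described above
module list                               # verify which modules are currently loaded

# switch to an older software stack if required
module switch spack/23.1.0 spack/22.2.1
```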
Further details are coming soon.