CoolMUC-4

This page contains preliminary information which might be subject to change!

Start of User Operation

CoolMUC-4 has been put into operation. Please check the timeline with detailed steps:

https://status.lrz.de/issues/linux-cluster/20241024_cm4_announcement/

Access

Two login nodes are available. They are reached through a load balancer named cool.hpc.lrz.de, which automatically assigns your session to the least loaded Linux Cluster login node.
Access to the CoolMUC-4 login nodes is granted via:

ssh -Y cool.hpc.lrz.de -l xxyyyzz

For details, please refer to Access and Login to the Linux-Cluster
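
For convenience, you may add an entry to the ~/.ssh/config file on your local machine so that a short host alias suffices. This is only a minimal sketch; the alias cm4 and the user ID xxyyyzz are placeholders:

# ~/.ssh/config on your local workstation (sketch; adjust the user ID)
Host cm4
    HostName cool.hpc.lrz.de
    User xxyyyzz
    ForwardX11 yes
    ForwardX11Trusted yes

Afterwards, logging in reduces to "ssh cm4".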

New System, New (Old) Rules

Basic Rules of Job Processing on the Linux Cluster

  • Choose the compute resources of the Linux Cluster with great care! Although the CoolMUC-4 nodes have many more CPU cores than the old CoolMUC-2 system, the total number of nodes has been reduced significantly!
  • Select the cluster segments which fit your needs! Documentation and guidelines will be updated!
  • Do not waste or misuse resources! For example, if you intend to run a job on a single CPU core only, then the "serial" cluster segment is your choice. Running such a job on the "cm4_tiny" partition would block an entire compute node: 111 out of 112 available cores would do nothing!
  • Important note: Any job requesting fewer than 56 CPU cores should be directed to the serial queue. The cm4_tiny/cm4_std queues are reserved for consistently and efficiently parallelized workloads (see the example job script below this list).
  • Linux Cluster system administrators are monitoring the CoolMUC-4 system! If resources are used inappropriately, we will get in touch with the affected users to consult on the issue.
  • Please do not run compute jobs on the login nodes! Read more here: Usage Policy on Login Nodes
  • In recurring cases of misuse, access to the HPC system might be blocked!
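
The following batch script is a minimal sketch of a properly sized parallel job on the cm4_tiny partition. The cluster name cm4, the slurm_setup helper module and the executable ./my_program are assumptions for illustration only; please check Job Processing on the Linux-Cluster for the authoritative settings:

#!/bin/bash
#SBATCH -J cm4_example                 # job name
#SBATCH -D ./                          # start in the submission directory
#SBATCH -o ./%x.%j.out                 # output file
#SBATCH --clusters=cm4                 # assumed cluster name
#SBATCH --partition=cm4_tiny           # only for jobs that really use a full node
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=112          # use all 112 cores of a CoolMUC-4 node
#SBATCH --time=01:00:00

module load slurm_setup                # assumed LRZ helper module; omit if not available
module load intel intel-mpi            # Intel environment is no longer loaded by default

mpiexec -n $SLURM_NTASKS ./my_program  # placeholder executable

A job that needs only a single core should instead be submitted to the serial cluster segment, as explained above.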

Hardware Architecture

Login Nodes

| Architecture | Number of physical cores | Memory | Operating system |
|---|---|---|---|
| Intel(R) Xeon(R) Platinum 8380 CPU (Ice Lake) | 80 | 1 TB | SLES 15 SP6 |

Compute Nodes

| Architecture | Number of nodes | Number of cores per node | Total number of cores | Memory per node | Operating system | Local temporary file system (attached to the node) | Temporary file system (across all nodes) | Remarks |
|---|---|---|---|---|---|---|---|---|
| Intel(R) Xeon(R) Platinum 8380 CPU (Ice Lake) | 6 | 80 | 480 | 1 TB | SLES 15 SP6 | 1.7 TB via "/tmp" (SSD) | $SCRATCH_DSS | New CoolMUC-4. Usage: Job Processing on the Linux-Cluster |
| Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) | 106 | 112 | 11872 | 512 GB | SLES 15 SP6 | 1.7 TB via "/tmp" (SSD) | $SCRATCH_DSS | New CoolMUC-4. Usage: Job Processing on the Linux-Cluster |
| Intel(R) Xeon(R) Platinum 8360HL CPU (Cooper Lake) | 1 | 96 | 96 | 6 TB | SLES 15 SP6 | 1.7 TB via "/tmp" (SSD) | $SCRATCH_DSS | Already existing Large Memory Teramem System. Usage: Job Processing on the Linux-Cluster |

Default File Systems

On login nodes and compute nodes, users have access to:

  • DSS HOME ($HOME): Users of CoolMUC-2/-3 will keep their Home directory. No need to transfer data to the new system!
  • Temporary file system ($SCRATCH_DSS): This is the same temporary file system that was previously used on CoolMUC-2/-3.
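
As a short illustration of how these environment variables are typically used, the commands below stage large temporary job data on $SCRATCH_DSS instead of $HOME. The directory and file names are placeholders:

mkdir -p $SCRATCH_DSS/my_job_data             # placeholder directory name
cp $HOME/input.dat $SCRATCH_DSS/my_job_data/  # placeholder input file
cd $SCRATCH_DSS/my_job_data                   # run the job from the scratch area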

Queues and Job Processing

Please refer to Job Processing on the Linux-Cluster
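
Until the updated documentation is available, the standard SLURM commands can be used to inspect the queues; the cluster name cm4 below is an assumption based on the partition names mentioned above:

sinfo --clusters=cm4             # list partitions and node states (assumed cluster name)
squeue --clusters=cm4 -u $USER   # show your own pending and running jobs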

Software and Module Environment

The current default software stack is named spack/23.1.0. Users can switch to other available software stacks as required (e.g. spack/22.2.1).
It is important to note that on CoolMUC-4 the Intel compiler, MPI and MKL modules are no longer loaded by default. Users can activate the Intel environment by inserting the following commands into their corresponding SLURM scripts:

module load intel
module load intel-mpi
module load intel-mkl

Other versions of the Intel software are available in the spack stack and can be loaded by users depending on their requirements.
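
For example, the available versions can be listed and a specific one loaded with the standard module commands; the version string below is purely illustrative:

module avail intel-mpi           # list the Intel MPI versions provided by the stack
module load intel-mpi/2021.11    # load a specific version (illustrative version number)
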
Further details are coming soon.