CoolMUC-4

This page contains preliminary information which might be subject to change!

Start of User Operation

CoolMUC-4 has been put into operation. Please check the timeline with detailed steps:

https://status.lrz.de/issues/linux-cluster/20241024_cm4_announcement/

Access

Two login nodes are available. They are reached through a load balancer named cool.hpc.lrz.de, which automatically assigns your session to the least loaded Linux Cluster login node.
Access to the CoolMUC-4 login nodes is granted via:

ssh -Y cool.hpc.lrz.de -l xxyyyzz

For details, please refer to Access and Login to the Linux-Cluster
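
For convenience, you may add an entry to the ~/.ssh/config file on your local machine so that a short host alias suffices. This is only a minimal sketch; the alias cm4 and the user ID xxyyyzz are placeholders:

# ~/.ssh/config on your local workstation (sketch; adjust the user ID)
Host cm4
    HostName cool.hpc.lrz.de
    User xxyyyzz
    ForwardX11 yes
    ForwardX11Trusted yes

Afterwards, logging in reduces to "ssh cm4".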

New System, New (Old) Rules

Basic Rules of Job Processing on the Linux Cluster

  • Choose the compute resources of the Linux Cluster with great care! Although the CoolMUC-4 nodes have many more CPU cores than the old CoolMUC-2 system, the total number of nodes has been reduced significantly!
  • Select the cluster segments which fit your needs! Documentation and guidelines will be updated!
  • Do not waste or misuse resources! For example, if you intend to run a job on a single CPU core only, then the "serial" cluster segment is your choice. Running such a job on the "cm4_tiny" partition would block an entire compute node: 111 out of 112 available cores would do nothing!
  • Important note: Any job requesting fewer than 56 CPU cores should be directed to the serial queue. The cm4_tiny/cm4_std queues are reserved for consistently and efficiently parallelized workloads (see the example job script below this list).
  • Linux Cluster system administrators are monitoring the CoolMUC-4 system! If resources are used inappropriately, we will get in touch with the affected users to consult on the issue.
  • Please do not run compute jobs on the login nodes! Read more here: Usage Policy on Login Nodes
  • In recurring cases of misuse, access to the HPC system might be blocked!
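
The following batch script is a minimal sketch of a properly sized parallel job on the cm4_tiny partition. The cluster name cm4, the slurm_setup helper module and the executable ./my_program are assumptions for illustration only; please check Job Processing on the Linux-Cluster for the authoritative settings:

#!/bin/bash
#SBATCH -J cm4_example                 # job name
#SBATCH -D ./                          # start in the submission directory
#SBATCH -o ./%x.%j.out                 # output file
#SBATCH --clusters=cm4                 # assumed cluster name
#SBATCH --partition=cm4_tiny           # only for jobs that really use a full node
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=112          # use all 112 cores of a CoolMUC-4 node
#SBATCH --time=01:00:00

module load slurm_setup                # assumed LRZ helper module; omit if not available
module load intel intel-mpi            # Intel environment is no longer loaded by default

mpiexec -n $SLURM_NTASKS ./my_program  # placeholder executable

A job that needs only a single core should instead be submitted to the serial cluster segment, as explained above.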

Hardware Architecture

Login Nodes

| Architecture | Number of physical cores | Memory | Operating system |
|---|---|---|---|
| Intel(R) Xeon(R) Platinum 8380 CPU (Ice Lake) | 80 | 1 TB | SLES 15 SP6 |

Compute Nodes

| Architecture | Number of nodes | Number of cores per node | Total number of cores | Memory per node | Operating system | Local temporary file system (attached to the node) | Temporary file system (across all nodes) | Remarks |
|---|---|---|---|---|---|---|---|---|
| Intel(R) Xeon(R) Platinum 8380 CPU (Ice Lake) | 6 | 80 | 480 | 1 TB | SLES 15 SP6 | 1.7 TB via "/tmp" (SSD) | $SCRATCH_DSS | New CoolMUC-4. Usage: Job Processing on the Linux-Cluster |
| Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) | 106 | 112 | 11872 | 512 GB | SLES 15 SP6 | 1.7 TB via "/tmp" (SSD) | $SCRATCH_DSS | New CoolMUC-4. Usage: Job Processing on the Linux-Cluster |
| Intel(R) Xeon(R) Platinum 8360HL CPU (Cooper Lake) | 1 | 96 | 96 | 6 TB | SLES 15 SP6 | 1.7 TB via "/tmp" (SSD) | $SCRATCH_DSS | Already existing Large Memory Teramem System. Usage: Job Processing on the Linux-Cluster |

Default File Systems

On login nodes and compute nodes, users have access to:

  • DSS HOME ($HOME): Users of CoolMUC-2/-3 will keep their Home directory. No need to transfer data to the new system!
  • Temporary file system ($SCRATCH_DSS): This is the same temporary file system that was previously used on CoolMUC-2/-3.
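
As a short illustration of how these environment variables are typically used, the commands below stage large temporary job data on $SCRATCH_DSS instead of $HOME. The directory and file names are placeholders:

mkdir -p $SCRATCH_DSS/my_job_data             # placeholder directory name
cp $HOME/input.dat $SCRATCH_DSS/my_job_data/  # placeholder input file
cd $SCRATCH_DSS/my_job_data                   # run the job from the scratch area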

Queues and Job Processing

Please refer to Job Processing on the Linux-Cluster
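
Until the updated documentation is available, the standard SLURM commands can be used to inspect the queues; the cluster name cm4 below is an assumption based on the partition names mentioned above:

sinfo --clusters=cm4             # list partitions and node states (assumed cluster name)
squeue --clusters=cm4 -u $USER   # show your own pending and running jobs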

Software and Module Environment

The current default software stack is named spack/23.1.0. Users can switch to other available software stacks as required (e.g. spack/22.2.1).
It is important to note that on CoolMUC-4 the Intel compiler, MPI and MKL modules are no longer loaded by default. Users can activate the Intel environment by inserting the following commands into their corresponding SLURM scripts:

module load intel
module load intel-mpi
module load intel-mkl

Other versions of the Intel software are available in the spack stack and can be loaded by users depending on their requirements.
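
For example, the available versions can be listed and a specific one loaded with the standard module commands; the version string below is purely illustrative:

module avail intel-mpi           # list the Intel MPI versions provided by the stack
module load intel-mpi/2021.11    # load a specific version (illustrative version number)
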
Further details are coming soon.