High Performance Computing


Forgot your password?
Add a new user (SuperMUC-NG only)
Add a new IP address (SuperMUC-NG only)
How to write good LRZ Service Requests
How to set up two-factor authentication (2FA) on HPC systems


New: Virtual HPC Lounge for asking questions and getting advice, every Wednesday, 2:00 pm - 3:00 pm.

System Status (see also: Access and Overview of HPC Systems)

GREEN = fully operational
YELLOW = operational with restrictions (see messages below)
RED = not available (see messages below)



 

High-Performance Computer (SuperMUC-NG): RUNNING


login nodes: skx.supermuc.lrz.de UP
Partitions/Queues: TEST MICRO GENERAL LARGE FAT

login nodes: pvc.supermuc.lrz.de UP
Partitions/Queues: TEST GENERAL LARGE

File systems: HOME WORK SCRATCH DSS DSA
SuperMUC-NG Phase 2 only: DAOS

Further documentation

Submit an Incident Ticket for SuperMUC-NG
Add a new user
Add a new IP address
Questions about 2FA on SuperMUC-NG


 

Linux Cluster: RUNNING

CoolMUC-4
login nodes: cool.hpc.lrz.de UP
serial partition serial_std: MOSTLY UP
serial partition serial_long: UP
parallel partitions cm4_ (tiny | std): UP
interactive partition cm4_inter: UP
teramem_inter: UP


Housing Clusters (access restricted to owners/users)
kcs: PARTIALLY UP
biohpc: MOSTLY UP
hpda: UP

 

File Systems
HOME: UP
SCRATCH_DSS: UP
DSS: UP
DSA: UP


 

Detailed node status
Detailed queue status



Submit an Incident Ticket for the Linux Cluster 

 

Messages

Messages for all HPC Systems

Users of the ANSYS software currently receive misleading messages stating that the ANSYS licenses will expire soon. This is not the case. Only the currently used ANSYS license keys expire on 31 October 2025. We will receive new license keys on 31 October (not earlier), and the ANSYS licenses will be provided to all eligible ANSYS license users without interruption.

The new ANSYS Release 2025.R2 has been installed, tested and rolled out on SuperMUC-NG Phase 1 and CoolMUC-4. ANSYS 2025.R2 is now the default ANSYS release on those systems, and the LRZ documentation has been updated accordingly.
If you observe any related problems, please file an LRZ Service Request. The LRZ download portal for the ANSYS software will be updated with the new ANSYS installation files as soon as possible.
Note that with ANSYS Release 2025.R2 the Rocky DEM solver supports the SUSE Linux Enterprise Server operating system for the first time, i.e. SLES 15 SP4, SP5 and SP6.

The new Siemens PLM release StarCCM+ 2025.2.1 (= 2506.0001 = v20.04.008) has been installed, tested and rolled out on SuperMUC-NG Phase 1 and CoolMUC-4. StarCCM+ 2025.2.1 is now the default StarCCM+ release on those systems, and the LRZ documentation has been updated accordingly. At the same time, the old StarCCM+ versions 2023.x.1 (x = 1, 2, 3) have been deprecated and removed from the LRZ HPC systems.
If you observe any related problems, please file an LRZ Service Request.

SuperMUC-NG

Höchstleistungsrechner on LRZ Service Status
Announcements and Incidents
[i] Shut-Down of HPC systems due to power-supply issues!
Thu, 16.10.2025 10:55 – expected until Fri, 17.10.2025 17:15
Affected services: [Linux Cluster], [Hoechstleistungsrechner], [AI Systems]

17.10.2025 - 17:15: Power supply issues for the HPC and AI systems have been resolved. All systems are back in operation. Individual nodes might still need extra attention. Please report any persistent issues. We will sort them out next week.

---

17.10.2025 - 16:45: Linux Cluster is back in operation.

---

17.10.2025 - 16:10: The AI Systems are back in operation.

---

17.10.2025 - 12:00: SuperMUC-NG Phase 2 is back in operation.

---

16.10.2025 - 19:30: Operation of SuperMUC-NG Phase 1 has restarted. The system is up and queues are running. Phase 2 will follow tomorrow.

---

16.10.2025 - 10:45: All HPC systems have been shut down due to power-supply issues! We are working to restore system operation as soon as possible.


Linux Cluster

Linux Cluster on LRZ Service Status
Announcements and Incidents
[i] New job size limit on serial cluster!
Thu, 25.09.2025 12:00 – expected until Fri, 31.10.2025 18:00
Affected services: [Linux Cluster]

To optimize the workload on the serial cluster, we are increasing the maximum job size on both serial partitions from 20 to 32 cores. The current maximum number of cores per user (96) remains in place. This limit may also be changed in further steps if necessary. To use 32 cores in task-parallel jobs:

#SBATCH --ntasks=32
#SBATCH --ntasks-per-core=2
#SBATCH --cpus-per-task=1

and in shared-memory jobs:

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
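
The interaction between the two limits in this announcement (32 cores per job, 96 cores per user) can be illustrated with a small sketch. The helper names `fits_serial_limits` and `max_concurrent_jobs` are hypothetical and not part of any LRZ or Slurm tool. The sketch only reproduces the arithmetic, assuming both limits count "cores" in the same way as the announcement does. How the scheduler actually enforces the limits (for example, by queueing rather than rejecting jobs) may differ.

```python
# Limits taken from the announcement above; the helpers are illustrative only.
MAX_CORES_PER_JOB = 32   # new maximum job size on both serial partitions
MAX_CORES_PER_USER = 96  # unchanged maximum number of cores per user


def fits_serial_limits(job_cores):
    """Return True if each job and the sum of all jobs stay within the limits.

    job_cores: list of core counts, one entry per job running at the same time.
    """
    if any(c < 1 or c > MAX_CORES_PER_JOB for c in job_cores):
        return False
    return sum(job_cores) <= MAX_CORES_PER_USER


def max_concurrent_jobs(cores_per_job):
    """Number of equally sized jobs that fit under the per-user limit."""
    if not 1 <= cores_per_job <= MAX_CORES_PER_JOB:
        raise ValueError("job size outside the allowed range")
    return MAX_CORES_PER_USER // cores_per_job


if __name__ == "__main__":
    print(max_concurrent_jobs(32))              # 3
    print(fits_serial_limits([32, 32, 32]))     # True
    print(fits_serial_limits([32, 32, 32, 1]))  # False
```

Under these assumptions, a user who always requests the new maximum of 32 cores can have three such jobs running at once before reaching the 96-core limit.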

[i] CoolMUC Cheat Sheet published (updated in October 2025)
Sun, 01.06.2025 12:00 – expected until Fri, 31.10.2025 18:00
Affected services: [Linux Cluster]

New quick reference. The first release of the CoolMUC Cheat Sheet has been published.

AI Systems

AI Systems on LRZ Service Status
Announcements and Incidents
[i] Shut-Down of HPC systems due to power-supply issues!
Thu, 16.10.2025 10:55 – expected until Fri, 17.10.2025 17:15
Affected services: [Linux Cluster], [Hoechstleistungsrechner], [AI Systems]

17.10.2025 - 17:15: Power supply issues for the HPC and AI systems have been resolved. All systems are back in operation. Individual nodes might still need extra attention. Please report any persistent issues. We will sort them out next week.

---

17.10.2025 - 16:45: Linux Cluster is back in operation.

---

17.10.2025 - 16:10: The AI Systems are back in operation.

---

17.10.2025 - 12:00: SuperMUC-NG Phase 2 is back in operation.

---

16.10.2025 - 19:30: Operation of SuperMUC-NG Phase 1 has restarted. The system is up and queues are running. Phase 2 will follow tomorrow.

---

16.10.2025 - 10:45: All HPC systems have been shut down due to power-supply issues! We are working to restore system operation as soon as possible.

All (LRZ and MCML) DGX systems had to be powered off. This message will be updated as new information becomes available.

The AI Systems (including the BayernKI and MCML system segments) will undergo a maintenance procedure between September 8th and 10th, 2025. On these days, the system will not be available to users. Normal user operation is expected to resume during the course of Wednesday, September 10th.

The LRZ AI Systems have to undergo a short maintenance early next week. For this, the system will be drained over the weekend, and the start of new jobs will be delayed until after the maintenance. We expect the actual downtime not to exceed 10 minutes.

The AI Systems (including the BayernKI and MCML system segments) will undergo a maintenance procedure between May 19th and 21st, 2025. On these days, the system will not be available to users. Normal user operation is expected to resume during the course of Wednesday, May 21st.


Compute Cloud

Compute Cloud on LRZ Service Status
Announcements and Incidents
[Resolved] [Outage] Compute Cloud and Attended Cloud Housing Offline due to Power Cut
Thu, 16.10.2025 10:50 – Fri, 17.10.2025 15:26
Affected services: [Compute Cloud], [Attended Compute Cloud Housing]

The Compute Cloud and Attended Cloud Housing are offline due to power issues. Once power has been restored, we will attempt to return the system to normal operation as soon as possible.