High Performance Computing


  • Forgot your password? click here
  • Add new user (only for SuperMUC-NG)? click here
  • Add new IP (only for SuperMUC-NG)? click here
  • How to write good LRZ Service Requests? click here
  • How to set up two-factor authentication (2FA) on HPC systems? click here

Access to the new CoolMUC-4 was opened on Tuesday, December 10th

New: Virtual "HPC Lounge" to ask questions and get advice, every Wednesday, 2:00 pm - 3:00 pm.
For details and the Zoom link see: HPC Lounge

System Status (see also: Access and Overview of HPC Systems)

GREEN = fully operational; YELLOW = operational with restrictions (see messages below); RED = not available (see messages below)



Supercomputer (SuperMUC-NG)

login nodes: skx.supermuc.lrz.de (LOGIN)

archive nodes: skx-arch.supermuc.lrz.de (ARCHIVE)

File systems: HOME, WORK, SCRATCH, DSS, DSA

Partitions/queues: MICRO, GENERAL, LARGE, FAT, TEST

Detailed node status

Details:

Submit an Incident Ticket for the SuperMUC-NG

Add new user? click here

Add new IP? click here

Questions about 2FA on SuperMUC-NG? click here


Linux Cluster 

CoolMUC-4

login nodes: cool.hpc.lrz.de: UP

serial partition serial_std: PARTIALLY UP
serial partition serial_long: UP
parallel partitions cm4_(tiny | std): UP
interactive partition cm4_inter: MOSTLY UP
teramem_inter: UP


LXC Housing Clusters
(Access only by the specific owners/users of these systems.)

kcs: PARTIALLY UP
biohpc: MOSTLY UP
hpda: UP

File Systems

HOME: UP
SCRATCH_DSS: UP
DSS: UP
DSA: UP


 

Detailed node status
Detailed queue status



Details:

Submit an Incident Ticket for the Linux Cluster 


DSS Storage systems

For the status overview of the Data Science Storage please go to

https://doku.lrz.de/display/PUBLIC/Data+Science+Storage+Statuspage


Messages

see also: Aktuelle LRZ-Informationen / News from LRZ

Messages for all HPC Systems

New version of StarCCM+ 2024.3.1 available

Today the new version of the CFD solver StarCCM+ from Siemens PLM, version 2024.3.1 (aka version 2410.0001 = 19.06.009), was installed, tested, and made available on CoolMUC-4 and SuperMUC-NG under the operating system SLES 15. It has been made the new default version of StarCCM+.
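To use it, the corresponding module typically has to be loaded; a minimal sketch, assuming the module is named starccm (verify the exact name and available versions with module avail on the cluster):

    # list the installed StarCCM+ modules (name is an assumption; check "module avail")
    module avail starccm
    # load the new default version explicitly
    module load starccm/2024.3.1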


Messages for SuperMUC-NG

System-wide issues with Slurm were experienced at 8 am.

UPDATE: The issues have been resolved.
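A quick way to check from a login node whether Slurm is responding again; this is a generic probe, not an LRZ-specific procedure:

    # query partition and node states from the Slurm controller
    sinfo
    # list your own jobs to confirm scheduling works
    squeue -u $USER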

Messages for Linux Clusters

CoolMUC-2 and CoolMUC-3 finally switched off

After 9 years of operation, the hardware of CoolMUC-2 could no longer offer reliable service. Hardware and software support for the Knights Landing nodes and the Omni-Path network of CoolMUC-3 (mpp3_batch) ended several years ago. We have therefore decided to switch off both systems. CoolMUC-2 and CoolMUC-3 are no longer available; this also applies to the login nodes lxlogin[1-4].lrz.de.

We kindly ask users to move to the new CoolMUC-4 system. As CoolMUC-4 mounts the same file systems, users can continue working with their data on the new system.
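In practice, migration starts by logging in to the new front end instead of the retired nodes; a minimal sketch, where xyz stands for your LRZ user ID:

    # the retired lxlogin[1-4].lrz.de nodes are gone; use the CM4 load balancer instead
    ssh xyz@cool.hpc.lrz.de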

Access to the new CoolMUC-4 has been opened

On Tuesday, 10 December 2024, access to the new CoolMUC-4 (CM4) Linux Cluster was opened. The CM4 cluster comprises roughly 12,000 cores based on Intel® Xeon® Platinum 8480+ (Sapphire Rapids) processors, interconnected by an InfiniBand network.
Please be aware of major changes to the LRZ Linux Cluster and HPC landscape; in case of issues, please check the updated documentation before filing a ticket with the LRZ Service Desk. Please also mind the changed layout of the module system and the 112 CPU cores and 512 GB RAM per compute node on CM4 hardware (a job script sketch follows the list below).
For more details please refer to the LRZ status pages for the LRZ HPC systems.

  • 09.12.2024: Module system partially updated for the upcoming opening of the CM4 queues (✓)
  • 10.12.2024: New access to the CM4 login nodes via the load balancer cool.hpc.lrz.de (✓)
  • 10.12.2024: The cm4_tiny | cm4_std queues are introduced (✓)
  • 10.12.2024: cm4_inter_large_mem is deprecated and no longer available (✗)
  • 10.12.2024: cm4_inter is introduced as the new CM4 interactive queue (✓)
  • 13.12.2024: cm2/cm3 have finally been switched off (✗)
  • 13.12.2024: Login nodes lxlogin1,2,3,4,8 have been switched off; no more access (✗)
  • 16.12.2024: The serial queues (serial_std | serial_long) refer to the new CM4 hardware (✓)
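As referenced above, a minimal CM4 job script sketch, assuming the cm4_tiny queue and the node size stated above; job name, runtime, and application are placeholders:

    #!/bin/bash
    #SBATCH --job-name=cm4_test       # placeholder job name
    #SBATCH --partition=cm4_tiny      # one of the new CM4 queues listed above
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=112     # a full CM4 node offers 112 CPU cores
    #SBATCH --time=00:30:00           # placeholder runtime

    # load required modules here (see the Spack remarks below), then run:
    srun ./my_application             # my_application is a placeholder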

Remarks on Spack software stack availability and Intel-related modules

Please note: since the last maintenance at the end of November 2024, the latest LRZ software stack spack/23.1.0 is set as the default on the CoolMUC-4 partitions! The old software stack spack/22.2.1 is still available via the corresponding module.
Important note 1: The naming conventions for Intel-related modules (compiler, MPI, MKL) differ between the two Spack software stacks, so users may need to update their SLURM scripts accordingly.
Important note 2: On CM4 the Intel-related modules are no longer preloaded as default modules. Users are themselves responsible for loading the required Intel compiler, Intel MPI, and Intel MKL modules in their SLURM scripts. Various versions and flavours of these modules are provided through the Spack software stacks.
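A sketch of the explicit module handling in a SLURM script; the module names below are assumptions for illustration, so verify the actual names with module avail under your chosen stack:

    # select the software stack explicitly (spack/23.1.0 is the current default)
    module load spack/23.1.0
    # Intel modules are no longer preloaded on CM4; load them yourself
    # (names are illustrative; check "module avail intel" for the real ones)
    module load intel         # Intel compiler
    module load intel-mpi     # Intel MPI
    module load intel-mkl     # Intel MKL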

Messages for Compute Cloud and other HPC Systems

The AI Systems will be affected by an infrastructure power cut scheduled for November 2024. The following system partitions will become unavailable for 3 days during the specified time frame. We apologise for the associated inconvenience.

Calendar Week 46, 2024-11-11 - 2024-11-13

  • lrz-v100x2
  • lrz-hpe-p100x4
  • lrz-dgx-1-p100x8
  • lrz-dgx-1-v100x8
  • lrz-cpu (partly)
  • test-v100x2
  • lrz-hgx-a100-80x4
  • mcml-hgx-a100-80x4
  • mcml-hgx-a100-80x4-mig

The AI Systems (including the MCML system segment) are under maintenance between September 30th and October 2nd, 2024. On these days, the system will not be available to users. Normal user operation is expected to resume during the course of Wednesday, October 2nd.

The previously announced scheduled downtime between 2024-09-16 and 2024-09-27 (Calendar Weeks 38 and 39) has been postponed until further notice. The system will remain in user operation up to the scheduled maintenance at the end of September.