High Performance Computing


Forgot your password? click here
Add new user (only for SuperMUC-NG)? click here
Add new IP (only for SuperMUC-NG)? click here
How to write good LRZ Service Requests? click here
How to set up two-factor authentication (2FA) on HPC systems? click here

New: Virtual "HPC Lounge" to ask questions and get advice, every Wednesday, 2:00 pm - 3:00 pm.
For details and the Zoom link, see: HPC Lounge

System Status (see also: Access and Overview of HPC Systems)

GREEN = fully operational
YELLOW = operational with restrictions (see messages below)
RED = not available (see messages below)



Supercomputer (SuperMUC-NG)

login nodes: skx.supermuc.lrz.de LOGIN

archive nodes: skx-arch.supermuc.lrz.de ARCHIVE

File Systems: HOME WORK SCRATCH DSS DSA

Partitions/Queues: MICRO GENERAL LARGE FAT TEST

Detailed node status

Details:

Submit an Incident Ticket for the SuperMUC-NG

Add new user? click here

Add new IP? click here

Questions about 2FA on SuperMUC-NG? click here


Linux Cluster 

CoolMUC-4

login nodes (cool.hpc.lrz.de): UP
serial partition serial_std: UP
serial partition serial_long: UP
parallel partitions cm4_tiny | cm4_std: UP
interactive partition cm4_inter: UP

teramem_inter: UP


LXC Housing Clusters
(Access only by the specific owners/users of these systems.)

kcs: PARTIALLY UP
biohpc: MOSTLY UP
hpda: UP

File Systems

HOME: UP
SCRATCH_DSS: UP
DSS: UP
DSA: UP

Detailed node status
Detailed queue status



Details:

Submit an Incident Ticket for the Linux Cluster 


DSS Storage systems

For the status overview of the Data Science Storage please go to

https://doku.lrz.de/display/PUBLIC/Data+Science+Storage+Statuspage


Messages

see also: Aktuelle LRZ-Informationen / News from LRZ

Messages for all HPC Systems

There is currently no news.


Messages for SuperMUC-NG

Firmware update on SuperMUC-NG Phase 2

For a firmware update, we have reserved all compute nodes from Thursday 08:00:00 until Monday 08:00:00. Job processing is suspended during this period.

Messages for Linux Clusters

CM4 Maintenance successfully finished

  • The CPU limits of the serial and cm4_tiny queues have been changed. For details, please check our documentation: Job Processing on the Linux-Cluster. Existing SLURM job scripts may need to be modified to comply with the newly implemented policies!

We strongly recommend that users of the "serial" cluster check whether their workflows can be parallelized, in order to be able to benefit from the parallel cluster segment "cm4". If even "cm4" is insufficient, you may also consider an application for a (test) project on the SuperMUC-NG.

Do you need further consulting? Don't hesitate to contact us via Servicedesk or via virtual Zoom meeting in our HPC Lounge.

Introduction of the CoolMUC-4 Linux Cluster

The Linux Cluster CoolMUC-4 (CM4) was introduced in December 2024. The CM4 cluster comprises approximately 12,000 cores based on Intel® Xeon® Platinum 8480+ (Sapphire Rapids) processors, interconnected by an InfiniBand network. Please have a look at the updated documentation before filing a ticket with the LRZ Service Desk. Please note the changed layout of the module system, and that each CM4 compute node provides 112 CPU cores and 512 GB RAM.
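When adapting job scripts to the CM4 node size, it can help to work out how many nodes a job occupies. The following is a minimal sketch, assuming only the per-node figures quoted above (112 CPU cores, 512 GB RAM). The function names are hypothetical, and the sketch ignores partition-specific limits and any memory reserved for the operating system, so please check the documentation for the actual limits.

```python
import math

# Per-node figures taken from the CM4 announcement above.
CORES_PER_NODE = 112
MEMORY_PER_NODE_GB = 512


def nodes_needed(total_tasks: int, cpus_per_task: int = 1) -> int:
    """Return the minimum number of nodes that can host the job.

    Assumes (hypothetically) that a job may use every core of a node;
    partition limits may be stricter.
    """
    if total_tasks < 1 or cpus_per_task < 1:
        raise ValueError("total_tasks and cpus_per_task must be positive")
    if cpus_per_task > CORES_PER_NODE:
        raise ValueError("a single task cannot span more than one node")
    # Tasks are not split across nodes, so round down per node ...
    tasks_per_node = CORES_PER_NODE // cpus_per_task
    # ... and round up the resulting node count.
    return math.ceil(total_tasks / tasks_per_node)


def memory_per_core_gb() -> float:
    """Nominal memory per core if all cores of a node are in use."""
    return MEMORY_PER_NODE_GB / CORES_PER_NODE


if __name__ == "__main__":
    print(nodes_needed(224, cpus_per_task=2))
    print(round(memory_per_core_gb(), 2))
```

A job with 224 tasks of 2 cores each fits 56 tasks per node and therefore occupies 4 nodes. The nominal memory share is about 4.57 GB per core; the amount actually usable by jobs is likely somewhat lower.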

Remarks on Spack software stack availability and INTEL related modules

Please note: Since the last maintenance at the end of November 2024, the latest LRZ software stack spack/23.1.0 has been the default on the CoolMUC-4 partitions! The old software stack spack/22.2.1 is still available via the corresponding module.
Important Note 1: The naming conventions for Intel-related modules (Compiler, MPI, MKL) differ between the two Spack software stacks, so users may need to update their SLURM scripts accordingly.
Important Note 2: On CM4, the Intel-related modules are no longer preloaded as default modules. Users are responsible for loading the required Intel compiler, Intel MPI and Intel MKL modules in their SLURM scripts. Various versions and flavours of these modules are provided through the Spack software stacks.

Messages for Compute Cloud and other HPC Systems

The AI Systems (including the MCML system segment) will undergo a maintenance procedure between February 17th and 19th, 2025. On these days, the system will not be available to users. Normal user operation is expected to resume during the course of Wednesday, February 19th.

We are currently observing and investigating connection issues to https://login.ai.lrz.de. UPDATE: The issue has been resolved.

The AI Systems will be affected by an infrastructure power cut scheduled in November 2024. The following system partitions will become unavailable for 3 days during the specified time frame. We apologise for the inconvenience associated with that.

Calendar Week 46, 2024-11-11 - 2024-11-13

  • lrz-v100x2
  • lrz-hpe-p100x4
  • lrz-dgx-1-p100x8
  • lrz-dgx-1-v100x8
  • lrz-cpu (partly)
  • test-v100x2
  • lrz-hgx-a100-80x4
  • mcml-hgx-a100-80x4
  • mcml-hgx-a100-80x4-mig

The AI Systems (including the MCML system segment) are under maintenance between September 30th and October 2nd, 2024. On these days, the system will not be available to users. Normal user operation is expected to resume during the course of Wednesday, October 2nd.

The previously announced scheduled downtime between 2024-09-16 and 2024-09-27 (Calendar Weeks 38 and 39) has been postponed until further notice. Warning: The system will remain in user operation until the scheduled maintenance at the end of September.