High Performance Computing


Forgot your password? Click here.
Add a new user (only for SuperMUC-NG)? Click here.
Add a new IP address (only for SuperMUC-NG)? Click here.
How to write good LRZ Service Requests? Click here.

How to set up two-factor authentication (2FA) on the HPC systems? Click here.

There are still open seats in the FORTRAN course!

New: Virtual "HPC Lounge" to ask questions and get advice, every Wednesday, 2:00 pm - 3:00 pm.
For details and the Zoom link see: HPC Lounge

System Status (see also: Access and Overview of HPC Systems)

GREEN = fully operational; YELLOW = operational with restrictions (see messages below); RED = not available (see messages below)



Supercomputer (SuperMUC-NG)

login nodes: skx.supermuc.lrz.de

archive nodes: skx-arch.supermuc.lrz.de

File Systems: HOME, WORK, SCRATCH, DSS, DSA

Partitions/Queues: MICRO, GENERAL, LARGE, FAT, TEST

Detailed node status

Details:

Submit an Incident Ticket for the SuperMUC-NG

Add a new user? Click here.

Add a new IP address? Click here.

Questions about 2FA on SuperMUC-NG? Click here.


Linux Cluster

CoolMUC-4

login nodes: cool.hpc.lrz.de             UP

serial partition serial_std              UP
serial partition serial_long             UP
parallel partitions cm4_ (tiny | std)    UP
interactive partition: cm4_inter         UP
teramem_inter                            UP

LXC Housing Clusters
(Access only for the specific owners/users of these systems.)

kcs                                      PARTIALLY UP
biohpc                                   MOSTLY UP
hpda                                     UP

File Systems

HOME                                     UP
SCRATCH_DSS                              UP
DSS                                      UP
DSA                                      UP


Detailed node status
Detailed queue status
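
The login node listed above is reached via SSH using an LRZ account; a minimal sketch, where the user name is a placeholder and two-factor authentication (see the 2FA link at the top of this page) may additionally be requested:

    # Replace xy12abc with your own LRZ user name
    ssh xy12abc@cool.hpc.lrz.de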



Details:

Submit an Incident Ticket for the Linux Cluster 


DSS Storage systems

For the status overview of the Data Science Storage, please go to:

https://doku.lrz.de/display/PUBLIC/Data+Science+Storage+Statuspage


Messages

see also: Aktuelle LRZ-Informationen / News from LRZ

Messages for all HPC Systems

New Version of ANSYS 2025.R1 available

Today the new version of the ANSYS software, 2025.R1, has been installed, tested, and made available on CoolMUC-4 and SuperMUC-NG under the operating system SLES 15. The new version has been made the default for all major ANSYS software components.
There was initially an issue with the LS-Dyna solver in version 2025.R1, so this solver could not be provided at first and its default version remained at 2024.R2. The LS-Dyna related module issue has since been fixed on CM4 and SNG, so release 2025.R1 is now the default solver version for this software component as well.
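
To check which ANSYS versions the module system offers and which one is currently the default, the standard module commands can be used; a minimal sketch (the exact module names on the clusters may differ):

    # List all ANSYS-related modules; the default version is marked in the output
    module avail ansys

    # Load the current default ANSYS module in a shell or Slurm job script
    module load ansys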


Messages for SuperMUC-NG

Maintenance SuperMUC-NG Phase 2

We have started a planned maintenance period for the Phase 2 system. The maintenance of SuperMUC-NG Phase 2 will continue for the whole week.

Messages for Linux Clusters

Maintenance: Change in Slurm configuration on CoolMUC-4

Please take note of the following announcement! Although CoolMUC-4 has more powerful CPUs, the system has far fewer CPU cores and nodes than its predecessor. Resources are limited and demand is increasing. We are therefore forced to make changes to the Slurm configuration. This will mainly have an impact on job processing on the cluster segments "serial" and "cm4". Please refer to Job Processing on the Linux-Cluster for details on the updated cluster configuration.

Maintenance was carried out on Friday, the 21st, to apply the changes. The most important changes are listed below; a minimal job script sketch compliant with the new limits follows the list.

  • On "serial" cluster, CPU limits have been reduced to a maximum of 8 cores per job and a maximum of 24 cores in total over all running jobs of a user.
  • The time limit on serial_std has been decreased to 24 hours.
  • The nodes of partition "cm4_tiny" will be used as shared nodes, i.e., the nodes can be allocated to multiple jobs (of different users). Furthermore, the "minimum CPU limit" has been decreased to 8 cores per job.
  • The nodes of partition "cm4_inter" will be used as shared nodes as well.

What users need to do after the maintenance:

  • Please check your submitted (pending) jobs in good time! Pending jobs that violate the new limits might never start. You may cancel and resubmit jobs or adjust the job settings; see the command sketch after this list. Learn more in the Slurm Commands section here.
  • Slurm job scripts may need to be modified to comply with the new policies! Check here: Job Processing on the Linux-Cluster
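
A minimal sketch of how pending jobs can be inspected, cancelled, and resubmitted with standard Slurm commands; the job ID and script name are placeholders:

    # List your own jobs with partition, state, requested CPUs and time limit
    squeue -u $USER -o "%.10i %.12P %.8T %.5C %.11l %j"

    # Cancel a pending job that violates the new limits (12345 is a placeholder job ID)
    scancel 12345

    # Adjust the job script (cores, time limit, partition) and resubmit it
    sbatch my_job_script.slurm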

We strongly recommend that users of the "serial" cluster check whether their workflows can be parallelized, in order to benefit from the parallel cluster segment "cm4". If even "cm4" is insufficient, you may also consider applying for a (test) project on SuperMUC-NG.

Do you need further consulting? Don't hesitate to contact us via the Servicedesk or via a virtual Zoom meeting in our HPC Lounge.

Access to the new CoolMUC-4 has been opened

In December 2024, access to the new CoolMUC-4 (CM4) Linux Cluster was opened. The CM4 cluster comprises roughly 12,000 cores based on Intel® Xeon® Platinum 8480+ (Sapphire Rapids) processors, interconnected by an InfiniBand network. Please have a look at the updated documentation before filing a ticket with the LRZ Service Desk. Please mind the changed layout of the module system and the 112 CPU cores and 512 GB of RAM per compute node on the CM4 hardware.
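
For orientation, a parallel job on CM4 could request a full 112-core node along the following lines; this is a sketch only, assuming the cm4_std partition mentioned above, with placeholder job name, time limit, and binary:

    #!/bin/bash
    #SBATCH --job-name=cm4_full_node      # placeholder job name
    #SBATCH --partition=cm4_std           # parallel partition mentioned above
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=112         # one MPI rank per core on a 112-core CM4 node
    #SBATCH --time=08:00:00               # placeholder time limit

    # Load the required compiler/MPI modules first (see the Spack remarks below),
    # then launch the MPI application; the binary name is a placeholder
    srun ./my_mpi_application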

Remarks on Spack software stack availability and Intel-related modules

Please note: Since the last maintenance at the end of November 2024, the latest LRZ software stack spack/23.1.0 is set as the default on the CoolMUC-4 partitions! The old software stack spack/22.2.1 is still available via the corresponding module.
Important Note 1: The naming conventions for Intel-related modules (compiler, MPI, MKL) differ between the two Spack software stacks, so users may need to update their SLURM scripts accordingly.
Important Note 2: On CM4 the Intel-related modules are no longer preloaded as default modules. Users are themselves responsible for loading the required Intel compiler, Intel MPI, and Intel MKL modules in their SLURM scripts. Various versions and flavours of these modules are provided through the Spack software stacks.
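
Since the Intel modules are no longer preloaded on CM4, a job script would typically load them explicitly; a minimal sketch, where the module names and versions are assumptions and should be checked with "module avail" in the active Spack stack:

    # Optionally switch back to the old software stack if a workflow still depends on it
    # module load spack/22.2.1

    # Load the Intel toolchain; the exact names differ between the Spack stacks,
    # check them first with:  module avail intel
    module load intel           # Intel compiler (placeholder name)
    module load intel-mpi       # Intel MPI (placeholder name)
    module load intel-mkl       # Intel MKL (placeholder name)

    module list                 # verify the loaded modules before running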

Messages for Compute Cloud and other HPC Systems

The AI Systems (including the MCML system segment) will undergo a maintenance procedure between February 17th and 19th, 2025. On these days, the system will not be available to users. Normal user operation is expected to resume during the course of Wednesday, February 19th.

We are currently observing and investigating connection issues to https://login.ai.lrz.de. UPDATE: The issue has been resolved.

The AI Systems will be affected by an infrastructure power cut scheduled in November 2024. The following system partitions will become unavailable for 3 days during the specified time frame. We apologise for the inconvenience associated with that.

Calendar Week 46, 2024-11-11 - 2024-11-13

  • lrz-v100x2
  • lrz-hpe-p100x4
  • lrz-dgx-1-p100x8
  • lrz-dgx-1-v100x8
  • lrz-cpu (partly)
  • test-v100x2
  • lrz-hgx-a100-80x4
  • mcml-hgx-a100-80x4
  • mcml-hgx-a100-80x4-mig

The AI Systems (including the MCML system segment) are under maintenance between September 30th and October 2nd, 2024. On these days, the system will not be available to users. Normal user operation is expected to resume during the course of Wednesday, October 2nd.

The previously announced scheduled downtime between 2024-09-16 and 2024-09-27 (Calendar Week 38 & 39) has been postponed until further notice. The system will remain in user operation up to the scheduled maintenance at the end of September.