
High Performance Computing

Forgot your password? click here
Add new user (only for SuperMUC-NG)? click here
Add new IP (only for SuperMUC-NG)? click here
How to write good LRZ Service Requests? click here


System Status (see also: Access and Overview of HPC Systems)

Green = fully operational
Yellow = operational with restrictions (see messages below)
Red = not available



Supercomputer (SuperMUC-NG)

System:
    login nodes: skx.supermuc.lrz.de            Green (UP)
    archive nodes: skx-arch.supermuc.lrz.de     Green (UP)

File Systems:
    HOME                                        Green (UP)
    WORK                                        Green (UP)
    SCRATCH                                     Green (UP)
    DSS                                         Green (UP)
    DSA                                         Green (UP)

Partitions/Queues:
    micro, general, large                       Green (UP)
    fat, test                                   Green (UP)

Globus Online File Transfer:                    Green (UP)

Detailed node status


Details:

Submit an Incident Ticket for the SuperMUC-NG

Add new user? click here

Add new IP? click here



Linux Cluster

CoolMUC-2
    lxlogin(1,2,3,4).lrz.de                     Green (UP)
    serial partitions: serial                   Green (UP)
    parallel partitions: cm2_(std,large)        Green (UP)
    cluster: cm2_tiny                           Green (UP)
    interactive partition: cm2_inter            Green (UP)
    c2pap                                       Green (UP)

CoolMUC-3
    lxlogin(8,9).lrz.de                         Green (UP)
    parallel partition: mpp3_batch              Green (UP)
    interactive partition: mpp3_inter           Green (UP)

others
    teramem_inter                               Green (UP)
    kcs                                         Green (UP)
    biohpc                                      Green (UP)

File Systems
    HOME                                        Green (UP)
    SCRATCH                                     Green (UP)
    DSS                                         Green (UP)
    DSA                                         Green (UP)

Detailed node status
Detailed queue status


Details:

Submit an Incident Ticket for the Linux Cluster




Compute Cloud and other HPC Systems

Compute Cloud: (https://cc.lrz.de)              Green (UP)
    detailed status and free slots: https://cc.lrz.de/lrz
LRZ AI Systems                                  Green (UP)
RStudio Server                                  Red (End of Life)

Details:

Submit an Incident Ticket for the Compute Cloud

Submit an Incident Ticket for RStudio Server



Messages for SuperMUC-NG

09 Jun: The problem with the software stack has been resolved.

As part of the recent maintenance, we moved the software stack to a new filesystem. Yesterday, during the late afternoon, a degradation occurred that caused applications to segfault. The root cause is still under investigation.

As a temporary remedy, we switched back to the software stack on the previous filesystem yesterday evening (May 19). Today, we confirmed that most applications seem to work well.

We will keep you informed about progress towards a complete solution of the problem.

Apologies for any inconvenience.

Messages for Linux Clusters
A security maintenance impacting all cluster systems has been scheduled. Please read https://www.lrz.de/aktuell/ali00940.html for details.

The new ANSYS software release 2022.R2 has been installed and tested on SuperMUC-NG. Corresponding modules and updated documentation are provided. In turn, several older solver versions had to be deactivated, since they were no longer functional on SLES 15 SP3. For details see here

The Omnipath issue has appeared again, causing partial unavailability of the WORK/SCRATCH filesystems, which might cause job crashes. It is again under investigation.

Update: We experienced further problems with the HOME filesystem yesterday late afternoon, but the whole system is now back to normal operation. Jobs running at that time may have failed; if so, they need to be resubmitted.

Unfortunately, the Omnipath issue from two days ago has resurfaced. It is under investigation, but we are observing job crashes and unmounts of file systems.

Update 14:50: The system is now back to operation.

Due to an internal networking problem, access to the file systems SCRATCH and WORK was partially disrupted between ~9:30 and 11:30. The system is back in operation, but jobs running during that interval may have failed; if so, they need to be resubmitted.



Messages for Linux Clusters

The new ANSYS software release 2022.R2 has been installed and tested on Linux Clusters CMUC2/3 and on both the old and the "new" RVS systems. For more details on this ANSYS software release see here

We encountered temporary problems with the HOME filesystem yesterday late afternoon, but the system is now back to normal operation. Jobs running at that time may have failed; if so, they need to be resubmitted.

SCRATCH is currently only partially available because some storage servers have gone offline. We are working to revive them.

The implication is that not all data on the file system are currently accessible, so jobs needing pre-existing data may fail.

Update 15:40: SCRATCH is now fully available again.

There are 4 "new" Remote Visualization (RVS_2021) nodes available. The machines are in production mode. The nodes run Ubuntu OS and NoMachine. Usage is limited to 2 hours; if you need a longer period of time, please file an LRZ Service Request. For more details please refer to the documentation.

Temporarily, and as an "experiment", the maximum usage time of a "new" RVS node has been increased to 8 hours.



Messages for Compute Cloud and other HPC Systems

We have observed and addressed an issue with the LRZ AI Systems that concerned some running user jobs. As of now, newly started jobs should not be affected anymore. 
The maintenance of the LRZ AI Systems to address the recently observed stability issues has been concluded.

All users are invited to continue their work. We closely monitor system operation and will provide additional updates if needed. Thank you for your patience and understanding.

We have identified the likely root cause for the ongoing issues with the LRZ AI and MCML Systems following the latest maintenance downtime. We continue to work towards a timely resolution and currently cannot guarantee uninterrupted and stable system availability.

27 Apr: Note that the storage options have changed fundamentally. The previous home directories have been superseded by the default Linux Cluster home directories. Please see LRZ AI Systems for additional details.

The LRZ AI and MCML Systems will undergo a maintenance procedure between August 1st and 3rd, 2022. On these days, the system will not be available to users. Normal user operation is expected to resume on August 4th. Apologies for any inconvenience. Please see LRZ AI Systems for additional details as they become available.

The LRZ AI and MCML Systems underwent a maintenance procedure from April 25th to April 27th (both inclusive). During this period, the system was not available to users. Normal user operation resumed on 2022-04-27 16:30.


