High Performance Computing
Forgot your password? click here
Add new user (only for SuperMUC-NG)? click here
Add new IP (only for SuperMUC-NG)? click here
How to write good LRZ Service Requests? click here
How to set up two-factor authentication (2FA) on HPC systems? click here
End of Life: CoolMUC-2 and CoolMUC-3 will be switched off on Friday, December 13th.
New: Virtual "HPC Lounge" to ask questions and get advice, every Wednesday, 2:00 pm - 3:00 pm.
For details and Zoom link see: HPC Lounge
System Status (see also: Access and Overview of HPC Systems)
GREEN = fully operational; YELLOW = operational with restrictions (see messages below); RED = not available (see messages below)
| Supercomputer (SuperMUC-NG) | |
|---|---|
| login nodes: skx.supermuc.lrz.de | LOGIN |
| archive nodes: skx-arch.supermuc.lrz.de | ARCHIVE |
| File Systems | |
| Partitions/Queues | FAT TEST |
| Detailed node status | |

Details: Submit an Incident Ticket for the SuperMUC-NG. Add new user? click here. Add new IP? click here. Questions about 2FA on SuperMUC-NG? click here.
| Linux Cluster | Status |
|---|---|
| CoolMUC-2 | see messages below |
| lxlogin(1,2,3,4).lrz.de | ISSUES |
| serial partition serial_std | UP |
| serial partition serial_long | UP |
| parallel partitions cm2_(std,large) | MOSTLY UP |
| cluster cm2_tiny | UP |
| interactive partition: cm2_inter | UP |
| c2pap | UP |
| C2PAP Work filesystem: /gpfs/work | READ-ONLY |
| CoolMUC-3 | |
| lxlogin(8,9).lrz.de | 2FA ISSUES |
| parallel partition: mpp3_batch | MOSTLY UP |
| interactive partition: mpp3_inter | PARTIALLY UP |
| CoolMUC-4 | |
| lxlogin5.lrz.de | UP |
| interactive partition: cm4_inter_large_mem | UP |
| others | |
| teramem_inter | UP |
| kcs | MOSTLY UP |
| biohpc | UP |
| hpda | UP |
| File Systems: HOME | ISSUES |

Details:
| Compute Cloud and other HPC Systems | Status |
|---|---|
| Compute Cloud (https://cc.lrz.de), detailed status: Status | UP |
| LRZ AI Systems | UP |

Details:
DSS Storage systems

For the status overview of the Data Science Storage, please go to https://doku.lrz.de/display/PUBLIC/Data+Science+Storage+Statuspage
Messages
see also: Aktuelle LRZ-Informationen / News from LRZ
Messages for all HPC Systems
A new software stack (spack/23.1.0) is available on CoolMUC-2 and SuperMUC-NG. See: Release Notes of Spack/23.1.0 Software Stack
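For orientation, the following is a minimal sketch of switching a shell session to the new stack, assuming the standard environment-modules workflow on the clusters; the exact command may differ, so please consult the release notes linked above.

    # Switch an already-loaded spack module to the new release
    # (assumes the standard "module" command of environment modules).
    module switch spack/23.1.0
    # List the software provided by the new stack.
    module avail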
Messages for SuperMUC-NG
Maintenance finished. The system is back in operation.
Messages for Linux Clusters
Cluster maintenance from Nov 11th 2024 until Nov 15th 2024: Due to work on the power grid infrastructure and security-relevant system updates, the cluster segments denoted below are in maintenance from Monday, Nov 11th 2024, 06:30 am until Friday, Nov 15th 2024, approx. 6:00 pm:
CoolMUC-3 Cluster:
CoolMUC-4 Cluster:
This means that neither scripted batch jobs nor "salloc"-style interactive jobs will execute (an interactive request of this kind is sketched below for illustration). lxlogin[1-4] can be used continuously, as CoolMUC-2 stays in operation.
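For illustration, an "salloc"-style interactive request of the kind that will not execute during the maintenance window might look like the sketch below; it assumes the interactive partitions live in the "inter" SLURM cluster, and the node count, runtime, and tool name are placeholders.

    # Interactive job request on the cm2_inter partition (assumed cluster "inter").
    # During the maintenance window, this allocation would not be granted.
    salloc --clusters=inter --partition=cm2_inter --nodes=1 --time=00:30:00
    # Once the allocation is granted, run work on the allocated node:
    srun ./my_interactive_tool   # placeholder application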
CoolMUC-2/-3: Due to degradation of the cluster communication network, the CM-2 queues are open for single-node jobs only; SLURM restrictions apply. On CM-3, multi-node jobs can be submitted again. Please refrain from submitting tickets about software modernization requests for both systems; they are provided "as is" for their remaining lifetime (see below). A minimal single-node job script is sketched after this message.
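As referenced above, a minimal single-node batch script for the restricted CM-2 queues might look like the following sketch; it assumes the LRZ convention of loading slurm_setup and the 28-core CoolMUC-2 nodes, and the job name, runtime, and application are placeholders.

    #!/bin/bash
    #SBATCH --job-name=single_node_job    # placeholder name
    #SBATCH --output=%x.%j.out
    #SBATCH --clusters=cm2_tiny           # single-node cluster from the table above
    #SBATCH --nodes=1                     # multi-node jobs are currently not accepted on CM-2
    #SBATCH --ntasks-per-node=28          # assumes the 28-core CoolMUC-2 nodes
    #SBATCH --time=00:30:00               # placeholder runtime

    module load slurm_setup               # LRZ convention for SLURM job scripts
    mpiexec -n $SLURM_NTASKS ./my_application   # placeholder application

Submitted with sbatch, such a script stays within the single-node limit; larger --nodes requests on the affected CM-2 queues fall under the SLURM restrictions mentioned above.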
Legacy SCRATCH File System of CoolMUC-2/3 Broken - Data recovery: Severe hardware failures occurred on the CoolMUC clusters (SCRATCH filesystem, switches). As a mitigation, until the end-of-life of CoolMUC-2/3, we have mapped the SCRATCH variable to SCRATCH_DSS (/dss/lxclscratch/.../$USER), which is now also accessible on CoolMUC-2. Update: our administrators managed to bring the filesystem back up in read-only mode.
Please do not use the $SCRATCH environment variable, but absolute paths, e.g., /gpfs/scratch/<project-id>/<user-id>. We cannot guarantee data integrity or completeness. Please save all relevant files as soon as possible (a sketch of such a recovery copy follows below). We will try to keep the filesystem alive and mounted until 8 November (12 pm noon). After 8 November, 12 pm, it will be permanently shut down.
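To illustrate the recommended absolute-path recovery, the sketch below copies data from the read-only legacy SCRATCH to the DSS-backed scratch mentioned above; <project-id> and <user-id> are the placeholders from this message, and the target directory name is arbitrary.

    # Rescue data from the read-only legacy SCRATCH via its absolute path,
    # not via $SCRATCH (which now points to the DSS-backed scratch).
    # Replace <project-id>/<user-id> with your own identifiers.
    SRC=/gpfs/scratch/<project-id>/<user-id>
    DEST="$SCRATCH_DSS/rescued_scratch"   # arbitrary target directory
    mkdir -p "$DEST"
    # rsync can be re-run to resume if the old filesystem becomes unavailable.
    rsync -av "$SRC/" "$DEST/"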
End-of-Life Announcement for CoolMUC-2: After 9 years of operation, the hardware of CoolMUC-2 can no longer offer reliable service. The system is targeted to be turned off on Friday, December 13th, at the latest. Due to network degradation, we can only support single-node jobs on a best-effort basis until then. In case of further hardware problems, the shutdown date might be much earlier.
End-of-Life Announcement for CoolMUC-3: Hardware and software support for the Knights Landing nodes and the Omni-Path network of CoolMUC-3 (mpp3_batch) ended several years ago, and the system needs to be decommissioned. It is targeted to be turned off on Friday, December 13th, along with CoolMUC-2. In case of further hardware problems, the shutdown date might be earlier.
New Cluster Segment CoolMUC-4: Hardware for a new cluster system, CoolMUC-4, has been delivered and is currently being installed and tested. The cluster comprises roughly 12,000 cores based on Intel® Xeon® Platinum 8480+ (Sapphire Rapids) processors. We expect user operation to start at the beginning of December 2024.
Messages for Compute Cloud and other HPC Systems
The AI Systems will be affected by an infrastructure power cut scheduled in November 2024. The following system partitions will become unavailable for 3 days during the specified time frame (Calendar Week 46, 2024-11-11 - 2024-11-13). We apologise for the inconvenience.
The AI Systems (including the MCML system segment) are under maintenance between September 30th and October 2nd, 2024. On these days, the system will not be available to users. Normal user operation is expected to resume during Wednesday, October 2nd. The previously announced scheduled downtime between 2024-09-16 and 2024-09-27 (Calendar Weeks 38 and 39) has been postponed until further notice. The system will remain in user operation up to the scheduled maintenance at the end of September.