LRZ AI Systems
NOTICE
This system is currently in pilot operation.
MAINTENANCE (INCLUDING DOWNTIME)
ANNOUNCEMENT: The AI Systems (including the MCML system segment) will undergo maintenance from September 30th to October 2nd, 2024. During this period, the system will not be available to users. Normal user operation is expected to resume during the course of Wednesday, October 2nd.
SCHEDULED DOWNTIME
UPDATE: The announced scheduled downtimes for 16/09/2024 to 20/09/2024 (Calendar Week 38) and 23/09/2024 to 27/09/2024 (Calendar Week 39) have been postponed until further notice. The system will remain in user operation without interruption. We will update this message as soon as the downtimes associated with the necessary power cuts have been re-scheduled.
SCHEDULED DOWNTIME: The AI Systems will be affected by an infrastructure power cut scheduled later this year. Based on current information, the following system partitions will become unavailable during the specified time frames (scope and dates may be subject to change; this notice will be updated as needed).
16/09/2024 to 20/09/2024 (Calendar Week 38)
- lrz-v100x2
- lrz-hpe-p100x4
- lrz-dgx-1-p100x8
- lrz-dgx-1-v100x8
- lrz-cpu (partly)
- test-v100x2
23/09/2024 to 27/09/2024 (Calendar Week 39)
- lrz-hgx-a100-80x4
- mcml-hgx-a100-80x4
- mcml-hgx-a100-80x4-mig
SYSTEM ACCESS
Access to this system is granted only to existing Linux Cluster accounts upon additional request (see 3. Access and Getting Started). Without such a request, you will not be able to use the system. Additionally, the LRZ AI Systems are currently reachable only from within the Munich Scientific Network ("Münchner Wissenschaftsnetz", MWN; including VPN).
JOB SUBMISSION: --gres=gpu:X required
You must always specify the --gres=gpu option when requesting a GPU resource allocation.
For example, to use 2 GPUs on a node, add --gres=gpu:2 to your allocation request, as in the sketch below.
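A minimal sketch, assuming the lrz-hgx-a100-80x4 partition listed above (any other GPU partition works the same way) and a one-hour time limit chosen for illustration. Interactive allocation of 2 GPUs followed by a shell on the allocated node:

  salloc --partition=lrz-hgx-a100-80x4 --gres=gpu:2 --time=01:00:00
  srun --pty bash

The same request as a batch script (train.py is a hypothetical placeholder for your own application):

  #!/bin/bash
  #SBATCH --partition=lrz-hgx-a100-80x4
  #SBATCH --gres=gpu:2
  #SBATCH --time=01:00:00
  srun python train.py

In both cases --gres=gpu:2 reserves two GPUs on one node; omitting the flag means the job is scheduled without any GPUs.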
Documentation
- 1. General Description and Resources
- 2. Storage on the LRZ AI Systems
- 3. Access and Getting Started
- 4. Introduction to Enroot: The Software Stack Provider for the LRZ AI Systems
- 5. Using NVIDIA NGC Containers on the LRZ AI Systems
- 6. Running Applications as Interactive Jobs on the LRZ AI Systems
- 7. Running Applications as Batch Jobs on the LRZ AI Systems
- 8. Multi-GPU Jobs on the LRZ AI Systems
- 9. Creating and Reusing a Custom Enroot Container Image
- 10. Interactive Web Servers on the LRZ AI Systems
- 11. Public Datasets and Containers on the LRZ AI Systems
- 98. AI Systems Reference
- 99. AI Systems Announcements