LRZ AI Systems

NOTICE

This system is currently in pilot operation.

MAINTENANCE (INCLUDING DOWNTIME)

ANNOUNCEMENT: The AI Systems (including the MCML system segment) will undergo a maintenance procedure between September 30th and October 2nd, 2024. On these days, the system will not be available to users. Normal user operation is expected to resume during the course of Wednesday, October 2nd.

SCHEDULED DOWNTIME

UPDATE: The scheduled downtimes announced for 16/09/2024 to 20/09/2024 (Calendar Week 38) and 23/09/2024 to 27/09/2024 (Calendar Week 39) have been postponed until further notice. The system will remain in user operation without interruption. We will update this message as soon as the downtimes associated with the necessary power cuts have been rescheduled.

SCHEDULED DOWNTIME: The AI Systems will be affected by an infrastructure power cut scheduled later this year. Based on current information, the following system partitions will become unavailable during the specified time frames (scope and dates may be subject to change; this notice will be updated as needed).

16/09/2024 to 20/09/2024 (Calendar Week 38)

  • lrz-v100x2
  • lrz-hpe-p100x4
  • lrz-dgx-1-p100x8
  • lrz-dgx-1-v100x8
  • lrz-cpu (partly)
  • test-v100x2

23/09/2024 to 27/09/2024 (Calendar Week 39)

  • lrz-hgx-a100-80x4
  • mcml-hgx-a100-80x4
  • mcml-hgx-a100-80x4-mig

SYSTEM ACCESS

Access to this system is only granted to existing Linux Cluster accounts upon additional request (see 3. Access and Getting Started). If you have not requested access, you will not be able to use the system. Additionally, the LRZ AI Systems are currently only reachable from within the Munich Scientific Network ("Münchner Wissenschaftsnetz", MWN; including VPN).

JOB SUBMISSION: --gres=gpu:X required

You must always specify the --gres=gpu:X option when requesting a GPU resource allocation, where X is the number of GPUs you need.

For example, to use 2 GPUs on a system, add --gres=gpu:2 when allocating resources.
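As an illustration, a minimal batch script requesting 2 GPUs might look like the sketch below. The partition is simply one of those listed in this notice, and the time limit is an arbitrary assumption; adjust both to your own allocation. The script relies on Slurm typically exporting CUDA_VISIBLE_DEVICES for jobs that hold a GPU allocation, which may depend on the site configuration.

```shell
#!/bin/bash
# Assumption: example partition taken from the list in this notice.
#SBATCH --partition=lrz-dgx-1-v100x8
# Request 2 GPUs; the --gres=gpu:X option is mandatory for GPU jobs.
#SBATCH --gres=gpu:2
# Assumption: short wall-time limit chosen for a quick test job.
#SBATCH --time=00:10:00

# Slurm typically sets CUDA_VISIBLE_DEVICES inside a GPU allocation;
# outside of a job the variable is unset and "none" is reported instead.
visible_gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "Visible GPUs: ${visible_gpus}"
```

Such a script would be submitted with sbatch; for an interactive allocation, the same --partition and --gres=gpu:2 options can be passed to salloc.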

Documentation