1. General Description and Resources

Aim

This system is primarily oriented toward the Big Data & AI communities, with a focus on GPU workloads. Support for other use cases of these GPUs is currently limited.

Compute Hardware

The following table summarizes the available compute hardware resources and the Slurm partitions to which jobs targeting these resources need to be submitted. Some partitions are currently reserved for interactive use via Interactive Web Servers and typically cannot be targeted directly. The default time limit for an individual job (allocation) is one hour; the maximum is 3 days (--time=3-00:00:00).


Slurm Partition         | Number of nodes | CPU cores per node | Memory per node | GPUs per node       | Memory per GPU

DGX A100 Architecture
lrz-dgx-a100-80x8       | 4               | 252                | 2 TB            | 8 NVIDIA A100       | 80 GB
lrz-dgx-a100-40x8-mig   | 1               | 252                | 1 TB            | 8 NVIDIA A100       | 40 GB

DGX-1 V100 Architecture
lrz-dgx-1-v100x8        | 1               | 76                 | 512 GB          | 8 NVIDIA Tesla V100 | 16 GB

DGX-1 P100 Architecture
lrz-dgx-1-p100x8        | 1               | 76                 | 512 GB          | 8 NVIDIA Tesla P100 | 16 GB

HPE Intel Skylake + NVIDIA Node
lrz-hpe-p100x4          | 1               | 28                 | 256 GB          | 4 NVIDIA Tesla P100 | 16 GB

V100 GPU Nodes
lrz-v100x2 (default)    | 4               | 19                 | 368 GB          | 2 NVIDIA Tesla V100 | 16 GB

CPU Nodes
lrz-cpu                 | 12              | 18 / 28 / 38 / 94  | min. 360 GB     | --                  | --
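As a sketch, a batch job targeting one of the partitions above might look like the following. The partition name and the time-limit format come from this page; the GPU request syntax (--gres) is standard Slurm, and the script name, job name, and output file are illustrative placeholders only.

```shell
#!/bin/bash
#SBATCH --partition=lrz-v100x2     # default partition (see table above)
#SBATCH --gres=gpu:1               # request one of the two V100 GPUs per node
#SBATCH --time=1-00:00:00          # 1 day; the maximum allowed is 3-00:00:00
#SBATCH --job-name=example         # illustrative job name
#SBATCH --output=job.%j.out        # illustrative output file (%j = job ID)

# Hypothetical workload; replace with your own command.
srun python train.py
```

Submitting with `sbatch job.sh` queues the job on the chosen partition; omitting --partition falls back to the default (lrz-v100x2), and omitting --time falls back to the one-hour default.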