99. AI Systems Announcements
Latest Announcement
Maintenance 2024-03 Changelog
Previous Announcements
Maintenance 2024-02 Changelog
Various system components have been updated during the maintenance procedure on July 1st-3rd, 2024:
Added:
- Recent additions of HGX-based nodes with A100 GPUs have been finalized (lrz-hgx-a100-80x4 and mcml-hgx-a100-80x4 partitions)
- Some additional CPU resources have been made available as part of the lrz-cpu partition
Changed:
- The Enroot container runtime has been updated to release 3.5.0
- The web-based frontend, Open OnDemand, has been updated to version 3.1.7
- Jupyter Notebook / JupyterLab container images have been updated
- RStudio Server container images have been updated and provide new R / RStudio Server versions
- The operating system kernel and packages of all AI Systems nodes, as well as the Nvidia drivers and GPFS storage applications, have been updated to recent point releases providing stability and security fixes
Removed:
- RStudio Server container images with R versions prior to 4.4.0 have been removed; if absolutely necessary, these can still be provided and used as custom container images
Maintenance 2024-01 Changelog
Multiple system components have been updated and there are various user-facing changes that were introduced during the maintenance procedure on March 11th-14th, 2024:
Breaking:
- enroot start currently cannot be used directly with a sqsh container image. Instead, it requires an existing container. The following commands show an example of how to create a container and use enroot start:
    enroot import <container-tag>  # when importing from a registry; skip if local image file is available
    enroot create --name <container-name> <image-file>  # -n; this step may have been skipped previously
    enroot start <container-name>
  Alternatively, use the Pyxis --container-image option when using srun or in the preamble of your batch script (for additional details see Removed section below).
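As a rough illustration of the Pyxis alternative, the following sketch writes a small batch script whose preamble requests a container image, and prints the equivalent srun call. The image ubuntu:22.04, the --gres=gpu:1 request and the file name job.sbatch are placeholders chosen for this example and are not taken from the changelog; whether a GPU request is needed depends on the partition.

```shell
#!/bin/sh
# Sketch only: generate a batch script that uses the Pyxis --container-image
# option in its preamble instead of calling `enroot start` directly.
IMAGE="ubuntu:22.04"            # placeholder image; replace with your own
PARTITION="lrz-hgx-a100-80x4"   # partition name as listed in the 2024-02 changelog

# Write the batch script; with --container-image in the preamble,
# the job body is expected to run inside the container.
cat > job.sbatch <<EOF
#!/bin/bash
#SBATCH --partition=${PARTITION}
#SBATCH --gres=gpu:1
#SBATCH --container-image=${IMAGE}
grep PRETTY_NAME /etc/os-release
EOF

# Equivalent interactive form (printed here, not executed):
echo "srun --partition=${PARTITION} --gres=gpu:1 --container-image=${IMAGE} grep PRETTY_NAME /etc/os-release"
```

The generated file could then be submitted with sbatch job.sbatch from a login node.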
Added:
- A "Globus" button has been added to the file manager application of the web-based frontend and provides direct access to the active directory within the Globus research management portal. This allows for improved file management and data transfer capabilities (for further details see Using DSS world wide via Globus Online)
Changed:
- The operating system of all AI Systems nodes has been updated to Ubuntu 22.04 LTS / Nvidia DGX OS 6; various DGX firmware components have been updated
- The Nvidia drivers have been updated to version R535
- The login infrastructure has been reworked and fully virtualized to provide increased stability, redundancy and future-proof flexibility
- The web-based frontend, Open OnDemand, has been updated to release 3.1.1
- Jupyter Notebook / JupyterLab container images have been updated and provide new PyTorch and TensorFlow versions
- RStudio Server container images have been updated and provide new R / RStudio Server versions (older versions have been thinned out)
Removed:
- The fuse-overlay package had to be removed due to a bug in its Ubuntu 22.04 version. As a result, container images can no longer be started directly; a container has to be created first. Starting images directly had become possible with recent kernel versions (see Enroot's documentation and Breaking section above). We are exploring and evaluating various options for a future course of action.
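Because a container now has to exist before enroot start can be used, repeated runs may want to skip the creation step when the container is already present. A minimal sketch, assuming that enroot list prints one container name per line; the helper name ensure_container and the ENROOT override variable are hypothetical and exist only for this example:

```shell
#!/bin/sh
# Sketch only: create an Enroot container unless one with that name exists.
# ENROOT may be pointed at a stub for a dry run (hypothetical convenience).
ENROOT="${ENROOT:-enroot}"

# ensure_container NAME IMAGE_FILE
ensure_container() {
    name="$1"
    image="$2"
    # Assumption: `enroot list` prints one container name per line.
    if "$ENROOT" list | grep -qx -- "$name"; then
        echo "container '$name' already exists"
    else
        "$ENROOT" create --name "$name" "$image"
    fi
}
```

A typical call would be ensure_container <container-name> <image-file>, followed by enroot start <container-name>.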
Maintenance 2023-04 Changelog
Various system components have been updated during the maintenance procedure on December 4th-6th, 2023:
- General OS updates and system firmware updates
- Slurm Workload Manager has been updated to release 23.02.6
- The web-based frontend, Open OnDemand, has been updated to release 3.0.3
Maintenance 2023-03 Changelog
The following list of user-facing changes was introduced during the maintenance procedure between July 24th and 25th, 2023:
- The primary address of the web-based frontend has been changed. Use login.ai.lrz.de for all connections to the LRZ AI Systems. All previous addresses may still be functional, but are going to be removed in the future (deprecation notice).
- The available CPU options for interactive applications in the web-based frontend have been adjusted for some cases/usage combinations.
- Jupyter Notebook/JupyterLab container images have been updated and provide a new PyTorch version.
Maintenance 2023-02 Changelog
The following list of user-facing changes was introduced during the maintenance procedure between June 5th and 7th, 2023:
- The primary address of the SSH login node has been changed. Use login.ai.lrz.de for all SSH connections to the LRZ AI Systems. The previous address is not functional anymore (see deprecation notice below).
- The NVIDIA drivers have been updated to version R525 for full compatibility with the recently released CUDA 12
- The software component providing the web-based frontend, Open OnDemand, has been updated to release 3.0.1
Maintenance 2023-01 Changelog
Please note the following list of user-facing changes introduced during the maintenance procedure between March 13th and 15th, 2023:
- The primary address of the SSH login node has been changed. Going forward, please use login.ai.lrz.de for all SSH connections to the LRZ AI Systems. The previous address is still functional, but will be removed in the future (deprecation notice).
- TensorBoard has been added as a new application to the available web servers of https://datalab3.srv.lrz.de
Maintenance 2022-04 Changelog
The resource selection for the OnDemand-based interactive apps (Jupyter Notebook, JupyterLab, RStudio Server) has been updated and unified. It now allows single GPUs to be allocated (in addition to combinations of CPU cores and RAM size) in all of these front ends.
Reminder October 2022
The previous LRZ AI Systems home directories, accessible from the LRZ AI Systems login nodes under /home/<lrz-account> (read-only since the latest maintenance), have been decommissioned as of 2022-10-31.
Maintenance 2022-03 Changelog
Please note the following user-facing changes to the LRZ AI Systems, which took effect during the latest maintenance:
Most importantly, the availability of storage options on the LRZ AI (and MCML) Systems has changed. The previous home directories have been superseded by the default Linux Cluster home directories. The same files and data can now be accessed in the default home directories directly after login, irrespective of whether a Linux Cluster or an AI Systems login node is used, i.e. the LRZ AI Systems and LRZ Linux Cluster now provide unified home directories.
- The previous LRZ AI Systems home directories are still accessible from the LRZ AI Systems login nodes under /home/<lrz-account> (read-only). These directories will be decommissioned by 2022-10-31, so make sure to copy your files into the new unified home directories as soon as possible!
- In addition, the full offer of Data Science Storage (DSS) systems and containers can now directly be accessed from all LRZ AI Systems login and compute nodes.
- Use the command dssusrinfo all on the login nodes to get an overview of all individually accessible DSS containers and their utilization.
For further details see Storage on the LRZ AI Systems