4.3 Enroot - GPU-Enabled Images
NVIDIA NGC offers a catalogue of containers covering a broad spectrum of software packages (see 4.2 Enroot - Images from Nvidia NGC). These containers supply the CUDA Toolkit, cuDNN libraries, and NVIDIA dependencies, and it is recommended to use containers from NGC. However, it is also possible to use containers from a different container registry or catalogue, which may not be optimized for NVIDIA GPUs.
No matter where your container image comes from, your workload might depend on a package not provided by any image. This guide describes how to create a new custom Enroot container image by extending an existing one.
GPU-Enabling a Container
In general, the CUDA driver should always be installed on the host, while the CUDA toolkit can either be installed on the host or within the container. Therefore, there are two ways to enable GPU support inside a container:
- Installing the NVIDIA Container Toolkit within the container: This approach is documented by NVIDIA in their official guide here.
- Passing the CUDA toolkit and driver utilities from the host to the container: For simplicity, we focus on this pass-through method, which only requires setting a few environment variables inside the container.
The following environment variables control how CUDA is exposed to the container (see here for more information):
- NVIDIA_DRIVER_CAPABILITIES – specifies which driver features the container should access (e.g., compute, utility, video, or graphics).
- NVIDIA_VISIBLE_DEVICES – determines which GPUs are visible to the container.
After identifying the variables and their values according to your needs, add them to /etc/environment inside your container.
The following example shows a typical configuration:
echo "NVIDIA_DRIVER_CAPABILITIES=compute,utility" >> /etc/environment echo "NVIDIA_VISIBLE_DEVICES=all" >> /etc/environment
Avoiding Conflicts from Preinstalled Libraries
Preinstalled NVIDIA libraries within a container image can cause crashes. If you are not using a container from NGC, make sure your image does not include:
- The CUDA toolkit library
- The NVIDIA container toolkit library (libnvidia-container)
Let the Enroot runtime add these required libraries to your containers.
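To check whether a non-NGC image already ships such libraries, start the container and look for them. The following is a minimal sketch for a Debian/Ubuntu-based image; the paths and package name patterns are assumptions, adjust them for other distributions:
ls -d /usr/local/cuda* 2>/dev/null   # typical CUDA toolkit installation locations
dpkg -l | grep -E 'cuda|libnvidia-container'   # packages that should not be preinstalled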
Custom Image Example
This example is for teaching purposes only and does not represent a real use case. The described functionality is already provided by NGC containers.
Start an Interactive Session
First, create an interactive allocation. Since Enroot is only available on the compute nodes, you need to start a Slurm interactive session using the following command. For more details, see 5.1 Slurm Interactive Jobs:
salloc -p lrz-v100x2 --gres=gpu:1
srun --pty bash
Import a Base Image
Import the Ubuntu base image and create a container from it. The import command will return an error if ubuntu.sqsh already exists.
enroot import docker://ubuntu
enroot create ubuntu.sqsh
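You can verify that the container filesystem was created by listing the available Enroot containers:
enroot list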
GPU-Enable the Container
If you need to modify the container contents at runtime (e.g., install software or change configurations), you must start the container with the --rw option to enable write access. Additionally, if these modifications require administrative privileges, use the --root option to run with root permissions inside the container. However, on the AI Systems the --rw option is set by default in the configuration file enroot.conf.
Now, pass the CUDA driver and CUDA toolkit from the host to the container by setting the following environment variables when starting it:
enroot start --root --rw --env NVIDIA_DRIVER_CAPABILITIES=compute,utility --env NVIDIA_VISIBLE_DEVICES=all ubuntu
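Inside the running container you can verify that the driver utilities were passed through, for example by listing the visible GPUs with nvidia-smi (available because the utility capability is requested above):
nvidia-smi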
To make the environment persistent in the final exported Enroot image, add the variables to the /etc/environment file:
echo "NVIDIA_DRIVER_CAPABILITIES=compute,utility" >> /etc/environment echo "NVIDIA_VISIBLE_DEVICES=all" >> /etc/environment
Install Software
In the following steps, we will install packages in the container using the distribution’s package manager, create a Python virtual environment, and add a small snippet to the container’s .bashrc file to automatically activate the virtual environment by default.
Install packages using the apt package manager:
apt-get update && apt-get install -y build-essential cmake
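The Python virtual environment and the .bashrc snippet mentioned above can be set up in the same session. The following is a minimal sketch, run inside the container; the python3-venv package, the /opt/venv location, and root's /root/.bashrc are assumptions, adjust them to your needs:
apt-get install -y python3 python3-venv   # Python and the venv module
python3 -m venv /opt/venv   # create the virtual environment
echo "source /opt/venv/bin/activate" >> /root/.bashrc   # activate it in every interactive shell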
Persist the Changes
Finally, exit the container, export it as a new image, and create a container from the exported image.
exit
enroot export -o ubuntu-cuda.sqsh ubuntu
enroot create ubuntu-cuda.sqsh
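As a quick check, you can run a command in the new container and confirm that the environment variables were persisted (assuming the container is available under the name ubuntu-cuda, as above):
enroot start ubuntu-cuda cat /etc/environment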