NVIDIA NGC offers a catalogue of containers covering a broad spectrum of software packages (see 5. Using NVIDIA NGC Containers on the LRZ AI Systems). These containers supply the CUDA toolkit, the cuDNN libraries, and further NVIDIA dependencies. You can also use containers from a different container registry or catalogue, although those might not supply these components.

No matter where your container image comes from, your workload might depend on a package not provided by that image. This guide describes how to create a new Enroot container image by extending an existing container image. The required steps depend on whether your image comes from the NGC catalogue or not. 

Choose a base image and a target system/partition

In this guide, a base image is an image from the NVIDIA NGC catalogue or from another catalogue (e.g., Docker). For example, Section 3 of this guide ("Creating an extended Enroot image") uses the Docker image docker://nvcr.io#nvidia/tensorflow:20.12-tf1-py3 from the NVIDIA NGC catalogue as base image. The rest of this guide assumes that the label of the base image is stored in a shell variable (e.g., BASE_IMAGE=docker://nvcr.io#nvidia/tensorflow:20.12-tf1-py3).
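
The variable can be set and checked before continuing, as in the following minimal sketch. The helper name is_docker_uri is hypothetical (it is not part of Enroot), and it only checks for the docker:// prefix used by the example label above.

```shell
# Label of the base image used as an example in this guide (NVIDIA NGC catalogue).
BASE_IMAGE='docker://nvcr.io#nvidia/tensorflow:20.12-tf1-py3'

# Hypothetical helper: succeeds if the label starts with the docker:// scheme
# and has something after it.
is_docker_uri() {
    case "$1" in
        docker://?*) return 0 ;;
        *)           return 1 ;;
    esac
}

if is_docker_uri "$BASE_IMAGE"; then
    echo "base image: $BASE_IMAGE"
else
    echo "unexpected image label: $BASE_IMAGE" >&2
fi
```

Quoting the label is not strictly required here, but it keeps the # character from being misread if the line is later edited.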

Choose the target system where the final custom image will be used (see 1. General Description and Resources for available target systems). For example, this guide uses the partition dgx-1-p100.

Create an interactive allocation of resources on the target system. A single GPU suffices for this task. The command below requests the partition lrz-v100x2; replace it with the partition you chose.

$ salloc -p lrz-v100x2 --gres=gpu:1

Open a shell on the allocated machine.

$ srun --pty bash 

If you chose an image from the NVIDIA NGC catalogue, skip the next section and go directly to Section 3 ("Creating an extended Enroot image").

Dealing with base images from other catalogues

If your chosen image does not supply the CUDA toolkit, do not install it within the image. Installing the CUDA toolkit yourself hard-codes paths to the NVIDIA driver currently present on the target machine, and the container might then break in production when that driver is upgraded. Instead, let the Enroot runtime deal with it as follows.

First, create an Enroot container out of the chosen base image. 

$ enroot import -o image-no-cuda.sqsh "$BASE_IMAGE"               # imports the base image as an Enroot container image (image-no-cuda.sqsh)
$ enroot create --name my_container_first_step image-no-cuda.sqsh # creates an Enroot container named "my_container_first_step"

Start bash within the created container. 

$ enroot start my_container_first_step bash

You must add a few environment variables within this container. These variables tell Enroot that you want to use CUDA, and the runtime then makes the needed libraries available within the container (see https://github.com/NVIDIA/NVIDIA-container-runtime#environment-variables-oci-spec). Examples of the variables described in the NVIDIA documentation are the following.

NVIDIA_DRIVER_CAPABILITIES # what do you need from the driver: computing, utilities, rendering?
NVIDIA_REQUIRE_CUDA        # what version of CUDA do you need for your application?
NVIDIA_VISIBLE_DEVICES     # which devices should be visible for this container?

Once you have figured out the needed variables and their values (this depends on what you need; check the NVIDIA documentation or get in touch with us), add these variables to the file /etc/environment of your container. The following code block shows an example.

echo "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video" >> /etc/environment
echo "NVIDIA_REQUIRE_CUDA=cuda>=9.0" >> /etc/environment
echo "NVIDIA_VISIBLE_DEVICES=all" >> /etc/environment
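
Running the echo commands above a second time appends duplicate lines. The following sketch shows one way to avoid that; it is an illustration, not part of Enroot. The helper set_env_var and the variable ENV_FILE are hypothetical names, and ENV_FILE defaults to a scratch file so the sketch can be tried outside a container first.

```shell
# Hypothetical helper: set KEY=VALUE in an environment file, replacing any
# earlier line for KEY so that repeated runs do not accumulate duplicates.
set_env_var() {
    env_file=$1
    key=$2
    value=$3
    tmp_file=$(mktemp)
    # Keep every existing line that does not define KEY
    # (grep -v exits non-zero when nothing is left, hence "|| true").
    if [ -f "$env_file" ]; then
        grep -v "^${key}=" "$env_file" > "$tmp_file" || true
    fi
    # Append the new definition and write the result back in place.
    printf '%s=%s\n' "$key" "$value" >> "$tmp_file"
    cat "$tmp_file" > "$env_file"
    rm -f "$tmp_file"
}

# Assumption: inside the container, run with ENV_FILE=/etc/environment.
ENV_FILE=${ENV_FILE:-./environment.test}
set_env_var "$ENV_FILE" NVIDIA_DRIVER_CAPABILITIES 'compute,utility,video'
set_env_var "$ENV_FILE" NVIDIA_REQUIRE_CUDA 'cuda>=9.0'
set_env_var "$ENV_FILE" NVIDIA_VISIBLE_DEVICES 'all'
```

The values are the same examples as above; which ones you need still depends on your workload.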

Exit the container and export it as an Enroot image.

$ exit
$ enroot export --output my_temporal_container.sqsh my_container_first_step # creates an Enroot image called my_temporal_container.sqsh in the current path 

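Now you have a container prepared for CUDA, and the Enroot runtime will rely on the added variables to add what is needed automatically. Go to Section 3 ("Creating an extended Enroot image") with BASE_IMAGE="$PWD/my_temporal_container.sqsh". Since this file is already an Enroot image, skip the enroot import step there and pass the file directly to enroot create.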

Creating an extended Enroot image

Create an Enroot container out of the base image: 

$ enroot import -o base_image.sqsh "$BASE_IMAGE"    # imports the base image as an Enroot container image (base_image.sqsh)
$ enroot create --name my_container base_image.sqsh # creates an Enroot container named "my_container"

Start the created Enroot container and install any needed packages (this example assumes that the Python package matplotlib needs to be added). Exit the container once the packages have been added.

$ enroot start my_container
$ pip3 install matplotlib
$ exit

Export the modified Enroot container as an Enroot container image.

$ enroot export --output my_container.sqsh my_container # creates an Enroot image called my_container.sqsh in the current path
                                                        # (assuming PWD=/my-path, the complete path to the created image is /my-path/my_container.sqsh)

Release the allocated resources. 

$ exit

Reuse the custom image in your jobs

To reuse your custom Enroot container image, specify it with the --container-image option when submitting jobs (interactive or batch) to the target system. For example (assuming you already have an allocation on dgx-1-p100):

$ srun --pty --container-image='/my-path/my_container.sqsh' bash # executes bash in a container created out of your custom Enroot container image
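
For batch jobs, the same option can be passed to srun inside a job script. The following is a minimal sketch under stated assumptions, not a definitive template: the helper write_job_script and the file name my_job.sbatch are hypothetical, the partition and image path are the examples used in this guide, and the job only checks that the added package can be imported.

```shell
# Hypothetical helper: write a batch script that runs one command inside the
# custom image given as first argument.
write_job_script() {
    image=$1
    script=$2
    cat > "$script" <<EOF
#!/bin/bash
#SBATCH -p dgx-1-p100
#SBATCH --gres=gpu:1
#SBATCH -o job_%j.out

# Check that the package added to the image can be imported.
srun --container-image='$image' python3 -c 'import matplotlib; print(matplotlib.__version__)'
EOF
}

write_job_script /my-path/my_container.sqsh my_job.sbatch
# Submit the generated script with: sbatch my_job.sbatch
```

Generating the script from a function keeps the image path in one place, which helps when several jobs reuse the same custom image.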