10.0 Managing R Packages in a Containerized Environment

All interactive RStudio Server sessions on the LRZ AI Systems are containerized, i.e. they run inside a container environment (see https://en.wikipedia.org/wiki/OS-level_virtualization). This provides a consistent and fully reproducible environment, but every session starts from the state of the container image it is initialized from. As a consequence, changes made inside the running container are not persistent unless they are committed to an image.

When working with R (and RStudio Server), a common need is to modify the set of installed R packages and to keep customized package libraries available across sessions.

There are essentially two ways of making such modifications persistent in a containerized environment:

  1. Store data in, and retrieve it from, a persistent location outside of the running container.
    For R packages, this means creating a new custom library in, e.g., your home directory (which is mounted inside each container session by default), installing packages into this library specifically, and (re-)adding its path to the library tree in each new interactive session.
    To achieve this, use e.g. .libPaths() (providing the new argument, see https://stat.ethz.ch/R-manual/R-patched/library/base/html/libPaths.html) or a combination of the lib argument of install.packages() (https://stat.ethz.ch/R-manual/R-patched/library/utils/html/install.packages.html) and the lib.loc argument of library() (https://stat.ethz.ch/R-manual/R-patched/library/base/html/library.html).
  2. Alternatively, and arguably more elegantly, a container image can be permanently modified to provide any software or packages you need, and this modified state can then be reused to initialize future sessions.
    For reproducibility, reusability and shareability, this should be the preferred method: the container then holds the complete software environment, with no reliance on separate external storage that may or may not exist or contain the additionally needed data.
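Option 1 can be sketched in bash as follows. The helper name add_persistent_r_library and the library location ~/R/custom-library are hypothetical examples. Instead of re-adding the path by hand in every session, the helper appends a .libPaths() call to ~/.Rprofile, which R reads at startup (unless R_PROFILE_USER is set or a .Rprofile in the working directory takes precedence). Since your home directory is mounted inside each container session, running it once should be enough.

```bash
# Hypothetical helper for option 1: create a persistent package library
# outside the container and register it in the R startup profile.
add_persistent_r_library() {
  local lib_dir="$1"                       # e.g. "$HOME/R/custom-library"
  local rprofile="${2:-$HOME/.Rprofile}"   # defaults to the user profile
  local line=".libPaths(c(\"${lib_dir}\", .libPaths()))"

  mkdir -p -- "$lib_dir" || return 1
  touch -- "$rprofile" || return 1
  # Append the line only once: -x matches whole lines, -F disables regex.
  if ! grep -qxF -- "$line" "$rprofile"; then
    # Make sure we start on a fresh line if the file lacks a final newline.
    if [ -s "$rprofile" ] && [ -n "$(tail -c 1 -- "$rprofile")" ]; then
      printf '\n' >> "$rprofile"
    fi
    printf '%s\n' "$line" >> "$rprofile"
  fi
}

# Usage (run once, e.g. on the login node):
# add_persistent_r_library "$HOME/R/custom-library"
```

Because the custom library then comes first in .libPaths(), install.packages() installs into it by default (its lib argument defaults to .libPaths()[1]).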

To get started with option 2, you need access to an environment that has the right tools for working with containers. This could be your local machine running e.g. Docker, but in the case of the LRZ AI Systems, the obvious choice is the compute nodes of the system itself: the Enroot container runtime used below is available there, and these nodes will later also be used to run the modified container.

The general idea is the following: the RStudio Server containers provided by LRZ are generally based on images of the Rocker Project (https://www.rocker-project.org/). Specifically, different versions of the "ml-verse" containers are typically used. Assuming that these containers provide a suitable base for your use case, the process is to take such a container image, create a container from it, modify this container and finally export a new image containing all of your changes. This new image can then be used to create future interactive sessions. Alternatively, you could of course start from a different container (or even build one from scratch); once the process is clear, this should not be much of an additional hurdle.

For an overview of and general information on this approach, see "Introduction to Enroot: The Software Stack Provider for the LRZ AI Systems" and "Creating and Reusing a Custom Enroot Container Image".

The following provides step-by-step instructions for R/RStudio Server users.

After connecting to login.ai.lrz.de (via SSH, see Access and Getting Started), create a new directory for container images, e.g.

login-x:~$ mkdir containers

and change into this directory

login-x:~$ cd containers

Then, create an interactive allocation on a node of the CPU partition to work on (this partition is typically used for CPU-only interactive web sessions; if you need GPU support, choose your target partition accordingly):

login-x:~/containers$ srun -p lrz-cpu -q cpu --pty bash
srun: job xxxxx queued and waiting for resources
srun: job xxxxx has been allocated resources
cpu-xxx:~/containers$

Once logged in to the compute node, use the Enroot container runtime (the tool on the system for actually interacting with containers; an alternative to Docker) to import a base container image from Docker Hub. As mentioned above, we typically use an "ml-verse" container from the Rocker Project; see https://hub.docker.com/r/rocker/ml-verse/tags for the available image versions and choose one that fits your needs.

cpu-xxx:~/containers$ enroot import docker://rocker/ml-verse:<version>

This import step takes some time and creates a file called "rocker+ml-verse+<version>.sqsh" in your current directory. This file is the container image. Use ls to list the contents of the containers directory:

cpu-xxx:~/containers$ ls
rocker+ml-verse+<version>.sqsh
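The file name follows the pattern visible above: the docker:// prefix is dropped and the "/" and ":" separators become "+". The hypothetical helper below reproduces this pattern for plain Docker Hub references such as rocker/ml-verse:<version>, e.g. to know the image file name in a script. It is an assumption that it matches Enroot's naming in every case; references with a registry prefix or a digest are not handled. If you prefer to choose the name yourself, enroot import also accepts an --output option.

```bash
# Hypothetical helper: predict the .sqsh file name that `enroot import`
# writes for a simple Docker Hub reference, following the pattern
#   docker://rocker/ml-verse:<version> -> rocker+ml-verse+<version>.sqsh
# Assumption: no "registry#" prefix and no "@sha256:" digest.
default_sqsh_name() {
  local ref="${1#docker://}"   # strip the URI scheme
  ref="${ref//\//+}"           # "/" between namespace and repository -> "+"
  ref="${ref//:/+}"            # ":" before the tag -> "+"
  printf '%s.sqsh\n' "$ref"
}

# Usage:
# ls -l "$(default_sqsh_name docker://rocker/ml-verse:<version>)"
```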

Next, create an Enroot container from this image. You can choose any name for the container.

cpu-xxx:~/containers$ enroot create --name <your-custom-container> rocker+ml-verse+<version>.sqsh

You can check whether this step was successful by running enroot list, which lists your existing Enroot containers. The name you have chosen should appear.
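If you script these steps, the same check can be automated. This sketch assumes that enroot list, called without options, prints one container name per line; container_exists is a hypothetical helper name, not part of Enroot.

```bash
# Hypothetical helper: succeed if a container with exactly this name
# appears in the output of `enroot list` (one name per line assumed).
container_exists() {
  # -x: whole-line match, -F: fixed string, -q: no output, status only
  enroot list | grep -qxF -- "$1"
}

# Usage: stop a script early if the creation step failed
# container_exists my-r-container || { echo "container missing" >&2; exit 1; }
```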

Then it's time to actually start this container and make changes to it. The following command starts a bash shell inside the container (instead of RStudio Server etc., which is what this container starts by default). If successful, you should notice a slight change in the terminal prompt. If you want to make changes to additional system components of the container, provide the --root option; this is not needed for R package installations.

cpu-xxx:~/containers$ enroot start <your-custom-container> bash
cpu-xxx:/$

Everything that happens from now on takes place inside the container and modifies the container environment. This means that now is the time to, e.g., start R interactively and install any packages you need (note that R would not be available outside of the container, as it is not installed on the host system itself).

cpu-xxx:/$ R
 > install.packages(...)
 > ...
 > q()
cpu-xxx:/$
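As an alternative to the interactive R session, packages can also be installed non-interactively, since enroot start runs any command given after the container name inside the container (just like bash above). The sketch below uses a hypothetical helper name, install_r_packages, and relies on the CRAN repository configured in the image, which the Rocker images normally set. It also checks afterwards that every package can be loaded, because install.packages() only emits a warning when an installation fails.

```bash
# Hypothetical helper: install R packages into an Enroot container without
# an interactive R session; exits non-zero if any package is missing after.
install_r_packages() {
  local container="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: install_r_packages <container> <package>..." >&2
    return 2
  fi
  local pkgs="" p
  for p in "$@"; do
    pkgs+="${pkgs:+, }\"$p\""   # builds: "pkg1", "pkg2", ...
  done
  # stopifnot() raises an R error, which makes Rscript exit with status 1.
  enroot start "$container" Rscript -e \
    "pkgs <- c($pkgs); install.packages(pkgs); ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE); stopifnot(all(ok))"
}

# Usage:
# install_r_packages my-r-container data.table sf
```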

Once you have applied all the changes you want, exit the container.

cpu-xxx:/$ exit
cpu-xxx:~/containers$

Now all that is left is to store this modified container in a persistent image again. This image can then be used to start new containerized sessions that include the changes just made. Again, you can choose any name for this new image; just make sure to add the .sqsh file extension.

cpu-xxx:~/containers$ enroot export --output <your-custom-image>.sqsh <your-custom-container>

At the end of this process, there should be a file called "<your-custom-image>.sqsh" inside your containers directory, next to the "rocker+ml-verse+<version>.sqsh" file imported earlier (once again, use ls to check).
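The export step, including the advice to use the .sqsh extension and the check with ls, can be wrapped in a small helper. The name export_container_image is hypothetical; the helper appends the extension when it is missing, runs enroot export with the --output option shown above, and then verifies that a non-empty image file was written.

```bash
# Hypothetical helper: export a container to a squashfs image, enforcing
# the .sqsh extension and checking that the result is a non-empty file.
export_container_image() {
  local container="$1" image="$2"
  [[ "$image" == *.sqsh ]] || image="${image}.sqsh"
  enroot export --output "$image" "$container" || return 1
  # -s: the file exists and has a size greater than zero
  if [ ! -s "$image" ]; then
    echo "export failed: $image is missing or empty" >&2
    return 1
  fi
  printf '%s\n' "$image"   # report the file name that was written
}

# Usage:
# export_container_image my-r-container my-custom-image
```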

Once done, quit the interactive allocation.

cpu-xxx:~/containers$ exit
login-x:~/containers$

Now you can return to the web interface at https://login.ai.lrz.de. For your next RStudio Server session, choose the "Custom..." container option and provide the full path to the image you created. This should be something like "/dss/dsshome1/.../<account>/containers/<your-custom-image>.sqsh". When your session starts, all R packages installed earlier should be readily available.
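To find the full path for the "Custom..." field, you can print the absolute paths of all images in your containers directory, e.g. on the login node. This sketch assumes GNU coreutils realpath (standard on Linux) and the ~/containers directory created above; list_custom_images is a hypothetical helper name.

```bash
# Hypothetical helper: print the absolute path of every .sqsh image in a
# directory (default: ~/containers), ready to paste into "Custom...".
list_custom_images() {
  local dir="${1:-$HOME/containers}" f
  for f in "$dir"/*.sqsh; do
    [ -e "$f" ] || continue   # no match: the glob stays literal, skip it
    realpath -- "$f"          # resolves relative parts and symlinks
  done
}

# Usage:
# list_custom_images
```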