R on HPC Systems

What is R?

R is a free and open source programming language and software environment for statistical computing and graphics (see https://www.r-project.org/).

Getting Started

Several release versions of R are installed on most LRZ HPC systems. All of them are built with X11 support (for plotting and displaying graphics), and some versions are linked against Intel's Math Kernel Library (MKL).

All versions can be accessed by using the environment module system (see Environment Modules). To search for all available versions use:

$ module available r

Please note: going forward, all modules providing (the latest) versions of R follow a unified naming scheme in which R is written with a lowercase letter, e.g. r/3.6.3-gcc8-mkl (see Environment Modules).

To load/activate a specific version of R, use the full module name, e.g.:

$ module load r/3.6.3-gcc8-mkl

Afterwards, the R interpreter can be started using:

$ R

R Package Management

Any (additional) R packages have to be installed into package libraries. These are (just) directories on the file system with subdirectories for each installed package. The installations of R provided by the module system contain only a standard set of R packages. On multiuser systems, regular users cannot add/install packages directly into this default system library.
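
Since a library is just a directory tree, its content can be inspected from the shell. The following is a minimal sketch, assuming that every installed package shows up as a subdirectory containing a DESCRIPTION file. The function name list_r_packages and the demo directory layout are made up for illustration.

```shell
# List the packages found in an R library directory.
# A subdirectory is treated as an installed package if it contains a DESCRIPTION file.
list_r_packages() {
    lib_dir=$1
    for pkg_dir in "$lib_dir"/*/; do
        if [ -f "${pkg_dir}DESCRIPTION" ]; then
            basename "$pkg_dir"
        fi
    done
}

# Demo with a throw-away directory that mimics a library (hypothetical content).
demo_lib=$(mktemp -d)
mkdir -p "$demo_lib/sf" "$demo_lib/units" "$demo_lib/not_a_package"
touch "$demo_lib/sf/DESCRIPTION" "$demo_lib/units/DESCRIPTION"
list_r_packages "$demo_lib"
```

For a real library, pass one of the paths that .libPaths() reports inside R.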

Individual users can create (one or more) additional user libraries. If no suitable one exists, R will prompt you to define a user library when installing packages for the first time.
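
As an alternative to the interactive prompt, a user library can be prepared before R is started. The sketch below assumes that one library per R build is wanted and uses the environment variable R_LIBS_USER, which R evaluates at startup. The directory name under $HOME is only an example; R ignores the entry if the directory does not exist, hence the mkdir.

```shell
# Tag taken from the module name r/4.1.2-gcc11-mkl; adjust it to the module you load.
r_tag="4.1.2-gcc11-mkl"

# Hypothetical location of the personal library, one directory per R build.
user_lib="$HOME/R/library/$r_tag"
mkdir -p "$user_lib"

# R adds this directory to its library search path when it starts.
export R_LIBS_USER="$user_lib"
echo "R_LIBS_USER=$R_LIBS_USER"
```

Keeping separate libraries per R build avoids mixing packages that were compiled against different R versions or with different compilers.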

On GNU/Linux-based systems, (most) "add-on" R packages are compiled from source. For this to work, the required compilers, tools and additional dependencies must be available on the system. For best compatibility, install additional packages with the same compiler that was used to build R itself. This compiler is indicated by a corresponding suffix in the module name. Currently, this is typically a fairly recent version of the GNU Compiler Collection (GCC), e.g. GCC 8 for r/3.6.3-gcc8-mkl. Make sure to properly set up your environment before starting R and installing add-on packages:

$ module rm intel-mpi intel
$ module load gcc

Currently, the Intel MPI and Intel compiler modules are unloaded automatically, and gcc is loaded, when you load the r module. Please verify that this has actually happened (e.g. with module list) before installing packages!
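
One way to check this from a script is to inspect LOADEDMODULES, the colon-separated list of loaded modules that the module system maintains. The following is a minimal sketch; the function name check_r_build_env is made up, and the module names it looks for (intel, intel-mpi, gcc, r) are the ones used on this page.

```shell
# Report whether the module environment looks suitable for building R packages.
# Returns 0 if everything looks fine, 1 otherwise.
check_r_build_env() {
    status=0
    loaded=":${LOADEDMODULES:-}:"
    case "$loaded" in
        *:intel/*|*:intel:*|*:intel-mpi/*|*:intel-mpi:*)
            echo "warning: an Intel compiler or Intel MPI module is still loaded"
            status=1 ;;
    esac
    case "$loaded" in
        *:gcc/*|*:gcc:*) ;;
        *) echo "warning: no gcc module is loaded"; status=1 ;;
    esac
    case "$loaded" in
        *:r/*) ;;
        *) echo "warning: no r module is loaded"; status=1 ;;
    esac
    if [ "$status" -eq 0 ]; then
        echo "module environment looks fine"
    fi
    return "$status"
}

# Run the check after loading the r module.
check_r_build_env || echo "please fix the module environment before installing packages"
```

An MKL module (e.g. one named intel-mkl) is deliberately not reported, since the MKL-linked R versions may depend on it.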

Example: Installation of "sf" package

Many R packages rely on external dependencies. Unfortunately, there is no uniform scheme or mechanism for providing these dependencies to the R package build system.
As an example, we use the popular package "sf", which requires several dependencies (see https://github.com/r-spatial/sf#linux but note that even this list is incomplete). Any further missing dependencies will be revealed by error messages during the build process.

$ module use /lrz/sys/share/modules/extfiles/spack_modules/22.2.1/linux-sles15-haswell
$ module load r/4.1.2-gcc11-mkl gdal proj geos sqlite udunits
$ module save my_r_env
$ export LIBRARY_PATH=$LIBRARY_PATH:$UDUNITS_BASE/lib:$SQLITE_BASE/lib
$ export CPATH=$CPATH:$UDUNITS_BASE/include:$SQLITE_BASE/include
$ R
> install.packages("sf")

Explanations:

1) The primary approach for making required dependencies available should be via Environment Modules (just like making R itself available on the system). In the present case, the extfiles module path is added to the module search path (module use ...) because while udunits is installed in our software stack, a module is usually not provided. For your convenience, it has been added to the extfiles module path.
Individual users can also create additional custom module files for our installed software stack, or even install additional software/dependencies (and their modules) with the Spack package manager; please see the user_spack documentation.

3) module save ... creates a so-called module collection, i.e. a named set of the currently loaded modules. You can freely choose a name for the collection. The environment can later be restored conveniently, as shown below.

4) With only the modules loaded, udunits and sqlite are still not found during the build process. This would not happen if these libraries were installed in system paths, but the LRZ user software stack is not installed in default system locations. Therefore, even with the modules loaded, the configure tools of the R package build system may not find them. There are several ways to address this. install.packages() accepts extra parameters that can be used to tell the build where to look for headers and libraries. Here, however, we show a slightly different approach. Modules provide environment variables named <module name>_BASE, which contain the install location of the software referenced by the module. We use these to set the colon-separated variables CPATH (include directories) and LIBRARY_PATH (lib or lib64 directories), which the GCC compilers (C/C++) use to search for headers and libraries. This is only needed for the initial build process of R packages!
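
The two export lines can be generalized into a small helper. The following is a minimal sketch, not part of the LRZ setup: the function names append_to_path_var and add_build_paths are made up. It only adds directories that exist, tries both lib and lib64, and avoids an empty leading entry when the variable was previously unset.

```shell
# Append a directory to a colon-separated path variable (e.g. CPATH or LIBRARY_PATH),
# but only if the directory exists, and without creating an empty leading entry.
append_to_path_var() {
    var_name=$1
    dir=$2
    [ -d "$dir" ] || return 0
    eval "current=\${$var_name:-}"
    if [ -n "$current" ]; then
        eval "export $var_name=\"\$current:\$dir\""
    else
        eval "export $var_name=\"\$dir\""
    fi
}

# Add the include and library directories below a <module name>_BASE location.
add_build_paths() {
    base=$1
    [ -n "$base" ] || return 0          # nothing to do if the module is not loaded
    append_to_path_var CPATH "$base/include"
    for libdir in "$base/lib" "$base/lib64"; do
        append_to_path_var LIBRARY_PATH "$libdir"
    done
}

# Same effect as the two export lines in the example above.
add_build_paths "${UDUNITS_BASE:-}"
add_build_paths "${SQLITE_BASE:-}"
```

Further dependencies reported as missing during the build can be added in the same way via their own <module name>_BASE variable.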

Subsequent use of the package requires the necessary modules to be loaded, most conveniently by restoring the module collection:

$ module restore my_r_env  # refer to collection defined above
$ R
> library("sf")

Of course, module restore ... can also be used in Slurm scripts. This should then be done prior to loading any other modules.
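
A job script following this advice might look like the sketch below, which writes the script to a file so that it can be inspected before submission. The file name, job name and resource values are placeholders, and site-specific options (such as cluster or partition) are intentionally left out and need to be added according to the LRZ Slurm documentation.

```shell
# Write a minimal Slurm job script (all names and resource values are placeholders).
cat > r_sf_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=r_sf_test
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --output=r_sf_test.%j.out

# Restore the module collection first, before loading any other modules.
module restore my_r_env

# Run R non-interactively; loading sf fails if a required module is missing.
Rscript -e 'library("sf")'
EOF

# Submit with: sbatch r_sf_job.sh
```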

Support and Additional Resources

If you encounter any issues or have questions about using R on LRZ HPC systems, please feel free to contact the LRZ Servicedesk.

Documentation covering different aspects of working with R can be found on the official project site: https://cran.r-project.org/manuals.html