How to run jobs with large RAM

The node teramem1 is a single node with 6 TB of main memory. It is part of the regular Linux Cluster infrastructure at LRZ, which means that users can access their $HOME and $PROJECT directories as on every other node in the cluster. However, it is operated slightly differently from the other cluster nodes, which can only be used in batch mode. Because teramem1 is currently the only system at LRZ that can satisfy memory requirements beyond 1 TB in a single node, users can choose between batch and interactive mode, depending on their specific needs. Both options are described below.

Interactive SLURM shell

An interactive SLURM shell can be started to run tasks on the new multi-terabyte HP DL580 system "teramem". Use the following commands on one of the CooLMUC2 login nodes:

module load salloc_conf/teramem
salloc --cpus-per-task=32 --mem=2000000
srun ./my_shared_memory_program.exe

The above commands run the binary "my_shared_memory_program.exe" using 32 cores (e.g. 32 threads) and up to 2 TB of memory (the --mem value is given in MB). Additional tuning and resource settings (e.g. OpenMP environment variables) can be set before executing the srun command. Please note that the target system currently still uses the NAS-based SCRATCH area (as opposed to the GPFS-based area available on CooLMUC2). The DL580 can also be used by script-driven jobs (see the batch script below).
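A minimal sketch of such tuning, assuming a POSIX shell inside the salloc session: the function derives the OpenMP thread count from SLURM_CPUS_PER_TASK, which SLURM sets when --cpus-per-task is requested, and pins the threads with the standard OpenMP variables OMP_PLACES and OMP_PROC_BIND. The function name setup_omp_env and the chosen binding are illustrative assumptions, not an LRZ convention.

```shell
#!/bin/sh
# Hypothetical helper (not an LRZ tool): derive OpenMP settings from the
# current SLURM allocation before calling srun.
setup_omp_env() {
  # SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested;
  # fall back to a single thread if it is unset.
  export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
  # Standard OpenMP variables: one place per core, threads kept close together.
  export OMP_PLACES=cores
  export OMP_PROC_BIND=close
  echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OMP_PLACES=$OMP_PLACES OMP_PROC_BIND=$OMP_PROC_BIND"
}

# Usage inside the salloc shell:
#   setup_omp_env
#   srun ./my_shared_memory_program.exe
```

Deriving the thread count from the allocation keeps it consistent with --cpus-per-task if the request is changed later.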

Batch SLURM script

Shared memory job on HP DL 580 "teramem1"

(using 32 logical cores; note that this system is intended for jobs with high memory requirements, not for best performance)

#!/bin/bash
#SBATCH -o /home/hpc/.../.../myjob.%j.%N.out
#SBATCH -D /home/hpc/.../.../mydir
#SBATCH -J Jobname 
#SBATCH --get-user-env
#SBATCH --clusters=inter
#SBATCH --partition=teramem_inter
#SBATCH --mem=2600000mb
#SBATCH --cpus-per-task=32
#SBATCH --mail-type=end
#SBATCH --mail-user=xyz@xyz.de
#SBATCH --export=NONE
#SBATCH --time=18:00:00 
 
source /etc/profile.d/modules.sh
# no cd needed: "#SBATCH -D" above already sets the working directory
export OMP_NUM_THREADS=32
./myprog.exe
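The --mem values on this page are plain megabyte counts (2000000 for about 2 TB, 2600000 for about 2.6 TB). The sketch below, using the hypothetical helper name tb_to_mem_mb, converts a size in TB into such a value with the same decimal convention (1 TB = 1,000,000 MB). SLURM itself treats its unit suffixes as 1024-based, so regard the result as an approximate request rather than an exact byte count; alternatively, SLURM accepts suffixes such as G or T in --mem directly.

```shell
#!/bin/sh
# Hypothetical helper: convert a memory size in TB into the MB value used
# with --mem, following the decimal convention of the examples on this page
# (1 TB = 1,000,000 MB). Fractional sizes such as 2.6 are allowed.
tb_to_mem_mb() {
  # awk does the floating-point multiplication; +0.5 rounds to the nearest MB
  awk -v tb="$1" 'BEGIN { printf "%d\n", tb * 1000000 + 0.5 }'
}

# Example: build the option for a 2.6 TB request
#   echo "--mem=$(tb_to_mem_mb 2.6)"    # prints --mem=2600000
```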

How to store large amounts of data at LRZ

See Data Science Storage

Outdated: How to run R programs on Linux Cluster

Preliminaries

R is a dynamic language for numerical computing and graphics with a strong affinity to statistics. R is available as Free Software under the terms of the GNU General Public License (GPL). It compiles and runs on a wide variety of UNIX and Linux platforms, Microsoft Windows and MacOS. R is a fully featured programming language, and much of the system itself is written in R. Advanced users can link and call C, C++ and Fortran code at run time. R has its roots in statistics, but its extensibility, ease of use and powerful graphics make it ideal for users looking for a fast, easy and robust environment for data analysis and numerics. R can easily be extended with more than 1,700 additional packages available through the Internet; a package is installed with install.packages("name") and then loaded with library(name).

R is a statistics package that was developed as a free successor to the S and S-PLUS languages. It is probably a bit harder to learn than other statistics tools, but once you are used to the functional programming approach of R, it gives you great flexibility. You can accomplish complex tasks with very few commands and produce publication-quality hardcopy output. It also allows you to add functionality and automate processes. Several different versions are installed at LRZ.

An important remark about the equal sign in R: at the top level, x <- 1 and x = 1 both assign the value 1 to x. Inside a function call, however, = names an argument (as in read.table("measurements.txt", header=TRUE)), whereas <- would perform an assignment. Use the arrow form for assignments and the equal sign for naming (optional) arguments in function calls.

Availability and starting R interactively

R is available on the LRZ Linux Cluster and on the HLRB-II. It can be used interactively or in batch mode. To use R interactively, log in to one of the interactive nodes and type:

> module load R/serial

to load R, and

> R

to start R.
To start a version other than the LRZ default, use the command

> module avail R

which prints a list of all available versions. As of August 2013, the following versions are available for the different architectures:
On all architectures, the default version is R 2.13.
Also available are 2.10, 2.11, 2.12, 2.14, 2.15 and 3.0.

Please note that the compatibility versions will be removed sooner or later. We intend to provide only two versions:

Plain vanilla R
R compiled with mpicc

For example, to load the MPI version of R on SuperMUC, issue the following command:

> module load R/parallel/2.13

To run R with MPI on 4 cores, copy a special .Rprofile into your current directory (see details on a separate page) and start R interactively in the MPI environment:

> mpirun -np 4 R --no-save -q

R is then started on 4 cores and waits for user input. Be aware that MPI-R does not use the readline library, so you will not be able to edit the command line as in the vanilla R environment. A possible workaround is to run MPI-R in an Emacs shell (see more on a separate page).

Short example (reading data and visualisation)

In most cases you will have some data that you would like to read and analyse later on. The most straightforward way of loading data into R is reading from a text file. The file 'measurements.txt' contains tab-separated data columns (of performance measurements for the Itanium2 processor). (It is also possible to read data in other formats or from a database; please refer to the R documentation for further information.)

Having started R in the directory where your data file resides, you can read the file into a so-called data frame 'measurements' using the read.table function:

> measurements <- read.table("measurements.txt", header=TRUE)
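Before reading a large file, it can help to verify that every line has as many tab-separated fields as the header, since read.table typically stops with an error on inconsistent lines. A sketch in shell/awk, assuming the file is strictly tab-separated as described above; the function name check_tsv is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: report lines of a tab-separated file whose field count
# differs from that of the header line. Returns non-zero if any are found.
check_tsv() {
  awk '
    BEGIN   { FS = "\t" }
    NR == 1 { n = NF; next }                  # remember the header field count
    NF == 0 { next }                          # blank lines are skipped, as read.table does by default
    NF != n { printf "line %d: %d fields (header has %d)\n", NR, NF, n; bad = 1 }
    END     { exit bad }
  ' "$1"
}

# Usage:
#   check_tsv measurements.txt && echo "field counts look consistent"
```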

It is possible to inspect the contents of the data-frame and all other data objects by simply entering the respective name at the R prompt:

> measurements

The measurements contain data for different sampling intervals, which are given in one column. Data often need to be grouped by the contents of one column; this can be done conveniently using factors. In the following, a factor is created from the sampling interval column 'stime':

> stimef <- factor(measurements["stime"][,1])

Then a boxplot with a separate box for each sampling interval, showing the variability of the measurements for that interval, can be created by:

> plot(measurements["FP_OPS_RETIRED"][,1]/measurements["stime"][,1]/1.0E+9 ~ stimef, 
    main="variability of 100 samples (5 min. sampling interval)",  
    xlab="sampling length [s]", ylab = "[GFlop/s]")
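The y-axis value of the boxplot is FP_OPS_RETIRED / stime / 10^9, i.e. GFlop/s, grouped by sampling interval just as the factor stimef does. The awk sketch below computes the same quantity from the shell and prints the mean per sampling interval, which can be handy as a quick plausibility check before plotting. It assumes a tab-separated file whose header contains columns named stime and FP_OPS_RETIRED, as the R code does; the function name gflops_by_stime is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: mean GFlop/s per sampling interval, i.e. the mean of
# FP_OPS_RETIRED / stime / 1e9 over all rows that share the same stime value.
gflops_by_stime() {
  awk '
    BEGIN { FS = "\t" }
    NR == 1 {
      # Locate the two columns by their header names
      for (i = 1; i <= NF; i++) {
        if ($i == "stime") s = i
        if ($i == "FP_OPS_RETIRED") f = i
      }
      if (!s || !f) { print "missing stime or FP_OPS_RETIRED column" | "cat 1>&2"; exit 1 }
      next
    }
    NF == 0 { next }                          # skip blank lines
    { sum[$s] += $f / $s / 1e9; cnt[$s]++ }   # same formula as the R plot call
    END {
      if (!s || !f) exit 1
      for (k in sum) printf "%s\t%.3f\n", k, sum[k] / cnt[k]
    }
  ' "$1" | sort -n
}

# Usage:
#   gflops_by_stime measurements.txt
```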

Data can be prepared for hardcopy in a variety of formats (e.g. PostScript, PDF, PNG). To create a PNG file, first set up the graphics device:

> png("boxplot_GFlop.png", width=600, height=400)

Then run the plot command(s) whose output should go to this file. Finally, switch off the PNG device to write the data and close the output file:

> dev.list()
X11 PNG
  2   3
> dev.off(3)

Now you should find a file 'boxplot_GFlop.png' in the current directory.

The above is only a short example to give you a feel for what R is like. You can find further information in the following references. The first address for further information is the homepage of the R project. Another useful source of information might be the wiki of the R project.

If you have any questions, suggestions or would be interested in additional packages to be installed on the machines, please feel free to submit a trouble ticket.

R: MPI Extension

This R package allows you to create R programs that run cooperatively in parallel across multiple machines, or multiple CPUs on one machine, to accomplish a goal more quickly than a single program running on one machine.

How to use Rmpi

Rmpi is an R package that lets R run across multiple processors, possibly on multiple machines, using the MPI programming model. Please note that this page is not a tutorial on how to write Rmpi scripts, just a short example of how to run an Rmpi job at LRZ. As with any R job, you must load the R module before using Rmpi:

module load R/parallel/2.13

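The job scripts below run an Rmpi program saved in a file called "myjob_mpi.R", which looks like this (the file neither loads Rmpi nor starts the slaves itself; presumably this is handled by the special .Rprofile mentioned above):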

# Tell all slaves to return a message identifying themselves
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
# Tell all slaves to close down, and exit the program
mpi.close.Rslaves()
mpi.quit()

MPI Job on SLURM

For batch jobs, you need a job script written for the respective batch scheduler. On the Linux Cluster, a SLURM script for an MPI job might look as follows:

#!/bin/bash
#SBATCH -o myjob.%j.%N.out
#SBATCH -D /home/cluster/...
#SBATCH -J Rmpi_test
#SBATCH --get-user-env
#SBATCH --clusters uv3
#SBATCH --nodes=1-1
#SBATCH --cpus-per-task=32
#SBATCH --ntasks=32
#SBATCH --mail-user=user@lrz.de
#SBATCH --export=NONE
#SBATCH --time=01:00:00

source /etc/profile.d/modules.sh
module load R/parallel/2.13

srun_ps R -f myjob_mpi.R

MPI Job on SuperMUC

#!/bin/bash
#@ job_type = parallel
#@ class = general
#@ node = 2
#@ tasks_per_node = 16
#@ initialdir = /home/hpc/...
#@ output = job$(jobid).out
#@ error = job$(jobid).err
#@ wall_clock_limit = 5:00:00
#@ queue

. /etc/profile.d/modules.sh
module load R/parallel/2.13

poe R -f myjob_mpi.R

The output of the job looks something like this:

R version 2.13.1 (2008-02-08)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0 
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to...
.
.
. 
master (rank 0, comm 1) of size 8 is running on: compB051
slave1 (rank 1, comm 1) of size 8 is running on: compB051
slave2 (rank 2, comm 1) of size 8 is running on: compB051
slave3 (rank 3, comm 1) of size 8 is running on: compB051
slave4 (rank 4, comm 1) of size 8 is running on: compB053
slave5 (rank 5, comm 1) of size 8 is running on: compB053
slave6 (rank 6, comm 1) of size 8 is running on: compB053
slave7 (rank 7, comm 1) of size 8 is running on: compB053
> # Tell all slaves to return a message identifying themselves
> mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
$slave1
[1] "I am 1 of 8"
$slave2
[1] "I am 2 of 8"
$slave3
[1] "I am 3 of 8"
$slave4
[1] "I am 4 of 8"
$slave5
[1] "I am 5 of 8"
$slave6
[1] "I am 6 of 8"
$slave7
[1] "I am 7 of 8"
>
> # Tell all slaves to close down, and exit the program
> mpi.close.Rslaves()
[1] 1
> mpi.quit()
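In the sample output, a run of size 8 consists of the master (rank 0) and seven slaves, and mpi.remote.exec returns one entry per slave. The hypothetical helper below spells out this bookkeeping for a given job geometry, e.g. the 2 nodes with 16 tasks each requested in the SuperMUC script; it assumes, as the output suggests, that every MPI task becomes one R process and that rank 0 acts as the master.

```shell
#!/bin/sh
# Hypothetical helper: number of MPI ranks and Rmpi slaves for a job with
# the given number of nodes and tasks per node (rank 0 is the master).
rmpi_layout() {
  nodes=$1
  tasks_per_node=$2
  total=$((nodes * tasks_per_node))
  echo "ranks=$total master=1 slaves=$((total - 1))"
}

# Usage:
#   rmpi_layout 2 16    # prints ranks=32 master=1 slaves=31
```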