Julia for HPC

Julia is, similar to Python and R, a very user-oriented, flexible tool for computing and software development. Since it is easy for users to install and adapt themselves, we do not provide a central Julia installation. Still, many questions arise about the usage of Julia on SuperMUC-NG, which we try to answer here. If something is missing, please open a service request on our Service Desk!

Generally, the Julia documentation is a good place to obtain help!

Getting started

> module av julia 
----------- /lrz/sys/share/modules/files_sles15/tools -------------
julia/1.5.4  julia/1.6.5_lts  julia/1.7.0  julia/1.7.1  julia/1.7.2 
> module show julia/1.7.2

module-whatis   {Julia: Programming Framework}
> module load julia/1.7.2

We centrally provide some basic packages such as ClusterManagers, MPI, Plots and BenchmarkTools. If more packages are needed centrally, please open a Service Request. Generally, however, users can load/install their own packages (even project-specific ones if needed). We therefore try to minimize the number of centrally installed packages in order to avoid version-requirement clashes. This is specifically true as Julia allows for rather complex and dynamic package handling for each project, up to and including complete conda environments.
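For illustration, a project-specific environment can be set up as follows (the directory name "myproject" is just an example; Pkg.add requires Internet access, i.e. on SuperMUC-NG the proxy workaround described in this document):

```julia
using Pkg

# activate a project-specific environment in the given folder
# (the path "myproject" is only an example - any directory works;
# Project.toml/Manifest.toml are created there on demand)
Pkg.activate("myproject")

# packages added now land in this project's environment,
# not in the global one
Pkg.add("BenchmarkTools")

# list the packages of the currently active environment
Pkg.status()
```

Activating the same directory later restores exactly this package set.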

Installing Julia

The easiest way to get your own Julia up and running is to download the binary packages from the Julia Download Site.

Use an SCP client of your choice (scp, pscp, WinSCP, FileZilla, ...) to upload the downloaded file to your SuperMUC-NG account.

> tar xf julia-1.4.1-linux-x86_64.tar.gz
> echo 'export PATH=$PATH:$HOME/julia-1.4.1/bin' >> ~/.profile && source ~/.profile    # or add bin otherwise to your search path, e.g. via alias! 
> julia
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.1 (2020-04-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |


Please modify the paths accordingly if you have a different install path! Newer versions work the same way.

That's it. You can go on!

Installing/Updating Julia Packages (SuperMUC-NG-specific)

As with Python and R, Julia offers a dynamic package development, deployment and management concept, which usually requires direct access to the Internet. SuperMUC-NG does not offer this, so you must use a workaround. The same method as described for conda (a local Python HTTP proxy combined with a reverse SSH tunnel) also works here.
This procedure is only necessary when you want to add packages! The usual execution of Julia code requires neither a further package download nor the SSH tunnel or the Python HTTP proxy!

Preparation (local PC/Laptop)

Setup a local HTTP Proxy using Python:

local> pip3 install --upgrade --user proxy.py
local> ~/.local/bin/proxy --port 3128                   # the port number is arbitrary, but you must remember it!

Next, open a SSH connection to SuperMUC-NG, with a reverse tunnel:

local> ssh -l <your-lrz-account-id> -R 3128:localhost:3128 skx.supermuc.lrz.de     # left port number must agree with that of the HTTP proxy above!

Preparation (SuperMUC-NG)

Next, create the file ~/.julia/config/startup.jl on SuperMUC-NG and insert the two lines

ENV["HTTP_PROXY"] = "localhost:3128"
ENV["HTTPS_PROXY"] = "localhost:3128"

Take care to use the same port number as on the right-hand side of the SSH reverse tunnel above!

Package Installation

Next, simply start julia on a SuperMUC-NG login node and press the ']' key to switch to the package manager. Installing a package should then be as simple as issuing add Plots, for instance.

Unresolved Issue: The initial registry update will fail, and with it the installation of any package! The error looks like:
        ERROR: failed to clone from https://github.com/JuliaRegistries/General.git, error: GitError(Code:ERROR, Class:Net, unrecognized URL prefix)
This is related to the download of the Julia registry (which is quite large, by the way). It must therefore be downloaded manually, too.

> mkdir -p .julia/registries/ && cd .julia/registries/
~/.julia/registries> export HTTPS_PROXY=localhost:3128 && export HTTP_PROXY=localhost:3128     # see the HTTP proxy ports above
~/.julia/registries> git clone https://github.com/JuliaRegistries/General.git
~/.julia/registries> cd General && rm -rf .git .github && cd
> julia
julia>                                                                                         # press ']' to enter the package manager
(@v1.4) pkg> update
(@v1.4) pkg> add Plots

Caution: The installation of a package may still fail at the first attempt. Just retry! Also look at the output of the HTTP proxy for error messages! It might be necessary to set export TMPDIR=$SCRATCH, as the default temp folder might be too small.

Recommendation: The Julia package manager allows you to precompile your packages (also your own). Also think about whether it is a good idea to pin the package versions.
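In code this could look as follows (Plots is just an example and must already be installed):

```julia
using Pkg

Pkg.pin("Plots")       # fix Plots at its currently installed version
Pkg.precompile()       # precompile all packages of the active environment
# Pkg.free("Plots")    # undo the pin later, if desired
```

The same commands are available in the package REPL as pin Plots, precompile, and free Plots.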

Recommendations for Plotting

If interactive plotting on the login nodes is really necessary, we recommend the use of VNC (and not X-forwarding via SSH). An overview of how to plot in Julia can be found here, for instance.

Julia comes with several plotting backends. Not all of them may work out of the box. GR, for instance, resulted in errors (see here if you meet this problem). The solution, in short:

     ENV["GRDIR"] = ""
     using Pkg
     Pkg.build("GR")

Switching to another backend might help, too.

Parallel / Slurm

Julia offers several parallel paradigms, both for shared memory (single node, many/multi-core) and for distributed memory (MPI). A non-comprehensive overview can be found here. The official documentation also has a chapter on parallel computing, which, however, does not say much about the integration into the LRZ cluster systems. This is attempted here.

There are several parallelization paradigms available in Julia, some more explicit, others more implicit (similar to the difference between explicit thread or MPI programming on the one hand and implicit parallel library usage, as in co-array Fortran, on the other). Which one meets your needs is for you to decide.

Shared Memory - OpenMP

Shared-memory parallelism should work out of the box. It corresponds to working on a single node.
Some packages also react to OMP_NUM_THREADS, as is the case for e.g. the LinearAlgebra package (via the underlying BLAS library).

The script linalg_test.jl:

using LinearAlgebra
n = 5000
A = randn(n,n)
B = randn(n,n)
C = zeros(n,n)
for i = 1:4
   @time C = A*B
end

Running it with a given number of threads:

> export OMP_NUM_THREADS=10
> julia linalg_test.jl
  1.554187 seconds (2.88 M allocations: 327.748 MiB, 2.48% gc time)
  0.805609 seconds (2 allocations: 190.735 MiB, 7.28% gc time)
  0.813139 seconds (2 allocations: 190.735 MiB, 8.40% gc time)
  0.746598 seconds (2 allocations: 190.735 MiB)
It is advisable to check the scaling behavior as a function of the number of OpenMP threads! The LinearAlgebra package also contains solvers for matrix equations (please also check SparseArrays).
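As a small illustration of such solvers (a sketch with arbitrary sizes), the backslash operator handles dense and sparse systems alike:

```julia
using LinearAlgebra, SparseArrays

n = 1000
A = randn(n, n) + n*I        # dense, well-conditioned matrix
b = randn(n)
x = A \ b                    # LU-based solve, multithreaded via BLAS
println(norm(A*x - b))       # residual, should be tiny

S = sprandn(n, n, 0.01) + n*I   # sparse matrix, ~1% nonzeros
y = S \ b                       # sparse direct solve
println(norm(S*y - b))
```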

Distributed Example - the Worker Concept with SSH Manager

julia can be started with the option -p <# of workers>, which spawns a number of worker processes. This works only on a single node, where the name of the host does not matter. For more than one node, you must use the option --machine-file and specify a host-name list (one name per line).
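On a single node, the worker concept can be tried right away; a minimal sketch (start with e.g. julia -p 4):

```julia
using Distributed

# with "julia -p 4" four workers are already available; otherwise add some
nworkers() > 1 || addprocs(4)

# a function must be defined on all workers before remote use
@everywhere f(x) = x^2

# distribute the computation over the workers and collect the results
results = pmap(f, 1:8)
println(results)        # [1, 4, 9, 16, 25, 36, 49, 64]
```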

#SBATCH --nodes=2
#SBATCH --tasks-per-node=14
#SBATCH --export=NONE
module load slurm_setup
mpiexec hostname -s | sort | awk '{printf("%sopa\n",$1)}' > mpi_hostfile
julia --machine-file mpi_hostfile Distributed.jl

using Distributed
@everywhere function showid()
    println("My ID ",myid()," from ",gethostname())
end
@everywhere showid()
println("------- more or less the same as -------------------------")
println(" # workers: ", workers())
println(" # nprocs: ", nprocs())
fa = Array{Future}(undef, nprocs())
for i in 1:nprocs()
    fa[i] = @spawnat i showid()
end
for i in 1:nprocs()
    wait(fa[i])         # wait for the remote calls to finish
end

This spawns 14 workers per node according to the machine file, plus one master process on the local machine!
SSH is used for this spawning. So, you have to create a passphrase-less SSH key in your ~/.ssh, and put the public key into ~/.ssh/authorized_keys.

The purpose of appending opa to each host name above is to use the OmniPath network (on CoolMUC-2 it would be ib). Without it, you get the Ethernet maintenance network on SuperMUC-NG. On the Linux Cluster (also on the housed systems), a similar situation exists. Please consult the respective documentation for these systems!

Distributed Example - the Worker Concept with Slurm Manager

This example was taken from the ClusterManagers documentation page. Please have a look for more details!

#SBATCH --nodes=2
#SBATCH --tasks-per-node=1
module load slurm_setup

julia Distributed.jl
using Distributed, ClusterManagers

# Arguments to the Slurm srun(1) command can be given as keyword
# arguments to addprocs.  The argument name and value is translated to
# a srun(1) command line argument as follows:
# 1) If the length of the argument is 1 => "-arg value",
#    e.g. t="0:1:0" => "-t 0:1:0"
# 2) If the length of the argument is > 1 => "--arg=value"
#    e.g. time="0:1:0" => "--time=0:1:0"
# 3) If the value is the empty string, it becomes a flag value,
#    e.g. exclusive="" => "--exclusive"
# 4) If the argument contains "_", they are replaced with "-",
#    e.g. mem_per_cpu=100 => "--mem-per-cpu=100"

addprocs(SlurmManager(parse(Int,ENV["SLURM_NTASKS"])), exclusive="")

hosts = []
pids = []
for i in workers()
  host, pid = fetch(@spawnat i (gethostname(), getpid()))
  fetch(@spawnat i (println("Hello from $host and PID $pid")))
  push!(hosts, host)
  push!(pids, pid)
end

# The Slurm resource allocation is released when all the workers have
# exited
for i in workers()
  rmprocs(i)
end

In this case, no SSH keys are needed. Users are responsible for doing task/thread placement/pinning correctly via the kwargs options of addprocs or addprocs_slurm. Doing this wrongly can impair the efficient use of the resources and, in the worst case, cause the abort of the Slurm job.

MPI Example

This example of explicit MPI communication requires the installation of the MPI package. You can install it with the setting export JULIA_MPI_PATH=$I_MPI_ROOT, where $I_MPI_ROOT is set by the Intel MPI module, which needs to be loaded beforehand.

#SBATCH --nodes=2
#SBATCH --tasks-per-node=28
#SBATCH --export=NONE
module load slurm_setup
mpiexec julia -- MPI-hello.jl
using MPI
MPI.Init()
comm = MPI.COMM_WORLD
println("Hello world, I am $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))")
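Beyond this hello world, a minimal point-to-point exchange could look as follows (a sketch; the file name sendrecv.jl is arbitrary, run it via mpiexec as above):

```julia
using MPI
MPI.Init()

comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# rank 0 sends a small array to rank 1, which prints it
if rank == 0
    MPI.Send(collect(1.0:4.0), 1, 0, comm)   # dest=1, tag=0
elseif rank == 1
    buf = Array{Float64}(undef, 4)
    MPI.Recv!(buf, 0, 0, comm)               # source=0, tag=0
    println("rank 1 received ", buf)
end

MPI.Barrier(comm)
```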

More Examples and Example Use Cases

As of now, there is hardly any experience with Julia at LRZ. We would be happy to include user feedback and examples here, in order to share the experience with other (possibly new) users.

Use Cases and Scripts
Julia Slurm Standalone Script with a Worker Placement Test
#!/usr/bin/env julia
#SBATCH -D .             # execute in current sbatch submit folder
[...]                    # sbatch job settings: cluster, mail, export=NONE, -o log.%N.%x.%j.out, -J ...
#SBATCH --nodes=<???>
#SBATCH --tasks-per-node=<???>

## Debugging - print all environment variables
#for (key,val) in ENV
#  println("$key=$val")
#end

# include LRZ provided Julia modules and depot
# module load slurm_setup ... if needed
try
  delete!(ENV, "SLURM_EXPORT_ENV")
  if parse(Int,ENV["SLURM_NNODES"]) <= 128
    # ... site-specific setup (elided) ...
  end
catch err
  println(" [Error] Could not set load slurm_setup, $err !")
end

# debugging: check load and depot paths
println("DEPOT_PATH  === ",Base.DEPOT_PATH)
println("LOAD_PATH  === ",Base.LOAD_PATH)

# start parallel workers (workers start with myid 2!!)
using Distributed, ClusterManagers
# set additional srun parameters if needed!!
addprocs(SlurmManager(parse(Int,ENV["SLURM_NTASKS"])), exclusive="", mpi="none")

@everywhere syscall(x) = ccall(:syscall, Int64, (Int32,), x)
# placement test
@everywhere function placementtest()
  hostname   = gethostname()
  pid        = getpid()
  nofthreads = Threads.nthreads()
  threadid   = Threads.threadid()
  tid        = syscall(186)    # gettid on x86_64
  cpuid      = -1              # field 39 of /proc/$pid/task/$tid/stat
  open("/proc/$pid/task/$tid/stat") do file
    line  = readline(file)
    cpuid = split(line)[39]
  end
  return hostname, pid, tid, nofthreads, threadid, cpuid, myid()
end

# spawning of tasks
tasks = []
for i in workers()
  push!(tasks,fetch(@spawnat i placementtest()))
end

# collect and print all information
for t in tasks
  hostname, pid, tid, nofthreads, threadid, cpuid, myid = fetch(t)
  println("host:",hostname," PID:",pid," TID:",tid," #Threads:",nofthreads," Julia-Thread-ID:",threadid," MyID:",myid," CPU-ID:",cpuid)
end

# remove all workers
for i in workers()
  rmprocs(i)
end

Check the shebang as well as the LOAD_PATH and DEPOT_PATH settings carefully! You can use your own Julia installation instead.
Also, slurm_setup might change on the system. The module command is not available inside a Julia script, and hence neither is the module system as a whole. You will have to set paths and environment variables manually if you want to use these modules. Otherwise, use a bash script as the Slurm script and start Julia as a process therein, as described above.

Simple Task Farming Scheduler (Job Farm)
using Distributed, ClusterManagers

# setup OpenMP if cpus-per-task set

# create workers specified by Slurm
addprocs(SlurmManager(parse(Int,ENV["SLURM_NTASKS"])), exclusive="")

# debug info
@everywhere println("Worker ",myid()," on host ",gethostname())

@everywhere using Dates
@everywhere function do_work(x)                  # x = task ID (line number in taskdb.txt)
  myID = myid()
  outfile = open("$x","a")
  println(outfile,"Task ID $x on worker $myID")
  starttime = Dates.now()
  println(outfile,"** START = $x = $myID = $starttime")
  try
    line = open(readlines, "taskdb.txt")[x]
    println(outfile,"command: $line")
    cmd = split(line)
    write(outfile,read(`$cmd`))                  ## <- Task executed
    endtime = Dates.now()
    elapsed = (endtime - starttime).value/60000  ## in minutes
    println(outfile,"** STOP SUCCESS = $x = $myID = $endtime = $elapsed")
  catch err
    println(outfile,"** STOP FAILED = $x = $myID = ", Dates.now())
  end
  close(outfile)
end

# check for already executed tasks and remove from TODO list (bookkeeping)
task_ID_list = []
for x in 1:countlines("taskdb.txt")
  if isfile("$x")
    # skip tasks whose log file already contains a STOP line
    if length(filter(line -> occursin(r"\*\* STOP ",line),readlines(open("$x")))) > 0
      continue
    end
  end
  push!(task_ID_list, x)
end

pmap(do_work, task_ID_list)                      ## <- schedule and handle tasks

for i in workers()
  rmprocs(i)
end

As written, one can start this script via julia jobfarmer.jl inside a Slurm job, where the task database taskdb.txt contains one simple bash command per line (no empty lines!).
More complex tasks must be wrapped into bash, Python, R, ... scripts.

Instead of executing external commands, you can also directly execute functions inside the code (including Python or R code, as Julia allows for that, e.g. via PyCall and the like). The bookkeeping then needs to be different, of course!
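A sketch of such an in-code variant (the function simulate is purely illustrative):

```julia
using Distributed
nworkers() > 1 || addprocs(4)   # or SlurmManager as above

# the "task" is now a Julia function instead of a shell command
@everywhere simulate(x) = (x, sum(sin, 1:x))   # illustrative workload

# pmap schedules the task IDs dynamically onto the workers
results = pmap(simulate, 1:20)
println(results)
```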

OpenMP-parallel tasks are also possible. The Julia worker placement is controlled via --nodes, --ntasks-per-node and --cpus-per-task of the surrounding Slurm script.
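Within each worker, Julia's own threads can then use the CPUs granted by --cpus-per-task (start Julia with -t, e.g. julia -t $SLURM_CPUS_PER_TASK); a minimal threading sketch:

```julia
using Base.Threads

println("running with ", nthreads(), " threads")

a = zeros(1000)
@threads for i in eachindex(a)
    a[i] = i^2               # iterations are split among the threads
end
println(sum(a))              # 3.338335e8, independent of the thread count
```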


If something goes wrong – specifically with the package manager – you can always start from scratch by removing ~/.julia.

Documentation and Education

[Youtube] Parallel Computing and Scientific Machine Learning

[Lauwens, Downey] Think Julia: How to Think Like a Computer Scientist

Learn Julia in Y Minutes

A Deep Introduction to Julia for Data Science and Scientific Computing

[julialang] Tutorials

[SciML Tutorials] [DiffEq SciML Tutorials]