High-Level Support Projects

The Astro-Lab engages in code modernization projects fulfilling the specifications for high-level support in the Gauss Centre for Supercomputing.


Recent Highlights

DPEcho: General Relativity with SYCL for the 2020s and beyond

AstroLab contact: Salvatore Cielo
Application partners:  
Alexander Pöppl (Intel Corporation, Munich), Luca Del Zanna (Università degli Studi di Firenze), Matteo Bugli (Università degli Studi di Torino)

Numerical sciences are experiencing a renaissance thanks to GPUs and heterogeneous computing, which open up a quantitatively and qualitatively larger class of problems for simulation, albeit at the cost of increased code complexity. The SYCL programming language offers a standard approach to heterogeneity that is scalable, portable, and open.

Following ECHO-3DHPC, a code for General-Relativistic Magnetohydrodynamics (GR-MHD) written in Fortran with hybrid MPI+OpenMP parallelism, we here introduce DPEcho, the MPI+SYCL port of ECHO, used to model instabilities, turbulence, propagation of waves, stellar winds and magnetospheres, and astrophysical processes around black holes. It supports classic and relativistic MHD, in either Minkowski space or any coded GR metric. The public version of DPEcho is available on GitHub under an Apache 2.0 license.
DPEcho revolves around SYCL device-centric constructs (parallel_for, Unified Shared Memory, ...) in order to minimize data transfers and make the best use of the devices.

Usage of USM in DPEcho: dynamic allocation on device memory for direct access
// Initialize the SYCL device and queue
sycl::gpu_selector sDev;  sycl::queue qDev(sDev);

// Allocate main variables with USM
double *v[VAR_NUM], *f[VAR_NUM], [...];
for (int i=0; i < VAR_NUM; ++i) {
  // Primitives and fluxes for each variable
  v[i] = sycl::malloc_device<double>(grid.numCells, qDev);
  f[i] = sycl::malloc_device<double>(grid.numCells, qDev);
}

We present a scaling comparison of DPEcho versus the baseline ECHO, running the same problem setup. We first conduct a weak scaling test on the Intel Xeon 8174 nodes of SuperMUC-NG at the Leibniz-Rechenzentrum (left). We observe essentially flawless scaling up to 16 nodes, and performance up to 4x higher than the baseline version. The reduced memory footprint of DPEcho allowed the placement of 384³ cells per MPI task, instead of 192³. The inclusion of the Intel® Iris® Xe MAX Graphics (right) increases performance up to 7x, surpassing even non-accelerated HPC hardware. This is a testament to the superior portability and efficiency of SYCL code, making the most efficient use of all classes of hardware.


Modernization of the Gasoline code (2021-2022)

AstroLab contact: Jonathan Coles, Salvatore Cielo
Application partners: Aura Obreja (Universitäts-Sternwarte München, LMU), Tobias Buck (Leibniz-Institut für Astrophysik Potsdam)
Project partner: Christoph Pospiech (Lenovo)

Gasoline2 is a smoothed-particle-hydrodynamics code, built on top of the N-body code PKDGRAV. It is used for cosmological galaxy-formation simulations, such as the NIHAO project (Wang et al. 2015). The follow-up project, NIHAO2 (which was granted a total of 20 million CPU-hours on SuperMUC-NG at LRZ), aims to construct a large library of high-resolution galaxies at redshift z > 3, covering a mass range from massive dwarfs to high-z quasar host galaxies, while also improving on the thermal-balance, radiation, and chemical-enrichment physics. These simulations will help answer some of the open questions in galaxy formation and interpret new observational measurements from the James Webb Space Telescope and the future Athena X-ray Observatory.

The modernization project of Gasoline2_LAC (Gasoline2 with Local Photoionization Feedback (Obreja et al. 2019), the Amiga halo finder for black-hole seeding and accretion feedback (Blank et al. 2019), and chemical enrichment (Buck et al. 2021)) had three initial goals: 1) ensure that the code compiles and runs without errors, 2) lower the memory footprint, and 3) improve the code's scaling. After starting the project, we also decided to work on improving the accuracy of the radiation fields needed by the Local Photoionization Feedback. As the radiation fluxes are computed on the tree at the same time as the accelerations, the most natural step forward was to use a higher order for the radiation fields' multipoles (the initial code was only zeroth-order).

The speedups of all kernels involved in the optimization are shown in the left figure. The Gravity and GravTree components show the most significant speedups (20% and >80%, respectively), leading to an overall ~25% improvement. The right figure shows a single-node scaling test with the g2.19e11 test case (halo mass ∼2 × 10¹¹ M⊙ at redshift 0), featuring the time to solution of individual kernels. While the scaling on this test case still deviates early from the ideal (likely due to load imbalance), the optimized performance is still >25% better than the initial one. The figure also serves as a guide for finding bottlenecks in view of future optimization projects. Due to its improved scaling, GravTree is no longer the dominant bottleneck.

We are now able to run simulations more reliably, use more MPI ranks per node with less performance loss, and reach higher accuracy in the radiation fields thanks to the new physics modules. This opens the possibility of running higher-resolution galaxies with the same resources, or of increasing the number of galaxies we can simulate within a given computational budget, both for the LRZ allocation and in the framework of NIHAO2.

This is measured on the g2.19e11 test, for a single node using the full 48 MPI tasks.

The full report of the optimization project is available here: Gasoline_optimization_report.pdf

Other projects