ECHO-3DHPC: Advancing the performance of astrophysics simulations
AstroLab contact: Luigi Iapichino
Application partner: Matteo Bugli (CEA Saclay, France)
Project partner: Fabio Baruffa (Intel)
In this project we improved the parallelization scheme of ECHO-3DHPC, an efficient astrophysical code used in the modelling of relativistic plasmas. With the help of Intel Software Development Tools (the Fortran compiler with Profile-Guided Optimization (PGO), the Intel MPI library, VTune Amplifier and Inspector), we investigated the performance issues and improved the application's scalability and time to solution. The node-level performance improved by a factor of 2.3 and, thanks to the improved threading parallelization, the hybrid MPI-OpenMP version of the code outperforms the MPI-only version and lowers the MPI communication overhead.
Parallel speed-up at node level (OpenMP-only) for the baseline and optimized code versions.
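For readers unfamiliar with the metric, parallel speed-up on n threads is conventionally defined as S(n) = T(1) / T(n), and parallel efficiency as S(n) / n. The following minimal sketch computes both from wall-clock timings. The function name and the timing values are purely illustrative assumptions and are not measurements from ECHO-3DHPC.

```python
def speedup_and_efficiency(timings):
    """Compute speed-up and efficiency from wall-clock timings.

    timings: dict mapping thread count -> wall-clock time in seconds.
    The single-thread time (key 1) is used as the reference.
    Returns a dict mapping thread count -> (speed-up, efficiency).
    """
    t1 = timings[1]
    result = {}
    for n in sorted(timings):
        s = t1 / timings[n]      # speed-up S(n) = T(1) / T(n)
        result[n] = (s, s / n)   # efficiency = S(n) / n
    return result


# Hypothetical timings in seconds (NOT taken from the project)
example = {1: 100.0, 2: 52.0, 4: 27.0, 8: 15.0}
for n, (s, e) in speedup_and_efficiency(example).items():
    print(f"{n:2d} threads: speed-up {s:.2f}, efficiency {e:.2f}")
```

An efficiency close to 1 indicates near-ideal scaling; a falling efficiency at higher thread counts points to serial sections, load imbalance or memory-bandwidth limits.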
The Black Hole Accretion Code (BHAC) (2018)
AstroLab contact: Nicolay Hammer
Application partners: Oliver Porth, Yosuke Mizuno, Elias Most, Ludwig Papenfort, Hector Olivares, Lukas Weih (Institute for Theoretical Physics, Univ. Frankfurt)
Project partner: Florian Merz (Lenovo)
The LRZ AstroLab support project for BHAC targeted the modernisation of the code's solver scheme towards a task-based algorithm. A series of profiling (using Intel VTune Amplifier) and analysis steps before and after the implementation of the tasking revealed further bottlenecks in the memory access pattern. These performance issues were tackled by refactoring important loops of the source code during a dedicated hackathon/workshop for the BHAC developers, held on-site at LRZ. These activities were complemented by further efforts which tackled parallel I/O challenges.
More details on BHAC here.
Optimizing the TARDIS parallel performance (2016/17)
AstroLab contacts: Vytautas Jančauskas, Stephan Hachinger
Application Partner: Wolfgang Kerzendorf (ESO, Garching near Munich)
TARDIS is a numerical code for Monte-Carlo simulations of supernova spectrum formation. It serves as a tool to analyse the conditions generating the spectra of observed supernovae, i.e. to trace back the supernova structure from the observations.
With this ADVISOR 2016 project, our aim was to obtain an overview of possible performance bottlenecks (and of strategies for discovering bottlenecks in the future), and to take first steps towards an optimised TARDIS code. The code is largely parallelised with OpenMP, and job-farming techniques (e.g. with MPI) are used to perform ensemble runs on larger machines.
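As a rough illustration of the job-farming idea mentioned above, the sketch below distributes independent ensemble members over a pool of workers. It is not TARDIS code: it uses a thread pool from the Python standard library as a stand-in for an MPI-based farm, and `run_single_model`, `farm_jobs` and the parameter values are hypothetical names and numbers chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_single_model(params):
    """Hypothetical stand-in for one independent simulation run.

    A real farm would launch a full model run here (e.g. as a separate
    process or on its own MPI rank); this placeholder only returns a
    dummy score derived from the input parameters.
    """
    temperature, velocity = params
    return {"params": params, "score": temperature * 1e-4 + velocity * 1e-5}


def farm_jobs(parameter_sets, worker=run_single_model, max_workers=4):
    """Hand out independent jobs to a pool of workers.

    Results are returned in the same order as parameter_sets.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, parameter_sets))


# Hypothetical ensemble: every combination of two temperatures and two velocities
ensemble = [(t, v) for t in (9000.0, 10000.0) for v in (8000.0, 12000.0)]
results = farm_jobs(ensemble)
print(f"completed {len(results)} ensemble members")
```

Because the ensemble members do not communicate with each other, this pattern scales by simply adding workers, which is why it suits ensemble runs on larger machines.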
In the context of ADVISOR 2016 and the LRZ AstroLab, a comprehensive standard profiling/scaling test of the TARDIS code was performed. The profiling revealed no obvious, easy-to-resolve bottlenecks or problems, but was very valuable for planning algorithmic and conceptual improvements (e.g. an improved convergence strategy) for TARDIS v2.0.