...

AstroLab contact: Luigi Iapichino, Salvatore Cielo
Application partners: Oliver Porth (Uni Amsterdam), Hector Olivares (Radboud Univ.)
Project partners: Fabio Baruffa (Intel), Anupam Karmakar (Lenovo)

BHAC (the Black Hole Accretion Code) is a multidimensional general relativistic magnetohydrodynamics code based on the MPI-AMRVAC framework. BHAC solves the equations of ideal general relativistic magnetohydrodynamics in one, two, or three dimensions on arbitrary stationary space-times, using an efficient block-based approach.

BHAC has demonstrated efficient scaling up to ~200 compute nodes in pure MPI. The hybrid MPI+OpenMP parallelization scheme introduced by the developers extended scaling up to ~1600 nodes, at the cost of a drop in efficiency to 40% or 65% (with 4 or 8 threads per node, respectively). The main goal of the code modernization is thus to check, profile, and optimize the OpenMP implementation and node-level performance, including vectorization and memory access.
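As an illustration of such a hybrid scheme (a generic sketch in C, not BHAC source code, which is Fortran-based), a minimal MPI+OpenMP setup could look as follows; the rank and thread counts are then chosen at run time via the MPI launcher and OMP_NUM_THREADS.

    /* Minimal hybrid MPI+OpenMP skeleton (illustrative only). */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nranks, nthreads = 1;

        /* Request FUNNELED support: only the master thread issues MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        #pragma omp parallel
        {
            #pragma omp master
            nthreads = omp_get_num_threads();
        }

        if (rank == 0)
            printf("Running with %d MPI ranks x %d OpenMP threads\n", nranks, nthreads);

        MPI_Finalize();
        return 0;
    }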

Early tests revealed a rather high degree of vectorization, though with some room for improvement, as the code contains a mix of compute-bound and memory-bound kernels (the latter in a slight majority). The node performance versus the MPI/OpenMP ratio confirms a degradation with increasing OpenMP thread counts, which needs further investigation.

VTune’s threading analysis exposed an OpenMP imbalance, which was addressed by adding dynamic scheduling to the loops with significant workload. This yielded an average performance improvement of ∼5%. The same analysis identified the main bottleneck as large serial code blocks in the ghost-cell exchange. Restructuring the code made it possible to fully OpenMP-parallelize this section, which led to an average performance increase of 27% compared to the initial code.
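As a hedged sketch of the kind of change described above (written in C rather than the code's Fortran, with hypothetical names), adding dynamic scheduling amounts to letting idle threads pick up remaining AMR blocks when the per-block workload is uneven:

    /* Illustrative loop over AMR blocks with uneven per-block workload.
       schedule(dynamic) hands out blocks to threads as they become free. */
    void update_blocks(double **block_data, const int *block_ncells, int nblocks)
    {
        #pragma omp parallel for schedule(dynamic)
        for (int ib = 0; ib < nblocks; ++ib) {
            for (int ic = 0; ic < block_ncells[ib]; ++ic)
                block_data[ib][ic] *= 1.01;   /* stand-in for the real per-cell update */
        }
    }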

At the end of the project, the hybrid implementation is capable of efficiently utilizing over 30 000 cores, allowing the study of large-scale problems. The improvements made in the AstroLab project have already been merged into the staging branch of BHAC and will become part of the next public release.


The full report of the optimization project is available here: AstroLab_BHAC.pdf


Optimization of the GReX code (2019-2020)

AstroLab contact: Luigi Iapichino, Salvatore Cielo
Application partners: Elias Most, Jens Papenfort (Institute for Theoretical Physics, Univ. Frankfurt)
Project partner: Fabio Baruffa (Intel)

...

The GReX code has successfully run simulations of neutron star mergers on up to 32k cores on SuperMUC-NG, while taking full advantage of the SIMD capabilities of modern CPUs. Scaling to extreme core counts (>100k), however, is required to leverage the power of the upcoming exascale supercomputers. During the optimization project with the AstroLab, we identified the main bottlenecks in load balancing and communication when using a large number of AMR grids with many cells. After a detailed characterization with Application Performance Snapshot and the Intel Trace Analyzer and Collector, we performed several node-level scaling tests and investigated the effect of local grid tiling on the performance up to 256 nodes. However, the best results were obtained by rewriting the way the code exchanges ghost cells around grids via MPI (see the full report below).
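For illustration, a minimal non-blocking ghost-cell exchange between the two neighbours of a 1D domain decomposition could be written as below. This is a generic sketch of the communication pattern only, not the actual GReX implementation (which is described in the attached report); left and right are the neighbour ranks and may be MPI_PROC_NULL at domain boundaries.

    #include <mpi.h>

    /* field[0] and field[n+1] are ghost cells; field[1..n] is owned data. */
    void exchange_ghosts(double *field, int n, int left, int right, MPI_Comm comm)
    {
        MPI_Request req[4];

        MPI_Irecv(&field[0],     1, MPI_DOUBLE, left,  0, comm, &req[0]);
        MPI_Irecv(&field[n + 1], 1, MPI_DOUBLE, right, 1, comm, &req[1]);
        MPI_Isend(&field[1],     1, MPI_DOUBLE, left,  1, comm, &req[2]);
        MPI_Isend(&field[n],     1, MPI_DOUBLE, right, 0, comm, &req[3]);

        /* Interior work could overlap with the communication here. */

        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    }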

We were thus able to achieve close-to-ideal scaling up to about 50k cores during the LRZ Extreme Scaling Workshop. The performance degrades significantly around 70k cores; however, successful runs with still satisfactory performance were now possible up to about 150k cores. The reason for this loss is not yet completely understood, but it is likely associated with different load balancing at intermediate core counts for the specific problem setup, and will be investigated in the future. Both parties were very satisfied with the achieved optimization and look forward to further collaborations.


Figures: GReX magnetic amplification in an AMR neutron star merger simulation; GReX single-node scaling and balance.

Attachment: Scaling_grex.pdf

The full report of the optimization project is available here: AstroLab_GReX_report.pdf


Finalized projects

ECHO-3DHPC: Advancing the performance of astrophysics simulations (2018)

AstroLab contact: Luigi Iapichino
Application partner: Matteo Bugli (CEA Saclay, France)
Project partner: Fabio Baruffa (Intel)

In this project we improved the parallelization scheme of ECHO-3DHPC, an efficient astrophysical code used in the modelling of relativistic plasmas. With the help of the Intel Software Development Tools, such as the Fortran compiler with Profile-Guided Optimization (PGO), the Intel MPI Library, VTune Amplifier, and Inspector, we investigated the performance issues and improved the application's scalability and time to solution. The node-level performance was improved by 2.3x and, thanks to the improved threading parallelization, the hybrid MPI-OpenMP version of the code outperforms the MPI-only version, thus lowering the MPI communication overhead.

Parallel speed-up at node level (OpenMP-only) for the baseline and optimized code versions.

More details: see the article in Intel Parallel Universe Magazine 34, p. 49.
ArXiv version here.

...