Programming Models - Examples

Introduction

This page illustrates some of the programming models available on Intel PVC (Ponte Vecchio) GPUs. All examples were created and tested with the Intel HPC Toolkit 2025.2.0 (module intel-toolkit/2025.2.0 from stack/24.5.0).

The currently recommended way to set up the environment for them is therefore:

> module sw stack/24.5.0
> module load intel-toolkit/2025.2.0

The examples are small numerical or computational physics applications and are most likely not implemented very efficiently. Their only goal is to illustrate the technical approach to GPU, multi-GPU, and multi-node multi-GPU programming and deployment.

OpenMP only - single-node multi-GPU

MPI Offload + (OpenMP/SYCL/DPC++/Kokkos) - multi-node multi-GPU
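
To give a flavour of the single-node multi-GPU OpenMP approach, here is a minimal sketch that is not taken from the linked examples. The helper name daxpy_multi_gpu and the chunking scheme are assumptions made for illustration only. It splits a vector update y = a*x + y into one contiguous chunk per visible device and offloads each chunk with its own target region.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not part of the linked examples):
// computes y = a*x + y, one contiguous chunk per visible GPU.
void daxpy_multi_gpu(double a, const std::vector<double>& x,
                     std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();

    int ndev = 1;
#ifdef _OPENMP
    ndev = omp_get_num_devices();
    if (ndev < 1) ndev = 1;  // no GPU visible: device 0 is then the host
#endif

    const double* xp = x.data();
    double* yp = y.data();
    const std::size_t chunk = (n + ndev - 1) / ndev;  // ceiling division

    for (int d = 0; d < ndev; ++d) {
        const std::size_t lo = std::min(n, d * chunk);
        const std::size_t hi = std::min(n, lo + chunk);
        const std::size_t len = hi - lo;
        if (len == 0) continue;

        // Map only this device's slice; nowait lets the host move on
        // and launch the next device's chunk without blocking.
        #pragma omp target teams distribute parallel for device(d) nowait \
            map(to: xp[lo:len]) map(tofrom: yp[lo:len])
        for (std::size_t i = lo; i < hi; ++i) {
            yp[i] = a * xp[i] + yp[i];
        }
    }
    // Wait for all outstanding target tasks before y is used on the host.
    #pragma omp taskwait
}
```

With the Intel compiler, offload builds typically use something like icpx -fiopenmp -fopenmp-targets=spir64. Without OpenMP enabled, the pragmas are ignored and the same loop runs serially on the host, which makes it easy to check results before moving to the GPUs.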


Learning Material

OpenMP

  • Timothy G. Mattson, Beverly A. Sanders, Berna Massingill, "Patterns for Parallel Programming", 2004
  • Ruud van der Pas, Eric Stotzer, Christian Terboven, "Using OpenMP - The Next Step: Affinity, Accelerators, Tasking, and SIMD", 2017
  • Timothy G. Mattson, Yun (Helen) He, Alice E. Koniges, "The OpenMP Common Core: Making OpenMP Simple Again", 2019
  • Tom Deakin, Timothy G. Mattson, "Programming Your GPU with OpenMP: Performance Portability for GPUs", 2023

SYCL/DPC++

  • "Data Parallel C++ - Programming Accelerated Systems Using C++ and SYCL" (book; open access)

Kokkos

  • Kokkos Lecture Series (videos, slides)