Intel VTune profiler and Application Performance Snapshots
Purpose
Intel VTune Profiler supports profiling and analyzing the performance characteristics of single- and multi-threaded programs on all Intel-based hardware platforms. It is available free of charge as part of Intel oneAPI.
Availability on LRZ's HPC platforms
VTune is provided on LRZ HPC systems that are based on Intel processors. On non-Intel processors, or on systems where no VTune kernel driver is installed, only part of its functionality may be usable. If you encounter any difficulties with the LRZ-specific installations, please contact the LRZ Service Desk for help.
How to use
First load the relevant modules:
module load oneapi
module load intel-oneapi-vtune
You can then invoke the tool either via the command line interface (command vtune) or the GUI (command vtune-gui).
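As a minimal sketch of the command-line route, the shell function below wraps a Hotspots collection (vtune -collect hotspots) and then prints the summary report (vtune -report summary). The function name vtune_hotspots and the result directory argument are hypothetical and not part of any LRZ or Intel tooling; the sketch assumes the two modules above are loaded and that it runs on a compute node.

```shell
#!/bin/bash
# Hypothetical helper, not provided by LRZ or Intel: collect a VTune Hotspots
# profile of a program and print the text summary of the result.
# Assumes "module load oneapi" and "module load intel-oneapi-vtune" were run
# and that this executes on a compute node (not on a login node).
vtune_hotspots() {
    if [ "$#" -lt 2 ]; then
        echo "usage: vtune_hotspots <result-dir> <executable> [args...]" >&2
        return 2
    fi
    local resdir=$1
    shift
    # Fail early with a hint if the VTune module has not been loaded.
    if ! command -v vtune >/dev/null 2>&1; then
        echo "vtune not found: run 'module load intel-oneapi-vtune' first" >&2
        return 1
    fi
    # Everything after "--" is the program to profile, with its arguments.
    vtune -collect hotspots -result-dir "$resdir" -- "$@" || return 1
    # Print the summary report of the collected result to stdout.
    vtune -report summary -result-dir "$resdir"
}
```

Inside an interactive or batch job you would then call, for example, vtune_hotspots hotspots_result ./myapp.exe. Other analysis types can be selected by changing the argument of -collect.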
The GUI lets you create analysis projects and specify the executable together with various parameters for its execution and for the analysis mode. In particular, it supports profiling of threaded programs, including scalability analysis and identification of performance problems caused by parallelization. Please consult the documentation referenced below for a description of the many options this tool offers.
Because the kernel modules needed for performance-counter based runs cannot be provided, VTune falls back to the Linux perf infrastructure, through which only a subset of its functionality is available.
However, since 2019 this perf-based collection has become much more accurate and comprehensive than it was before. See this (off-site) article for further details.
For data protection reasons, access to profiling counters on the login nodes is very limited. Full profiling is allowed on compute nodes, which you can reach via interactive or batch jobs (see Job Processing with SLURM on SuperMUC-NG and Job Processing on the Linux-Cluster).
APS
Recent releases of Intel VTune (formerly VTune Amplifier XE) include Application Performance Snapshot (APS), which provides a quick overview of:
- MPI parallelism (Linux only)
- OpenMP parallelism
- Memory access
- FPU utilization
- I/O efficiency
- ...
APS is included in the intel-oneapi-vtune module and can be used wherever VTune can, but it is a much lighter-weight tool, often used as the first profiling step or for large-scale runs. LRZ users are encouraged to use APS for their own profiling, especially at the beginning of a new project or after porting to a new machine. Occasionally, LRZ may ask users to provide APS reports of their production runs.
Running APS
To initialize APS on LRZ machines, load the oneapi and intel-oneapi-vtune modules as above and set the APS statistics level:
module load oneapi
module load intel-oneapi-vtune
export MPS_STAT_LEVEL=4
The environment variable MPS_STAT_LEVEL controls the amount of data collected.
Within a job on a compute node (interactive or batch, e.g. inside a SLURM job script), run the analysis for an application and store the results in <dir> as follows:
# Collection
aps [--result-dir=<dir>] ./myserial.exe
# Report
aps-report <dir>      # Creates a useful .html report that can be viewed with any browser
aps-report -a <dir>   # Prints all available stats to stdout
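The serial steps above can be bundled into one call, for instance in a job script. The function aps_serial below is a hypothetical sketch, not an LRZ or Intel tool: it assumes the modules are loaded and the job runs on a compute node, keeps any MPS_STAT_LEVEL you already exported (falling back to 4 as above), and writes the output of aps-report -a to a text file named after the result directory, which is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper, not provided by LRZ or Intel: run APS on a serial
# program, then produce the HTML report and a text dump of all statistics.
# Assumes the oneapi and intel-oneapi-vtune modules are loaded and that this
# runs inside a job on a compute node.
aps_serial() {
    if [ "$#" -lt 2 ]; then
        echo "usage: aps_serial <result-dir> <executable> [args...]" >&2
        return 2
    fi
    local resdir=$1 tool
    shift
    # Fail early with a hint if the APS commands are not on the PATH.
    for tool in aps aps-report; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool not found: load the oneapi and intel-oneapi-vtune modules" >&2
            return 1
        fi
    done
    # Keep a value the user already exported; otherwise use level 4 as above.
    export MPS_STAT_LEVEL="${MPS_STAT_LEVEL:-4}"
    aps --result-dir="$resdir" "$@" || return 1   # collection
    aps-report "$resdir" || return 1              # HTML report
    aps-report -a "$resdir" > "$resdir.txt"       # all statistics as plain text
}
```

For example, aps_serial aps_result ./myserial.exe leaves the collected data in aps_result, the HTML report produced by aps-report, and the full statistics in aps_result.txt.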
The syntax is slightly different when profiling MPI-parallel programs:
# Collection
mpiexec <mpiexec_options> aps --result-dir=<dir> ./myparallel.exe   # Output dir is mandatory here
# Report
# The same options as for non-MPI code are available. In addition:
aps-report -x --format=html <dir>   # Creates a communication matrix for all MPI tasks
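For MPI runs the same idea can be sketched as a hypothetical function aps_mpi (not an LRZ or Intel tool). It is a simplification: it assumes mpiexec from the loaded MPI environment is available and passes only the rank count (-n) instead of arbitrary <mpiexec_options>. It validates the rank count, runs the collection with the mandatory result directory, and then creates both the HTML report and the MPI communication matrix.

```shell
#!/bin/bash
# Hypothetical helper, not provided by LRZ or Intel: run APS on an MPI program
# with a given number of ranks, then produce the HTML report and the MPI
# communication matrix. Assumes the oneapi and intel-oneapi-vtune modules are
# loaded and that this runs inside a SLURM job (e.g. a batch script).
aps_mpi() {
    if [ "$#" -lt 3 ]; then
        echo "usage: aps_mpi <ntasks> <result-dir> <executable> [args...]" >&2
        return 2
    fi
    local ntasks=$1 resdir=$2 tool
    shift 2
    # The rank count must be a positive integer.
    case $ntasks in
        ''|*[!0-9]*) echo "ntasks must be a positive integer, got '$ntasks'" >&2; return 2 ;;
    esac
    if [ "$ntasks" -lt 1 ]; then
        echo "ntasks must be a positive integer, got '$ntasks'" >&2
        return 2
    fi
    # Fail early with a hint if any required command is missing.
    for tool in mpiexec aps aps-report; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool not found: load the required modules first" >&2
            return 1
        fi
    done
    # Keep a value the user already exported; otherwise use level 4 as above.
    export MPS_STAT_LEVEL="${MPS_STAT_LEVEL:-4}"
    # The result directory is mandatory for MPI runs.
    mpiexec -n "$ntasks" aps --result-dir="$resdir" "$@" || return 1
    aps-report "$resdir" || return 1          # HTML summary report
    aps-report -x --format=html "$resdir"     # MPI communication matrix
}
```

In a SLURM batch script that requests a number of tasks with --ntasks, you could call it as aps_mpi "$SLURM_NTASKS" aps_result ./myparallel.exe.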
For more advanced APS capabilities, please refer to the aps and aps-report manual entries, or consult this (off-site) article.
Documentation
- Manual pages for the commands can be consulted when running with the GUI.
- Intel VTune for Linux