Darshan

Darshan is a scalable HPC I/O characterization tool designed to capture an accurate picture of application I/O behavior with minimum overhead. This includes properties such as patterns of access within files, the number of I/O operations, the size of operations, etc.

Darshan on LRZ platforms

Darshan can be used to trace the dynamic executables of MPI applications on SuperMUC-NG and the Linux Clusters. It is set up to trace applications that use Intel MPI.

Enabling Darshan to trace MPI applications

You have to load the appropriate module depending on whether you use the Intel compilers or GCC.

The available modules can be listed via

module av darshan-runtime

Here we will use GCC as an example:

module load darshan-runtime/3.3.1-gcc8-impi

The tracing is enabled by using the environment variable LD_PRELOAD:

LD_PRELOAD=$DARSHAN_LIBDIR/libdarshan.so

Darshan uses the environment variable $DARSHAN_LOG_DIR_PATH to determine where its log files are written. By default this variable is set to $SCRATCH/.darshan-logs. It is not recommended to change this variable; in particular, do not let it point to $HOME.
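As a quick check before submitting a job, you can inspect where the logs will go once the darshan-runtime module is loaded (a minimal sketch; the mkdir is simply a precaution and is harmless if the directory already exists):

```shell
# Show the directory Darshan will write its log files to
# (set by the darshan-runtime module).
echo "$DARSHAN_LOG_DIR_PATH"

# Make sure the log directory exists (no effect if it already does).
mkdir -p "$DARSHAN_LOG_DIR_PATH"
```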

An example SLURM script showing how to use Darshan on SuperMUC-NG is given below (the way the modules are loaded and the use of LD_PRELOAD are identical on CoolMUC-2).

darshan_example_sng.slurm
#!/bin/bash
#SBATCH -J io_test
#SBATCH -A YOUR_PROJECT
#SBATCH -D ./
#SBATCH -o ./%x-%j-%N.out
#SBATCH -e ./%x-%j-%N.err
#SBATCH --export=NONE
#SBATCH --mail-user=YOUR_EMAIL@SOME.DOMAIN
#SBATCH --mail-type=NONE
#SBATCH --partition=test
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=48
#SBATCH --time=0:05:00


module load slurm_setup

### switch the compiler environment from Intel to GCC
module unload intel
module load gcc
###

### switch to the GCC variant of Intel MPI
module unload intel-mpi
module load intel-mpi/2019-gcc
###

### load the matching Darshan runtime
module load darshan-runtime/3.3.1-gcc8-impi
###

### preload the Darshan library and clear DARSHAN_LOGHINTS (see below)
mpiexec -env LD_PRELOAD=$DARSHAN_LIBDIR/libdarshan.so -env DARSHAN_LOGHINTS="" NAME_OF_YOUR_BINARY_WITH_IO
###

Unsetting the variable "DARSHAN_LOGHINTS" is necessary because of an incompatibility between the MPI-IO settings of Intel MPI and Darshan. Otherwise your program will likely end successfully, but the job will hang and the Darshan log file will not be created properly.

Extracting the I/O characterization

If the program finishes correctly, a log file is written to:

Darshan logfiles
$SCRATCH/.darshan-logs/<USERNAME>_<BINARY_NAME>_<SLURM_JOBID>_<DATE>_<UNIQUE_ID>_<TIMING>.darshan
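Since the file name contains a unique ID and a timestamp, a convenient way to pick up the log of the most recent run is, for example (assuming the default log directory):

```shell
# List the Darshan logs newest first and keep only the most recent one.
ls -t "$SCRATCH"/.darshan-logs/*.darshan | head -n 1
```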

Generation of a PDF summary

You can generate a PDF summary with graphs.

module load darshan-util
darshan-job-summary.pl <YOUR_DARSHAN_FILE>.darshan

Analysis on the command line (works on SNG and CM2)

On the command line, you can analyse the log file using the darshan-parser utility, which prints the full set of recorded I/O counters and derived performance information. It has several command line options:

:~> darshan-parser --help
Usage: darshan-parser [options] <filename>
    --all   : all sub-options are enabled
    --base  : darshan log field data [default]
    --file  : total file counts
    --file-list  : per-file summaries
    --file-list-detailed  : per-file summaries with additional detail
    --perf  : derived perf data
    --total : aggregated darshan field data

If you want a detailed analysis of the I/O counters and the I/O performance in a text file, you can use the following command:

darshan-parser <YOUR_DARSHAN_FILE>.darshan > <YOUR_DARSHAN_FILE>.txt
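For a quick look at the derived performance numbers only, you can combine the --perf option with grep. The field names below (e.g. agg_perf_by_slowest, total_bytes) follow the usual darshan-parser --perf output; check your own output if they differ:

```shell
# Extract the aggregate performance estimate and the total bytes moved
# from the derived performance section of the report.
darshan-parser --perf <YOUR_DARSHAN_FILE>.darshan | grep -E 'agg_perf_by_slowest|total_bytes'
```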

Documentation

Please refer to the Darshan website for more information about the meaning of the I/O counters, other Darshan utilities, and static tracing.