
Darshan is a scalable HPC I/O characterization tool that is designed to capture an accurate picture of application I/O behavior with minimum overhead. This includes properties such as patterns of access within files, the number of I/O operations, and the size of operations.

Darshan on LRZ platforms

On SuperMUC Phase 1-2 and on the Linux Clusters, Darshan can trace dynamically linked MPI executables. It supports applications that use IBM MPI and Intel MPI.

Enabling Darshan to trace MPI applications

To make use of Darshan, please load the appropriate module. Currently the default version is 3.1.4, but the latest version, 3.1.6, is available as a test module.

module load darshan
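
If you want to try the test version instead of the default, you can load it explicitly. The module name darshan/3.1.6 below is an assumption; check the exact name with module avail first.

module avail darshan              # list the Darshan modules installed on the system
module load darshan/3.1.6         # assumed module name; use the exact name shown above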

Set the variable FORTRAN_PROG to true if your program is a Fortran program and to false otherwise.

FORTRAN_PROG=true

Preload the appropriate Darshan library.

export LD_PRELOAD=`darshan-user.sh $FORTRAN_PROG`

On SuperMUC Phase 1-2, set the Darshan job identifier to the LoadLeveler job identifier and point the environment variable DARSHAN_JOBID to the name of the environment variable that contains it.

export JOBID_LL=`darshan-JOBID.sh $LOADL_STEP_ID`
export DARSHAN_JOBID=JOBID_LL

The last two steps are not needed on the Linux Clusters, because Darshan recognizes the SLURM job identifier automatically.

Darshan is configured so that the log path can be selected through the environment variable LOGPATH_DARSHAN_LRZ. We recommend using $SCRATCH for the Darshan logs. With the script "darshan-logpath.sh", the logs are written to the $SCRATCH/.darshan-logs folder.

export LOGPATH_DARSHAN_LRZ=`darshan-logpath.sh`

Example job command file for SuperMUC

This example is for a Fortran program compiled with IBM MPI.

#!/bin/bash
#
#@ job_type = parallel
#@ class = general
#@ node = 256 
#@ total_tasks = 4096 
#@ island_count = 1
#@ wall_clock_limit = 01:00:00
#@ job_name = btio4KpC
#@ network.MPI = sn_all,not_shared,us
#@ initialdir = $(home)/
#@ output = btio4KpC$(jobid).out
#@ error = btio4KpC$(jobid).err
#@ notification = always
#@ notify_user = user@yyy.zz
#@ hpm = yes
#@ queue
. /etc/profile
. /etc/profile.d/modules.sh
module load mpi.ibm
module load darshan
######### Darshan Variable ########
FORTRAN_PROG=true
export LD_PRELOAD=`darshan-user.sh $FORTRAN_PROG`
export JOBID_LL=`darshan-JOBID.sh $LOADL_STEP_ID`
export DARSHAN_JOBID=JOBID_LL
export LOGPATH_DARSHAN_LRZ=`darshan-logpath.sh`
######### End Darshan Variable ########

cd $SCRATCH/btio-Darshan
poe $HOME/NPB3.3/NPB3.3-MPI/bin/bt.E.4096.mpi_io_full
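
On the Linux Clusters the same Darshan setup works in a SLURM batch script; as noted above, the job-identifier steps are not needed there. The following is only a minimal sketch: the partition, node and task counts, MPI module name, working directory, and program path are placeholders, not a tested LRZ configuration.

#!/bin/bash
#SBATCH -J darshan-test
#SBATCH -o darshan-test.%j.out
#SBATCH -e darshan-test.%j.err
#SBATCH --partition=mpp2_batch       # placeholder: use the appropriate cluster partition
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=28
#SBATCH --time=01:00:00
. /etc/profile
. /etc/profile.d/modules.sh
module load mpi.intel                # assumed Intel MPI module name
module load darshan
######### Darshan Variable ########
FORTRAN_PROG=true
export LD_PRELOAD=`darshan-user.sh $FORTRAN_PROG`
export LOGPATH_DARSHAN_LRZ=`darshan-logpath.sh`
######### End Darshan Variable ########

cd $SCRATCH/my-io-test               # placeholder working directory
mpiexec -n $SLURM_NTASKS ./my_mpi_io_program   # placeholder MPI program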

Extracting the I/O characterization

If the program finishes correctly, a log file is created at:

$SCRATCH/.darshan-logs/<USERNAME>_<BINARY_NAME>_<JOBID_LL>_<DATE>_<UNIQUE_ID>_<TIMING>.darshan
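
The placeholders in the file name depend on the individual run; to find the actual log file, you can simply list the log directory, for example:

ls -lrt $SCRATCH/.darshan-logs/*.darshan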

If you need a detailed listing of the I/O counters and the I/O performance in a text file, you can use the following command.

darshan-parser $SCRATCH/.darshan-logs/<USERNAME>_<BINARY_NAME>_<JOBID_LL>_<DATE>_<UNIQUE_ID>_<TIMING>.darshan > \
$SCRATCH/.darshan-logs/<USERNAME>_<BINARY_NAME>_<JOBID_LL>_<DATE>_<UNIQUE_ID>_<TIMING>.txt

The darshan-parser utility provides full information on I/O performance and operations.

darshan-parser --perf file.darshan
....
# performance
# -----------
# total_bytes: 171567505272
#
# I/O timing for unique files (seconds):
# ...........................
# unique files: slowest_rank_io_time: 0.000000
# unique files: slowest_rank_meta_time: 0.000000
# unique files: slowest rank: 0
#
# I/O timing for shared files (seconds):
# (multiple estimates shown; time_by_slowest is generally the most accurate)
# ...........................
# shared files: time_by_cumul_io_only: 98.509203
# shared files: time_by_cumul_meta_only: 9.550778
# shared files: time_by_open: 98.500785
# shared files: time_by_open_lastio: 98.317576
# shared files: time_by_slowest: 98.532686
#
# Aggregate performance, including both shared and unique files (MiB/s):
# (multiple estimates shown; agg_perf_by_slowest is generally the most accurate)
# ...........................
# agg_perf_by_cumul: 1660.956731
# agg_perf_by_open: 1661.098669
# agg_perf_by_open_lastio: 1664.194038
# agg_perf_by_slowest: 1660.560872

The --perf option provides output related to performance metrics: I/O timing and aggregate bandwidth. Metrics for shared files are reported when all processes of the parallel application perform I/O to the same file. Metrics for unique files are reported when each MPI process accesses its own file or when only a subset of the MPI processes accesses a file.

For aggregate bandwidth, agg_perf_by_slowest is the most accurate estimate for both shared and unique files. For I/O timing, the most accurate values are generally slowest_rank_io_time for unique files and time_by_slowest for shared files.
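
If only these key metrics are of interest, they can be filtered out of the --perf output with standard tools; a simple example:

darshan-parser --perf file.darshan | grep -E "agg_perf_by_slowest|slowest_rank_io_time|time_by_slowest"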

Furthermore, the time spent in metadata, read, and write operations can be obtained through the counters CP_F_POSIX_META_TIME, CP_F_POSIX_READ_TIME, and CP_F_POSIX_WRITE_TIME at the POSIX level, and the counters CP_F_MPI_META_TIME, CP_F_MPI_READ_TIME, and CP_F_MPI_WRITE_TIME at the MPI-IO level.
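
These counters appear in the regular darshan-parser output, so they can be filtered in the same way; for example, for the POSIX-level timings:

darshan-parser file.darshan | grep -E "CP_F_POSIX_(META|READ|WRITE)_TIME"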

Documentation

Please refer to the Darshan web site for more information about the meaning of the I/O counters, other Darshan utilities, and static tracing.
