
Technology

SuperMUC-NG integrates Lenovo DSS-G for IBM Spectrum Scale (also known as GPFS) building blocks for its storage. They serve both the long-term storage and the high-performance parallel file system.

File System Characteristics

Home
    Purpose: Storage for users' sources, input data, and small but important result files. Globally accessible from login and compute nodes.
    Total capacity: 256 TiB
    Aggregate bandwidth: ~25 GiB/s (SSD tier), ~6 GiB/s (HDD tier)

Work
    Purpose: Large datasets that need to be kept on disk medium or long term. Globally accessible from login and compute nodes.
    Total capacity: 34 PiB
    Aggregate bandwidth: ~300 GiB/s

Scratch
    Purpose: Temporary storage for large datasets (usually restart files and files to be pre-/postprocessed). Globally accessible from login and compute nodes.
    Total capacity: 16 PiB
    Aggregate bandwidth: ~200 GiB/s

DSS
    Purpose: Data Science Storage. Long-term storage for project purposes and/or the science community. Worldwide access to and transfer of this data via high-performance, WAN-optimized transfer protocols, using a simple graphical user interface in the web browser. Data can be shared in the style of LRZ Sync+Share, Dropbox, or Google Drive.
    Total capacity: 20 PiB
    Aggregate bandwidth: ~70 GiB/s

Node-local
    Purpose: /tmp on login and compute nodes. Resides in memory on compute nodes. Locally accessible only. Please do not use paths to this area explicitly (e.g. in scripts); TMPDIR (see below) can be used and is automatically set to an appropriate value.
    Total capacity: Small. A completely filled /tmp causes the node to become unusable; therefore, lifetimes are short (per job, or a few days).
    Aggregate bandwidth: varies

File system access and policies

Upon login to the system or inside batch jobs, the environment module tempdir is loaded automatically and supplies the necessary variable settings for all file systems except HOME.

Home
    Environment variable: $HOME
    Path pattern: /dss/home/<hash>/<user>
    Quota: 100 GB per user
    Lifetime of data: Until expiration of all projects the account is associated with
    Data safety/integrity measures: Replication to secondary storage plus daily backup to tape

Work
    Environment variable: $WORK_<project>
    Path pattern: $WORK_<project>
    Quota: In accordance with the project grant (1)
    Lifetime of data: End of the specified project
    Data safety/integrity measures: None. See the section below on archiving important data.

Scratch
    Environment variable: $SCRATCH
    Path pattern: /hppfs/scratch/<hash>/<user>
    Quota: 1 PB per user (safety measure)
    Lifetime of data: Usually 3-4 weeks; execution of the deletion procedure depends on the file system filling.
    Data safety/integrity measures: None. See the section below on archiving important data.

DSS
    Environment variable: none
    Path pattern: /dss/<data-project>/<container>
    Quota: Per data project and container (1)
    Lifetime of data: End of the data project
    Data safety/integrity measures: Per-container policy regarding backup to the tape archive: NONE, BACKUP_WEEKLY, or BACKUP_DAILY (costs may arise for the user!)

Temporary
    Environment variable: $TMPDIR
    Path pattern: Depends on the availability of file systems, usually a subfolder of SCRATCH; /tmp is only used as a last resort.
    Quota: Depends on the target file system
    Lifetime of data: Depends on the target file system
    Data safety/integrity measures: Depends on the target file system

(1) The supplied value can be increased upon request. Please contact the Service Desk.
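The variables listed above can be checked interactively or at the top of a job script, for example (a minimal sketch; the project ID pr12ab is a hypothetical placeholder, see $WORK_LIST in the section on selecting the $WORK directory):

# print the storage locations supplied by the environment
echo "HOME:    $HOME"
echo "SCRATCH: $SCRATCH"
echo "TMPDIR:  $TMPDIR"
echo "WORK:    $WORK_pr12ab"    # hypothetical project ID; see $WORK_LIST below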

File system usage

Data Transfer from SuperMUC to SuperMUC-NG

Users must organize the transfer of data from SuperMUC Phase 2 to SuperMUC-NG themselves; see Data Migration from SuperMUC to SuperMUC-NG.

User's responsibility for saving important data

With (parallel) file systems of several tens of petabytes, it is technically impossible (or too expensive) to back up these data automatically. Although the disks are protected by RAID mechanisms, other severe incidents might still destroy the data. In most cases, however, it is the users themselves who accidentally delete or overwrite files. It is therefore the user's responsibility to transfer data to safer/secondary places and/or to archive them to tape. Due to the long offline times for dumping and restoring data, LRZ might not be able to recover data from any type of file outage or inconsistency of the SCRATCH or WORK file systems. The name WORK and the intended storage period until the end of your project must not be mistaken for an indication of data safety!

There is no automatic backup for SCRATCH and WORK. Besides automatic deletion, severe technical problems might destroy your data. It is your obligation to copy, transfer, or archive the files you want to keep!

Data after the end of project

Data on disk and in the tape archive will be deleted one year after the end of the project. However, for the data in the tape archive, the project manager can request that the project be converted into a data-only project to retain access to the archived data. Additionally, the project manager is warned by email after the project end that the data will be deleted.

Limitations and advantages of the parallel file system

The WORK and SCRATCH file systems are tuned for high bandwidth, but they are not optimal for handling large quantities of small files located in a single directory with parallel accesses. In particular, creating more than about 1000 files per directory at approximately the same time, from either a parallel program or from simultaneously running jobs, will probably cause your application(s) to experience I/O errors (due to timeouts) and crashes. If you require this usage pattern, please create a directory hierarchy with at most a few hundred files per subdirectory (a sketch follows below).

  • see also: Optimal usage of the High Performance Parallel Files System
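As an illustration, such a hierarchy could be created as follows (a minimal sketch; the project ID pr12ab, the run name run1, and the chunk size of 100 files are hypothetical placeholders):

# create 100 subdirectories so that no single directory ends up with more than ~100 files
for i in $(seq 0 99); do
    mkdir -p "$WORK_pr12ab/run1/chunk_$i"
done
# the application then writes output file number N into chunk_$((N / 100))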

Temporary filesystems

Please use the environment variable $SCRATCH to access the temporary file systems. This variable points to the location where the underlying file system delivers optimal I/O performance. Do not use /tmp or $TMPDIR for storing temporary files! The file system backing /tmp resides in memory and is very small. Files there will be regularly deleted by automatic procedures or by system administrators.

Coping with high watermark deletion in $SCRATCH

The high watermark deletion mechanism may remove files which are only a few days old if the file system is used heavily. In order to cope with this situation, please note:

  • The normal tar -x command preserves the modification time stored in the archive, not the time at which the archive was unpacked. Files unpacked from an older archive are therefore among the first candidates for deletion. To prevent this, use tar -xm to unpack your files, which gives them the current date (see the example after this list).
  • Please use the TSM system to archive/retrieve files from/to SCRATCH to/from the tape archive.
  • Please always use $WORK or $SCRATCH for files which are considerably larger than 1 GB.
  • Please remove any files which are not needed any more as soon as possible. The high watermark deletion procedure is then less likely to be triggered.
  • More information about the filling of the file systems and about the oldest files will be made available on a web site in the near future.
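A minimal sketch of the unpacking step mentioned in the first item above (the archive name results.tar is a hypothetical placeholder):

cd $SCRATCH/mydata
# -m sets the modification time to the time of extraction,
# so freshly unpacked files are not among the oldest candidates for deletion
tar -xmf results.tar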

Selecting the $WORK directory

Each project on SuperMUC-NG has a separate WORK directory with a shared quota for all users in that project. Users can select a specific WORK directory by setting the appropriate project ID, e.g. in scripts or in their .profile:

export WORK=$WORK_<project>

A colon-separated list of all WORK directories a user has access to is stored in the environment variable $WORK_LIST:

echo $WORK_LIST
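To list the available directories one per line, the colon-separated value can simply be split (a minimal sketch):

# print each WORK directory on its own line
echo "$WORK_LIST" | tr ':' '\n'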

Sharing files with other users

Backup and Archive

To use TSM tape archiving and backup, it is necessary to log in to the archive nodes: skx-arch.supermuc.lrz.de
The regular login nodes do not support TSM usage. Conversely, the archive nodes should not be used for any purpose other than TSM data handling.
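A minimal sketch of an archiving session, assuming the standard TSM/Spectrum Protect command-line client dsmc is available on the archive nodes (the directory name is a hypothetical placeholder):

ssh skx-arch.supermuc.lrz.de
# archive a results directory from SCRATCH, including all subdirectories
dsmc archive "$SCRATCH/mydata/Exp1/" -subdir=yes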

Transferring files from/to other systems

Please see the appropriate subsection in the login document for a description.

Quotas

To see your quota, please issue the following command, since the normal quota command does not work on the High Performance Parallel File Systems:

budget_and_quota

Parallel copy and rsync

Sometimes it is necessary to copy or sync large amounts of data (terabytes), for example from SCRATCH to WORK. Hint: use msrsync, prsync, or pexec to distribute the work onto more than one process or onto many cores.

Examples:

module load lrztools

# use 96 tasks on one node
msrsync -p 96 $SCRATCH/mydata $WORK/RESULTS/Experiment1

# use all processes within a parallel job:
# generate the commands, make the directory structure, copy the data
prsync -f $SCRATCH/mydata -t $WORK/RESULTS/Experiment1
source $HOME/.lrz_parallel_rsync/MKDIR
mpiexec -n 256 pexec $HOME/.lrz_parallel_rsync/RSYNCS

# execute many copies in parallel (copylist contains one cp command per line)
cat copylist
cp -r $SCRATCH/mydata/Exp1 $WORK/RESULTS
cp -r $SCRATCH/mydata/Exp2 $WORK/RESULTS
...
cp -r $SCRATCH/mydata/Exp2000 $WORK/RESULTS
mpiexec -n 256 pexec copylist

Conversion of a SuperMUC project into a Data-Only Project (after project end)

Data in the tape archive will be deleted one year after the project end if the project is not converted into a data-only project. The project manager can request such a conversion in order to retain access to the archived data, and is warned by email after the project end that the data will otherwise be deleted.

On request, it is possible to convert a SuperMUC project into a Data-Only project. Within such a Data-Only project, the project manager can continue to retain and access the data archived on tape, thus using the tape archive as safe and reliable long-term storage for the data generated by a SuperMUC project.

Data can then be accessed via the gateway node "tsmgw.abs.lrz.de" using the SuperMUC username and password of the project manager. Access to the server is possible via SSH with no restrictions on the IP address. However, access to SuperMUC itself is not possible after the end of a project. Currently, the server is equipped with 37 TB of local disk storage (/tsmtrans) to buffer the data retrieved from tape. There is a directory /tsmtrans/<username> where you can store the data and transfer it via scp, as sketched below.
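For instance, data staged in that buffer directory can be pulled to a local machine roughly as follows (a minimal sketch; <username> and mydata are placeholders):

# run on your local machine: copy retrieved data from the gateway's buffer space
scp -r <username>@tsmgw.abs.lrz.de:/tsmtrans/<username>/mydata ./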

The project manager can access all data of the project that is stored in the tape archive, but for data that was archived by another project member rather than by the project manager him/herself, the -fromowner=otheruser option must be used. Also, the password for accessing the tape archive (TSM node) is not stored on the gateway node and must be set and remembered by the project manager.
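A minimal sketch of such a retrieval, assuming the standard TSM client dsmc on the gateway node (the source path and the user name "otheruser" are hypothetical placeholders):

# retrieve files archived by another project member ("otheruser") into the transfer buffer
dsmc retrieve -subdir=yes -fromowner=otheruser "/path/archived/by/otheruser/*" /tsmtrans/<username>/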

  • When a SuperMUC project ends, the project manager will receive a reminder e-mail explaining the steps necessary to convert the project.

Further information
