SuperMUC-NG integrates Lenovo DSS-G for IBM Spectrum Scale (aka GPFS) as building blocks for the storage. They are used for both the long-term storage and the high performance parallel file system.
File System Characteristics
|Area||Purpose||Total Capacity||Aggregate Bandwidth|
|Home||Storage for users' source code, input data, and small, important result files.||256 TiB||~25 GiB/s (SSD tier), ~6 GiB/s (HDD tier)|
|Work||Large datasets that need to be kept on disk in the medium or long term. Globally accessible from login and compute nodes.||34 PiB||~300 GiB/s|
|Scratch||Temporary storage for large datasets (usually restart files, files to be pre-/postprocessed). Globally accessible from login and compute nodes.||16 PiB||~200 GiB/s|
|DSS||Data Science Storage. Long-term storage for the purposes of a project and/or the science community. Worldwide access/transfer for this data via high-performance, WAN-optimized transfer protocols, using a simple web-based graphical user interface. Data can be shared in a manner similar to LRZ Sync+Share, Dropbox or Google Drive.||20 PiB||~70 GiB/s|
|Node-local||/tmp on login and compute nodes. Resides in memory on compute nodes. Locally accessible only. Please do not use paths to this area explicitly (e.g. in scripts). TMPDIR (see below) can be used and will automatically be set to an appropriate value.||Small. A completely filled /tmp causes the node to become unusable. Therefore, lifetimes are short (per-job, or a few days).||varies|
File system access and policies
Upon login to the system or inside batch jobs, the environment module tempdir is loaded and supplies the necessary variable settings for all file systems with the exception of HOME.
|Area||Environment Variable||Path pattern||Quota||Lifetime of Data||Data Safety/Integrity Measures|
|Home||$HOME||/dss/home/<hash>/<user>||100 GB/user||Expiration of all projects an account is associated with||Replication to secondary storage plus daily backup to tape|
|Work||$WORK_<project>||$WORK_<project>||In accordance with project grant (1)||End of specified project||None. See section below on archiving important data.|
|Scratch||$SCRATCH||/hppfs/scratch/<hash>/<user>||1 PB/user (safety measure)||Usually 3-4 weeks. Execution of the deletion procedure depends on how full the file system is.||None. See section below on archiving important data.|
|DSS||-||/dss/<data-project>/<container>||Per data-project and container (1)||End of data project||Per-container policy for backup to the tape archive: NONE, BACKUP_WEEKLY, or BACKUP_DAILY (costs may arise for the user!)|
|Temporary||$TMPDIR||Depends on availability of file systems, usually a subfolder of SCRATCH. /tmp is used only as a last resort.||Depends on target file system||Depends on target file system||Depends on target file system|
|(1) The supplied value can be increased upon request. Please contact the Service Desk.|
File system usage
Data Transfer from SuperMUC to SuperMUC-NG
Users must organize the transfer of data from SuperMUC Phase 2 to SuperMUC-NG, see Data Migration from SuperMUC to SuperMUC-NG.
User's responsibility for saving important data
With (parallel) file systems of several tens of petabytes, it is technically impossible (or too expensive) to back up the data automatically. Although the disks are protected by RAID mechanisms, other severe incidents might still destroy the data. In most cases, however, it is users themselves who accidentally delete or overwrite files. It is therefore the user's responsibility to transfer data to safer secondary locations and/or to archive them to tape. Because dumping and restoring data requires long off-line times, LRZ might not be able to recover data after any kind of outage or inconsistency of the SCRATCH or WORK file systems. The alias name WORK and the intended storage period until the end of your project must not be mistaken for an indication of data safety!
There is no automatic backup for SCRATCH and WORK. Besides automatic deletion, severe technical problems might destroy your data. It is your obligation to copy, transfer, or archive the files you want to keep!
Data after the end of project
Data on disk and in the tape archive will be deleted one year after the end of the project. However, for the data in the tape archive, the project manager can request that the project is converted into a data-only project to gain further access to the archived data. Additionally, the project manager is warned by email after the project end that the data will be deleted.
Limitations and advantages of the parallel file system
The WORK and SCRATCH file systems are tuned for high bandwidth, but they are not optimal for handling large numbers of small files located in a single directory and accessed in parallel. In particular, generating more than about 1000 files per directory at approximately the same time, either from a parallel program or from simultaneously running jobs, will probably cause your application(s) to experience I/O errors (due to timeouts) and crashes. If you require this usage pattern, please generate a directory hierarchy with at most a few hundred files per subdirectory.
- see also: Optimal usage of the High Performance Parallel Files System
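The directory hierarchy recommended above can be sketched as follows. This is only an illustration, not an LRZ tool: the function name bucket_path, the limit of 500 files per subdirectory, and the naming scheme dir_<n>/out_<index>.dat are assumptions chosen for the example. The index must be given without leading zeros.

```shell
#!/bin/bash
# Sketch: spread numbered output files over subdirectories so that no
# directory holds more than FILES_PER_DIR entries.
# FILES_PER_DIR, bucket_path and the file naming are illustrative assumptions.
FILES_PER_DIR=500

# Print the path for output file number $2 below base directory $1.
bucket_path() {
    local base="$1"
    local index="$2"
    local bucket=$(( index / FILES_PER_DIR ))
    printf '%s/dir_%04d/out_%06d.dat\n' "$base" "$bucket" "$index"
}

# Example: create the target directory for file 1234 and write to it.
# Falls back to a throw-away directory if SCRATCH is not set.
base="${SCRATCH:-$(mktemp -d)}/bucket_demo"
target=$(bucket_path "$base" 1234)
mkdir -p "$(dirname "$target")"
echo "result data" > "$target"
echo "$target"
```

With these settings, file 1234 lands in dir_0002, and each subdirectory receives at most 500 files.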
Please use the environment variable $SCRATCH to access the temporary file systems. This variable points to the location where the underlying file system delivers optimal I/O performance. Do not use /tmp or $TMPDIR for storing temporary files! The file system holding /tmp resides in memory and is very small. Files are regularly deleted by automatic procedures or by system administrators.
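As an illustration of working below $SCRATCH, the following sketch creates a private directory per job. The helper name make_job_dir and the directory naming job_<id> are assumptions made for this example. SLURM_JOB_ID is only available if the batch system is Slurm, which is also an assumption here; the shell's process ID is used otherwise.

```shell
#!/bin/bash
# Sketch: create a private per-job directory below a given root (normally $SCRATCH).
# make_job_dir and the "job_<id>" naming are illustrative assumptions.
# SLURM_JOB_ID is only set inside Slurm batch jobs; the PID is used otherwise.
make_job_dir() {
    local root="$1"
    local job_id="${SLURM_JOB_ID:-$$}"
    local dir="$root/job_${job_id}"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Use $SCRATCH if it is defined (the tempdir module normally sets it).
if [ -n "${SCRATCH:-}" ]; then
    workdir=$(make_job_dir "$SCRATCH")
    echo "writing temporary data to $workdir"
else
    echo "SCRATCH is not set; is the tempdir module loaded?" >&2
fi
```

Remember to remove such directories once the job's data are no longer needed.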
Coping with high watermark deletion in $SCRATCH
The high watermark deletion mechanism may remove files which are only a few days old if the file system is used heavily. In order to cope with this situation, please note:
- The normal tar -x command preserves the modification time of the original file and not the time when the archive was unpacked. Therefore, files which have been unpacked from an older archive are among the first candidates to be deleted. To prevent this, use tar -xm to unpack your files, which will give them the current date.
- Please use the TSM system to archive/retrieve files from/to SCRATCH to/from the tape archive.
- Please always use $WORK or $SCRATCH for files which are considerably larger than 1 GB.
- Please remove any files which are not needed any more as soon as possible. The high watermark deletion procedure is then less likely to be triggered.
- More information about the filling of the file systems and about the oldest files will be made available on a web site in the near future.
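The difference between tar -x and tar -xm mentioned in the list above can be checked with a small experiment. This is a sketch assuming GNU tar, touch, and ls, as found on typical Linux systems; all file and directory names are made up for the demonstration.

```shell
#!/bin/bash
# Sketch: show that "tar -x" restores the archived modification time,
# while "tar -xm" stamps extracted files with the time of extraction.
# All file and directory names are made up for the demonstration.
demo=$(mktemp -d)
cd "$demo" || exit 1

echo "old data" > result.dat
touch -d '2000-01-01 00:00:00' result.dat    # pretend the file is old
tar -cf old.tar result.dat

mkdir plain touched
tar -xf old.tar -C plain        # keeps the modification time from 2000
tar -xmf old.tar -C touched     # modification time becomes the current time

# Compare the two modification times.
ls -l --time-style=long-iso plain/result.dat touched/result.dat
```

The copy in plain/ carries the old date and would be an early deletion candidate, whereas the copy in touched/ is treated as a new file.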
Selecting the $WORK directory
Each project on SuperMUC-NG has a separate WORK directory with a quota shared by all users in the project. Users can select a specific WORK directory by using the appropriate project ID, e.g.,
export WORK=$WORK_<project> in scripts or by setting it in their .profile.
A colon-separated list of all WORK directories a user has access to is stored in the environment variable
Sharing files with other users
Backup and Archive
To use TSM tape archiving and backup, it is necessary to log in to the archiving nodes:
The regular login nodes do not support TSM usage. Conversely, the archiving nodes should not be used for any purpose other than TSM data handling.
- HPC Backup and Archiving (how to handle backup/archive and the TSM tape system)
- Optimal use of TSM for SuperMUC-NG
Transferring files from/to other systems
Please see the appropriate subsection in the login document for a description.
To see your quota, please issue the following command, since the normal quota command does not work on the High Performance Parallel File Systems.
Parallel copy and rsync
Sometimes it is necessary to copy or sync large amounts (terabytes) of data, for example from SCRATCH to WORK. Hint: use msrsync, prsync or pexec to distribute the work over more than one process or over many cores.
module load lrztools
# use 96 tasks on one node
msrsync -p 96 $SCRATCH/mydata $WORK/RESULTS/Experiment1
# use all processes within a parallel job:
# generate the commands, make the directory structure, copy the data
prsync -f $SCRATCH/mydata -t $WORK/RESULTS/Experiment1
mpiexec -n 256 pexec $HOME/.lrz_parallel_rsync/RSYNCS
# execute many copies in parallel:
# the file copylist holds one command per line, for example
cp -r $SCRATCH/mydata/Exp1 $WORK/RESULTS
cp -r $SCRATCH/mydata/Exp2 $WORK/RESULTS
cp -r $SCRATCH/mydata/Exp2000 $WORK/RESULTS
# pexec then distributes these commands over the MPI tasks
mpiexec -n 256 pexec copylist
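A command file such as the copylist passed to pexec above does not have to be written by hand. The following sketch generates one; write_copylist is a hypothetical helper, not part of lrztools, and it assumes that the experiment directories are named Exp1 to ExpN and that paths contain no spaces. The /path/to/... fallbacks are placeholders used only when SCRATCH or WORK are not set.

```shell
#!/bin/bash
# Sketch: write one "cp" command per experiment directory into a command file.
# write_copylist is a hypothetical helper, not part of lrztools.
# Arguments: source directory, destination directory, number of experiments, output file.
write_copylist() {
    local src="$1"
    local dst="$2"
    local count="$3"
    local out="$4"
    local i=1
    : > "$out"                      # start with an empty file
    while [ "$i" -le "$count" ]; do
        printf 'cp -r %s/Exp%d %s\n' "$src" "$i" "$dst" >> "$out"
        i=$((i + 1))
    done
}

# Paths are expanded now, so the file contains absolute paths.
write_copylist "${SCRATCH:-/path/to/scratch}/mydata" \
               "${WORK:-/path/to/work}/RESULTS" 2000 copylist
wc -l < copylist
```

The resulting file can then be handed to pexec inside a parallel job as in the listing above.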
Conversion of a SuperMUC project into a Data-Only Project (after project end)
Data in the tape archive will be deleted one year after the project end if the project is not converted into a data-only project. However, the project manager can request that the project be converted into a data-only project to retain access to the archived data. The project manager is warned by email after the project end that the data will be deleted.
On request, it is possible to convert a SuperMUC project into a data-only project. Within such a data-only project, the project manager can continue to retain and access the data archived on tape, thus using the tape archive as safe and reliable long-term storage for the data generated by a SuperMUC project.
Data can then be accessed via the gateway node "tsmgw.abs.lrz.de" using the SuperMUC username and password of the project manager. Access to the server is possible via SSH with no restrictions on the IP address. However, access to SuperMUC itself is not possible after the end of a project. Currently, the server is equipped with 37 TB of local disk storage (/tsmtrans) to buffer the data retrieved from tape. There is a directory /tsmtrans/<username> where you can store the data and from which you can transfer them via scp.
The project manager can access all data of the project that are stored in the tape archive, but it is necessary to use the -fromowner=otheruser flag for data which were archived by another project member rather than by the project manager. Also, the password for accessing the tape archive (TSM node) is not stored on the gateway node and must be set and remembered by the project manager.
When a SuperMUC project ends, the project manager will receive a reminder email explaining the steps necessary to convert the project.