File Systems and IO on Linux-Cluster

This document gives an overview of the storage systems available on the LRZ Linux Cluster and discusses their usage, the special tools provided, and the applicable policies.

Disk resources and file system layout

The following overview describes the file system resources available on the Linux Cluster.

Recommendation: LRZ has defined an environment variable $SCRATCH which should be used as a base path for reading/writing large scratch files. Since the target of $SCRATCH may change over time, it is recommended to use this variable instead of hard-coded paths.
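
For example, a job script might create a job-specific working directory under $SCRATCH instead of hard-coding the underlying path (the directory name is only illustrative and assumes a SLURM batch environment where $SLURM_JOB_ID is set):

    # create and enter a job-specific working directory under $SCRATCH
    mkdir -p "$SCRATCH/myjob_$SLURM_JOB_ID"
    cd "$SCRATCH/myjob_$SLURM_JOB_ID"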

Globally accessible Home and Project Directories

User's Home Directories
  • Cluster segment: all
  • File system (type and path): GPFS, /dss/dsshome1/lxc##/<user>
  • Access: $HOME
  • Space available: 100 GByte per user by default
  • Approx. aggregate bandwidth: up to a few GB/s
  • Backup by LRZ: yes, backup to tape and file system snapshots
  • Lifetime and deletion strategy: until expiration of the LRZ project; DSS quotas apply

DSS (Data Science Storage) long-term storage
  • Cluster segment: all
  • File system (type and path): GPFS; see the text section below for further details
  • Access: interactively type dssusrinfo all, view and select an appropriate directory, and define your own variable (e.g. WORK) in your shell profile (e.g. ~/.profile or ~/.bashrc)
  • Space available: up to 10 TByte without additional cost
  • Approx. aggregate bandwidth: up to a few GB/s
  • Backup by LRZ: no

Temporary/scratch File Systems

(Legacy) Scratch file system
  • Cluster segment: all, except CoolMUC-4 and Teramem
  • File system (type and path): GPFS, /gpfs/scratch/<group>/<user>
  • Access: $SCRATCH
  • Space available: 1,400 TByte
  • Approx. aggregate bandwidth: up to ~30 GB/s on CoolMUC-2, up to ~8 GB/s on CoolMUC-3
  • Backup by LRZ: no
  • Lifetime and deletion strategy: sliding window file deletion; no guarantee for data integrity

(New) Scratch file system
  • Cluster segment: all, except CoolMUC-2 compute nodes
  • File system (type and path): GPFS, /dss/lxclscratch/##/<user>
  • Access: $SCRATCH_DSS
  • Space available: 3,100 TByte
  • Approx. aggregate bandwidth: up to ~60 GB/s, up to ~8 GB/s on CoolMUC-3
  • Backup by LRZ: no
  • Lifetime and deletion strategy: sliding window file deletion; no guarantee for data integrity

Node-local File Systems (please do not use!)

Node-local temporary user data
  • Cluster segment: all
  • File system (type and path): local disks (if available), /tmp
  • Access: /tmp
  • Space available: 8-200 GByte
  • Approx. aggregate bandwidth: approx. 30 MB/s on nodes with local disks
  • Backup by LRZ: no
  • Lifetime and deletion strategy: compute nodes: job duration only; files should be deleted by the user's job script at the end of the job. Login nodes: files are removed if necessary

Backup and Archiving

User's responsibility for saving important data

With (parallel) file systems of several hundred terabytes (DSS, $SCRATCH), it is technically impossible (or too expensive) to back up these data automatically. Although the disks are protected by RAID mechanisms, other severe incidents might still destroy the data. In most cases, however, it is the users themselves who accidentally delete or overwrite files. It is therefore the user's responsibility to transfer data to safer places (e.g. $HOME) and to archive them to tape. Due to the long offline times for dumping and restoring data, LRZ may not be able to recover data after an outage or inconsistency of the scratch or DSS file systems. A file system lifetime that extends until the end of your project must not be mistaken for an indication that data stored there are safe!
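
As a minimal sketch (the directory and archive names are placeholders), important results could be packed from $SCRATCH into a single archive stored in $HOME as follows:

    # pack a results directory from scratch into one archive under $HOME
    tar -czf "$HOME/results_backup.tar.gz" -C "$SCRATCH" my_results
    # verify that the archive is readable before relying on it
    tar -tzf "$HOME/results_backup.tar.gz" > /dev/null && echo "archive OK"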

LRZ had to discontinue the old Linux Cluster tape archive because of security concerns. 

Snapshots

The data in your home directory is protected by nightly file system snapshots, which are kept for at most 7 days. To access these snapshots, look into the directory /dss/dsshome1/.snapshots/. There you will find the individual snapshots as subdirectories, whose names encode the date and time at which the snapshot was taken as YYYY-MM-DD_HHMM. To restore files, simply copy them back into your HOME directory.
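
For example, a file accidentally deleted from $HOME could be restored roughly as follows (the snapshot name, the lxc## part and the file name are placeholders; the assumption is that the path below the snapshot mirrors the normal home directory layout):

    # list the available snapshots (named YYYY-MM-DD_HHMM)
    ls /dss/dsshome1/.snapshots/
    # copy a lost file back into the home directory
    cp /dss/dsshome1/.snapshots/2024-05-01_0300/lxc01/$USER/important_file.txt "$HOME/"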


Details on the usage and on the configuration of the file systems

DSS long-term storage

LRZ uses Data Science Storage (DSS) based systems for the purpose of long-term data storage. In conjunction with this, LRZ has transferred management rights and obligations for these storage areas to the data curator, an additional role typically taken on by the master user of your project. For projects that use basic DSS storage services on the cluster, LRZ retains certain management rights to be able to provide these services.

In order to use DSS storage on the Linux-Cluster, the following steps need to be performed:

  1. On any cluster login node, issue the command
    dssusrinfo all
    This will list paths to accessible containers, as well as quota information etc. If no such container exists, please continue with step 2; otherwise, go to step 5.
  2. Please verify in the LRZ IDM-Portal section "Self Services | Person | view" that your user data contain a valid e-mail address, either for an LRZ mail service on a personal account, or as contact e-mail address. Otherwise, please ask your Master User to register a contact e-mail address for you in IDM-Portal.
  3. Open a ticket with the LRZ Service Desk against the service "High Performance Computing → Linux Cluster" with a request to set up a DSS storage area for the project your cluster account belongs to, and the required capacity (at most 10 TBytes).
  4. If your request is granted, and a new DSS area is created, you will receive an e-mail to the address specified above. Please reply appropriately to it to activate your DSS share.
  5. Edit your shell profile and set the PROJECT and/or WORK variable to a suitable path value based on the above output, typically one of the DSS paths with your account name appended to it. These settings can subsequently be used in any login shell or batch script.
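
A minimal sketch of steps 1 and 5, assuming dssusrinfo reports a container directory accessible to your account (the container path below is only a placeholder to be taken from the actual output):

    # show accessible DSS containers, paths and quota information
    dssusrinfo all
    # in ~/.bashrc or ~/.profile: point WORK at your directory inside the container
    export WORK=/dss/<container-path>/<user>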

Notes:

  • The DSS long-term storage is not automatically available after Linux-Cluster activation. The master user can apply for the storage via a predefined Service Request Template.
  • If the current capacity is smaller than 10 TByte, the data curator can ask for a quota increase up to the maximum value via a predefined Service Request Template.
  • For larger capacities and/or containers that are automatically backed up on a regular basis (which cannot be provided free of cost), please ask your master user to request a quote from LRZ.
  • If you are involved in multiple projects, the dssusrinfo output may refer to more than one DSS container. It is your responsibility to store data where they belong and to perform the necessary bookkeeping.
  • Depending on the system used and the usage pattern, it may be appropriate to stage in/out data to/from the SCRATCH file system before/after performing large scale processing. It is permissible to perform the necessary copy or rsync operations on the cluster login nodes.
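
For example, data could be staged between a DSS container and the scratch area on a login node roughly as follows ($WORK is the variable defined above; the directory names are placeholders):

    # stage input data from DSS to scratch before large scale processing
    rsync -a "$WORK/input_data/" "$SCRATCH/input_data/"
    # stage results back to DSS afterwards
    rsync -a "$SCRATCH/results/" "$WORK/results/"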

Metadata on SCRATCH and DSS directories

While the metadata performance of both the scratch and the DSS/project directories (i.e., the performance for creating, accessing and deleting directories and files) is improved compared to previously used technologies, the capacity for metadata (e.g., the number of file entries in a directory) is limited. Therefore, please do not generate extremely large numbers of very small files in these areas; instead, aggregate your data into larger files and write to these, e.g. via direct access. Violation of this rule can lead to LRZ blocking your access to the $SCRATCH or DSS area, because it may otherwise obstruct cluster operation for other users. Please also note that there is a per-directory limit for storing i-node metadata (directory entries and file names); this limits the number of files that can be placed in a single directory.
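
For example, a directory containing many small output files (the names are placeholders) can be aggregated into a single archive before it is kept on $SCRATCH or moved into a DSS container:

    # replace many individual small files with a single archive
    tar -czf "$SCRATCH/run_output.tar.gz" -C "$SCRATCH" run_output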

File deletion strategies and data integrity issues

To prevent overflow of the large scale storage areas, LRZ has implemented various deletion strategies. Please note that

  • for a given file or directory, the exact time of deletion is unpredictable!
  • the normal tar -x command preserves the modification time of the original file instead of the time at which the archive is unpacked. Unpacked files may therefore become some of the first candidates for deletion. Use tar -mx if required, or run touch on a file or
    find mydir -exec touch {} \;
    on a directory tree mydir.

Because of the deletion strategies described below, and because LRZ cannot guarantee the same level of data integrity for the high performance file systems as for e.g. $HOME, LRZ urges you to copy, transfer or archive your files from the temporary disks as well as from the DSS areas to safe storage/tape areas!

  • High watermark deletion: When the fill level of the file system exceeds a certain limit (typically between 80% and 90%), files are deleted, starting with the oldest and largest files, until a fill level between 60% and 75% is reached. The precise values may vary.
  • Sliding window file deletion: Any files and directories older than typically 30 days (the interval may be shortened if the fill-up rate becomes very high) are removed from the disk area. This deletion mechanism is invoked once a day.
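
To see which of your files are approaching the sliding-window limit, something like the following can be run on a login node (the 25-day threshold is only an example):

    # list files under $SCRATCH that have not been modified for more than 25 days
    find "$SCRATCH" -type f -mtime +25 -ls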

World-Wide data access and transfer

For easy access and transfer of data to/from the LRZ Linux Cluster DSS based file systems, HOME and the (new) scratch file system, you can use the Globus Research Data Management Portal. This allows you to easily transfer data worldwide, using a protocol which is optimised for high speed transfer via wide area networks (WAN).

For details on how to use Globus Online, check out this documentation.

Please make sure to log in to Globus using your LRZ Linux Cluster user ID (search for Leibniz Rechenzentrum in the list of available institutions) and use the Globus Collection "Leibniz Supercomputing Centre's DSS - CILogon" to access the data.