DSS documentation for users


1. How to get access to DSS

Access to a DSS container is only possible by invitation from the data curators of the specific DSS container. If a data curator invites you to access a DSS container, an invitation email is sent to the email address associated with your user account in the LRZ IdPortal. If you are in doubt whether this address is correct, please log in to the LRZ IdPortal, check your account settings (Account → View), and contact your master user if the address is wrong.

When you receive an invitation email, you have to confirm the invitation by clicking on the link in the email and following the instructions on the website in order to activate the access. Please note that it may take up to an hour before the updated access rights have been propagated to all attached systems.

2. Storing and accessing data in DSS

In the following, we describe the various ways in which you can access the data stored in a DSS container.

2.1. Using DSS via LinuxCluster and the AI Systems

On LRZ's LinuxCluster and AI Systems, the DSS file systems are mounted directly via a high-performance file system client on the compute and login nodes. Performance may vary between system partitions, as not all partitions have the same network bandwidth to DSS.

On these systems, DSS containers can be accessed via the file system path /dss/<filesystem name>/<data project>/<data container>/
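For illustration, the path can be assembled from the three name components. The file system, project and container names below (dssfs01, pn56xe, pn56xe-dss-0000) are placeholders, not real values; substitute the names you were given for your container:

```shell
# Placeholder values - substitute the names of your file system,
# data project and data container.
FILESYSTEM="dssfs01"
PROJECT="pn56xe"
CONTAINER="pn56xe-dss-0000"

DSS_PATH="/dss/${FILESYSTEM}/${PROJECT}/${CONTAINER}/"
echo "${DSS_PATH}"

# On a cluster login node you would then access the data with e.g.:
# ls -l "${DSS_PATH}"
```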

2.2. Using DSS via SuperMUC-NG

For SuperMUC-NG projects, there are two special file systems (/dss/dssfs02 and /dss/dssfs03) available that are directly mounted via a high performance file system client on the compute and login nodes. See https://doku.lrz.de/display/PUBLIC/Data+Science+Storage+for+SuperMUC for details on how to request access for these file systems.

2.3. Using DSS via Compute Cloud and VMware

2.3.1. Prerequisites 

In order to access a DSS container from an LRZ Compute Cloud or VMware virtual machine, you must ask the data curator of the data project to which the desired container belongs to export the container to the IP address used by your VM.

Though technically not forbidden, you should only export DSS containers to IP addresses that are statically assigned to you and that you trust. NFS exports follow a "host-based trust" semantic: the DSS NFS server will trust any IP/system to which a DSS container is exported, and no additional user authentication is enforced between NFS server and client. This is especially important if you want to export DSS containers to cloud machines, as these by default use a dynamically allocated IP, which may be reused by other machines as soon as you shut down your VM.

If you want to use NFS v4 instead of NFS v3 to mount the data container inside your VM, please make sure to configure the ID mapping daemon accordingly. The easiest approach is to set the Domain parameter in /etc/idmapd.conf to LRZ.DE and follow the user and group setup steps described below. You may also be able to set up some other kind of user mapping between the local users on your VM and the users used by DSS; however, this is not covered by LRZ's support for DSS.
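The relevant part of /etc/idmapd.conf would then look like the following sketch (the remaining settings can stay at your distribution's defaults; restart the ID mapping daemon after changing the file):

```
[General]
Domain = LRZ.DE
```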

2.3.2. Preparing users and groups on the VM

As you may be aware, NFS user permissions are based on user IDs (UIDs). The UIDs of users on the client must match those on the server in order for the users to have access. This is typically achieved either through some kind of manual synchronisation or through a directory service such as LDAP.

Currently, LRZ does not allow access to its LDAP servers from customer VMs for data privacy reasons. However, as DSS supports arbitrary users from the central TUM and LMU identity management systems, you may be able to connect your VMs to the user directory of your organisation. For more information on how to do that, please contact the service desk of your organisation, as this is out of scope of LRZ's support for DSS.

In order to create the users manually on your VM, you must determine the usernames and UIDs of the users who are invited to the particular container and should be able to access the data via your VM. You can ask the data curators of your container to provide you with a list mapping the usernames to UIDs.

Normally, the groups a user belongs to are provided by the NFS client, so you would usually also have to create the particular container access groups on your VM. However, to work around a known limitation of the NFS protocol in handling groups (the AUTH_SYS limit of 16 auxiliary groups per user), group membership is managed by the DSS NFS servers, so it is technically not mandatory to create the container groups on your side.
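The account creation can be scripted from the curator's list. The following is a sketch under the assumption that the list is a plain text file with one "username:uid" pair per line (the file name dss_users.txt and the entries in it are hypothetical); it generates the useradd commands into a script so you can review them before running them as root:

```shell
# Hypothetical input file: one "username:uid" pair per line,
# as provided by your data curator.
cat > dss_users.txt <<'EOF'
di12abc:3000123
di34def:3000456
EOF

# Generate the useradd commands instead of running them directly,
# so the list can be reviewed before applying it as root.
while IFS=: read -r name uid; do
    printf 'useradd --uid %s --no-create-home %s\n' "$uid" "$name"
done < dss_users.txt > create_users.sh

cat create_users.sh
```

After reviewing create_users.sh, it can be executed with root privileges to create the accounts.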

2.3.3. Mounting a DSS Container on a VM

In order to mount the container on your VM, you have to ask the data curators of your project for the IP address and path of your NFS export. Once you have this information, you should be able to mount the container on your VM. We suggest using the following commands:

your-vm:># mkdir -p /dss
your-vm:># mount -t nfs -o rsize=1048576,wsize=1048576,hard,tcp,bg,timeo=600,vers=3 <IP>:<Path> /dss
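If the container should be mounted automatically at boot, the equivalent /etc/fstab entry would look like this sketch, using the same <IP> and <Path> placeholders as above:

```
<IP>:<Path>  /dss  nfs  rsize=1048576,wsize=1048576,hard,tcp,bg,timeo=600,vers=3  0  0
```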

2.3.4. Further Information

The page DSS How to export a container via NFS to your virtual machine in LRZ covers an end-to-end example of how to export and mount a DSS container on a Compute Cloud VM.

2.4. Using DSS worldwide via Globus Online

In order to access the data stored in DSS containers from outside of the LRZ, we provide a Globus Connect Server DTN infrastructure, which integrates into the Globus research data management portal. With this setup, you can easily transfer and share DSS container data worldwide, using a protocol that is optimised for high-speed transfers over wide area networks (WAN).

Please note that it may take up to several hours after you have accepted your DSS invitation before you can successfully access the DSS container via Globus. This is because the relevant information still has to propagate through various LRZ system components via regularly running cron jobs.

If you cannot access your DSS container 12 hours after invitation acceptance and first-time registration (see step 2.3.1), please raise a ticket via the LRZ Servicedesk.

2.4.1. Understanding the difference between Globus Mapped and Guest Collections

In Globus, data on storage systems is made available as so-called Collections. Globus distinguishes between two types of collections.

  1. Mapped Collections are collections where the username with which you logged in to Globus is mapped to a local user account on the server on which the data resides. Therefore, only the rights of the mapped user on that server determine what can and cannot be accessed via the Mapped Collection. This also means that you need a user account on the server that hosts the Mapped Collection. There exists exactly ONE Mapped Collection for DSS, called: Leibniz Supercomputing Centre's LRZ DSS - CILogon
  2. Guest Collections are collections for which ACL management is delegated to Globus. This means that anybody with a valid Globus Online user can be given access to a certain portion of a storage system; hence, data curators of DSS can share the data in their containers also with persons who do not have an LRZ/LMU/TUM account. In the context of the local system, all data access via a Guest Collection happens in the context of the user who created the Guest Collection; for DSS, this is always a special system account called dssglobus. Guest Collections on DSS are created on a per-container basis and only if the Data Curator decides to enable Globus Sharing in DSSWeb. Guest Collections for DSS in Globus are always called "LRZ DSS Container XXXX-YYYY-ZZZZ"

The general rule for deciding which type of collection to use is simple: if you have an LRZ, LMU or TUM account for which you were invited to access a DSS container, always use the Mapped Collection Leibniz Supercomputing Centre's LRZ DSS - CILogon. If you do not have an LRZ, LMU or TUM account and you received an invitation mail from Globus Online, use the respective Guest Collection for the container.

2.4.2. Log in to the Globus Research data management portal

You can log in to Globus by clicking on the Log In button at the upper right of the page.

You will then be directed to a page where you have to select your identity provider. If you want to access the DSS Mapped Collection Leibniz Supercomputing Centre's LRZ DSS - CILogon, make sure to always select Leibniz Supercomputing Centre as the organization, even if you want to log in using a TUM or LMU managed account.

Then click on the Continue button in order to start the login workflow.

The login workflow will redirect you to the Shibboleth single sign-on provider of LRZ. Use the username and password provided by the institution (LRZ/TUM/LMU) through which you were invited to access DSS.

After that, you should be successfully logged in and see the File Manager view.

2.4.3. Using Globus to transfer files between your workstation and DSS

In order to transfer files between a DSS container and your workstation, you currently have to install the Globus Connect Personal software on your workstation and set up a so-called Personal Endpoint. To do so, just click on the install link that is appropriate for your operating system on the Globus Connect Personal site and follow the instructions.

After that, go back to the File Manager view and select the Personal Endpoint you have just created for your workstation by clicking on the Collection field. This opens a search window in which you can search for and select your workstation's endpoint.

After selecting your endpoint, you will see the contents of the directories on your workstation that you have exported via Globus (usually the HOME directory).

You can then switch to the two-panel view using the Panels button in the upper right area of the UI.

Now you can select the DSS endpoint on the other side of the page. Again, click on the collection input field that says "Transfer or sync to", and this time search for LRZ DSS. Select the endpoint Leibniz Supercomputing Centre's LRZ DSS - CILogon.

Now you should see the contents of your workstation on one side and the base directory of DSS on the other side.

Now you can browse through the directories and start a transfer: navigate to the destination folder on the destination side, select the source folder or files on the source side, and click the big blue Start button that points from the source to the destination.

Please note that you can also adjust some transfer settings at the bottom of the page. For example, you can choose to encrypt the data transfer if you require an extra level of privacy, or you can tell Globus to "sync" your directories, as you may be used to from tools like rsync.

As soon as you have started the transfer, you are basically done. You can now navigate away from the page and do other work while Globus does the heavy lifting in the background. Once the transfer has finished (or failed permanently), you will receive an email from Globus.

However, if you are curious and want to watch Globus at work, you can click on the Activity link in the left navigation panel to get an overview of your recent transfers.

When you click on one of those transfers, you can also get some more details about it.

2.4.4. Using Globus to transfer files between your Servers and DSS

If you want to transfer files regularly and with higher performance than a Globus Personal Endpoint can deliver, you can also set up a Globus Connect Server on servers at your institution. Running a basic Globus Connect Server is free of charge. Just check out the Globus Connect Server page for download and installation/configuration instructions. Getting this up and running for the first time should be doable in about an hour for an intermediate to advanced Linux sysadmin. However, if you struggle, please don't hesitate to reach out to us via the LRZ Servicedesk.

2.4.5. Using Globus to transfer and share data worldwide

As the name Globus suggests, its mission is to connect data islands around the world and enable easy, fast and reliable data transfers between these islands. Many science institutions therefore run their own Globus endpoints, so if you need to transfer your data between LRZ and some other site, chances are good that an endpoint is already set up at your partner site. If this is not the case, you can ask them to install a Globus Connect Server or at least run a Globus Personal Endpoint for you on their systems. Both options are free of charge.

For DSS we have also signed a premium subscription for Globus, which enables us to provide you with another, fairly unique feature: you can share your data in DSS worldwide with arbitrary people, as easily as you may be used to from tools like LRZ Sync+Share or Dropbox. Just provide Globus with the email address of the person you want to invite, together with the permissions you want to grant (for example, read or write access), and you are done. However, while tools like LRZ Sync+Share, Dropbox or Google Drive reach their limits at data sizes of more than about 1 TB, this is where Globus Sharing just begins.

If you need to share the data in a container with external users, please ask your data project's data curators. Only data curators are able to create Globus Sharing access permissions.

2.4.6. Using Globus to download and upload smaller files directly using your Browser

In the Globus web UI, when you click on a single file, you will see that the Open and Download icons become active. If you click on one of these icons, the file is opened or downloaded directly via your browser. Likewise, if you navigate to a directory, the Upload icon becomes active; clicking it lets you upload files directly via your browser. However, bear in mind that this access mechanism is much slower than transfers via Globus Connect Server/Personal, so it is only recommended for a few smaller files.

2.4.7. Using Grid command line utilities to transfer data

If you don't want to use the graphical user interface for transferring data with Globus, you can also check out the Globus Command Line Interface (CLI) or the RESTful API.
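As a rough sketch of what a CLI session can look like, the commands below use the Globus CLI; the endpoint UUIDs and paths are placeholders, and you should check the current Globus CLI documentation for the exact syntax:

```shell
# Log in once; this opens a browser window for authentication.
globus login

# Find endpoint UUIDs, e.g. by searching for the DSS Mapped Collection.
globus endpoint search "LRZ DSS"

# List the contents of a container (UUID and path are placeholders).
globus ls "ddb59aef-0000-0000-0000-000000000000:/dss/dssfs01/pn56xe/pn56xe-dss-0000/"

# Transfer a directory from DSS to another endpoint
# (both UUIDs and paths are placeholders).
globus transfer --recursive \
    "ddb59aef-0000-0000-0000-000000000000:/dss/dssfs01/pn56xe/pn56xe-dss-0000/results/" \
    "313ce13e-0000-0000-0000-000000000000:/home/user/results/"
```

Like transfers started in the web UI, CLI transfers run in the background on the Globus service, so you can close your terminal once the transfer task has been submitted.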



3. Hints and possible pitfalls

3.1. Known Limitations

3.2. Do's and Don'ts