TSM Backup vs. Archive

TSM backup vs. archive - features, differences, areas of use, application scenarios

Questions like these come up again and again: What is the actual difference between backup and archiving with TSM? And which function is suitable for which purpose? This article answers these questions.

If you had to summarize the difference between TSM backup and archiving in a single term, it would be version management. The backup function detects, stores and manages different versions of a file, whereas the archive function neither manages versions nor works incrementally.

The backup function is used for regular (e.g. daily) backups of data, so that the data can be restored from the backup in case of an error. To keep both the load on the system during the backup and the amount of data to be stored as low as possible, only the files that have changed since the last backup run are saved. Each change to a file since the last backup run therefore results in a new version of this file being saved in the backup.

Because it is not possible to store all versions of a file forever, TSM has rules for aging out old versions of a file from the backup. Expiration is controlled both by time and by number of versions. Note that the most recent version of a file that still existed on the client at the last backup run is the so-called active version of the file, and that TSM never deletes an active version. As soon as an active version is replaced by a newer version, or the backup run detects that the file has been deleted on the client, the version is marked as inactive and from then on is subject to the retention policy, which is controlled by the following parameters:

  • VERExists – Maximum number of versions to keep of a file that still exists on the client.
  • VERDeleted – Maximum number of versions of a file to keep that have already been deleted from the client.
  • RETExtra – Number of days to keep a backup version after it has been marked as inactive.
  • RETOnly – Number of days for which a file deleted on the client should be kept in the backup.

As soon as one of the above parameters takes effect for a version of a file - i.e. either the maximum number of versions is exceeded or the maximum retention period of a version is exceeded - this version is deleted from the backup.

The LRZ standard retention policies work with the following values:

  • VERExists = 3
  • VERDeleted = 3
  • RETExtra = 180
  • RETOnly = 180

Put simply, this means that we keep at most three versions of a file in the backup, and inactive versions for at most 180 days. In addition to the standard retention policy (in TSM jargon, a "management class"), a special management class can be specified for certain files or directories; it allows a maximum of ten versions to be stored for a maximum of 180 days.
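The interplay of the four parameters can be illustrated with a small Python sketch. It is a simplified model and not the server's actual expiration logic: the names Version and surviving_versions are made up for this illustration, and the sketch assumes that RETOnly applies to the newest stored version of a file that was deleted on the client, while RETExtra applies to all other inactive versions. The default values are the LRZ standard values listed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Version:
    """One stored backup version of a single file (hypothetical model)."""
    backed_up_on: date
    inactive_since: Optional[date] = None  # None means: this is the active version


def surviving_versions(versions, today, file_deleted_on_client,
                       ver_exists=3, ver_deleted=3, ret_extra=180, ret_only=180):
    """Return the versions that remain after applying the retention rules."""
    # Newest first, so the version limit keeps the most recent copies.
    ordered = sorted(versions, key=lambda v: v.backed_up_on, reverse=True)
    max_versions = ver_deleted if file_deleted_on_client else ver_exists

    kept = []
    for index, version in enumerate(ordered):
        if version.inactive_since is None:
            kept.append(version)  # the active version is never expired
            continue
        # Assumption: RETOnly governs the newest version of a deleted file,
        # RETExtra governs every other inactive version.
        last_of_deleted = file_deleted_on_client and index == 0
        max_days = ret_only if last_of_deleted else ret_extra
        too_many = index >= max_versions
        too_old = (today - version.inactive_since).days > max_days
        if not (too_many or too_old):
            kept.append(version)
    return kept


# A file that changed once a month and still exists on the client.
history = [
    Version(date(2024, 1, 10), inactive_since=date(2024, 2, 10)),
    Version(date(2024, 2, 10), inactive_since=date(2024, 3, 10)),
    Version(date(2024, 3, 10), inactive_since=date(2024, 4, 10)),
    Version(date(2024, 4, 10)),  # active version
]

kept = surviving_versions(history, date(2024, 7, 1), file_deleted_on_client=False)
print([v.backed_up_on.isoformat() for v in kept])

much_later = surviving_versions(history, date(2025, 1, 1), file_deleted_on_client=False)
print([v.backed_up_on.isoformat() for v in much_later])
```

In the first call the oldest version is dropped because the limit of three versions is exceeded. In the second call all inactive versions have exceeded 180 days, so only the active version remains.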

In contrast, the archive function of TSM lets you safely keep a copy of your data for a longer period of time, for example for legal reasons, to comply with the DFG rules of good scientific practice, or simply to move "cold" data off your local system to make room for new data. The archive function keeps files in the archive system for a specified time without recognizing or managing versions of a file. This means that each time a file is archived, a copy of it is stored in the archive system, regardless of whether the same version of that file has been stored before. Therefore, the retention policy for archive data consists of only one parameter:

  • RetVer – Number of days an archived file should be kept.

The LRZ standard retention policy works with the following value:

  • RetVer = 3653

This means we store archive copies for 10 years. Upon request, we also offer the option of so-called long-term archiving, where archive copies are kept "forever".
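A minimal Python sketch of this behavior, under the assumption that an archive can be modeled as a plain list of copies: every archive operation appends a new copy with its own expiration date, even if the same file was archived before. The function name archive and the file path are hypothetical. The value 3653 corresponds to ten years of 365 days plus three leap days.

```python
from datetime import date, timedelta

RETVER_DAYS = 3653  # LRZ standard archive retention from the text


def archive(store, path, archived_on, retver_days=RETVER_DAYS):
    """Append a new archive copy; earlier copies of the same path stay untouched."""
    store.append({
        "path": path,
        "archived_on": archived_on,
        "expires_on": archived_on + timedelta(days=retver_days),
    })


store = []
archive(store, "/data/results.tar", date(2020, 1, 1))
archive(store, "/data/results.tar", date(2020, 1, 2))  # same file, archived again

for copy in store:
    print(copy["path"], copy["archived_on"], "->", copy["expires_on"])
```

Both copies are kept side by side, and each one expires independently 3653 days after it was archived.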

Another significant difference between backup and archiving in the LRZ environment is that we make a duplicate copy of the archive data at a remote location, so that in the event of a media failure or even destruction of the LRZ, your archive data is still safe. Unfortunately, for cost reasons, it is not possible to store a duplicate copy of the backup data as well.

Based on these characteristics, the following standard application scenarios for backup and archive can be derived:

  • Backup – Regular backup of your current work/data to protect against data loss on your system.
  • Archiving – One-time backup for offsite and/or long-term safekeeping of your completed work/data.

In certain application scenarios, some form of incremental archiving would be beneficial. A typical case is an experiment that generates raw data over an extended period of time: the data must be retained for a long time, but the primary system does not have enough capacity to hold all of the raw data from the experiment. With an incremental archive, you can delete older data when the primary system fills up; the data is still kept in the backup system for 10 years and can be retrieved later if needed. For this use case, we have created a special backup retention policy with a secondary copy and the following parameters:

  • VERExists = 1
  • VERDeleted = 1
  • RETExtra = 3653
  • RETOnly = 3653

This means that we store the most recent version of a file and keep it (from the time it is marked as inactive by TSM Backup) for 10 years. Since archive data is by definition closed data - i.e. the data does not change anymore - it is sufficient that we store only one version.

The way TSM implements the backup function is fundamentally different from the traditional approach. You can find a detailed comparison in our article "The TSM Way of Backup". Both methods have their advantages and disadvantages. Because the number of stored versions is limited, the standard TSM setup cannot keep a record of the state of a file system at fixed intervals, for example monthly or quarterly. If you consider your requirements carefully, however, this is not necessary at all in 99% of cases, because the costs exceed the potential additional benefit.

For justified exceptional cases, we offer the possibility to store such monthly or quarterly backups for a longer period of time. For this purpose you have to request an additional "special" TSM node for the monthly or quarterly backup, in parallel to your "normal" TSM node. This node is assigned to a special retention policy. Possible policies are:

  VERExists/VERDeleted   RETExtra/RETOnly   Backup frequency
  12                     1 year             Monthly backup
  12                     2 years            Two-month backup
  12                     3 years            Quarterly backup
  16                     4 years            Quarterly backup
  20                     5 years            Quarterly backup
  40                     10 years           Quarterly backup

The procedure is then that the daily backup is stored in the "normal" node and an additional backup is stored in the "special" node at the appropriate time intervals (monthly, bi-monthly or quarterly).
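The policies listed above follow a simple pattern: the number of versions multiplied by the backup interval equals the retention period, so every periodic backup made within the retention period fits within the version limit. The following Python sketch checks this arithmetic for each policy. The tuple layout and variable names are chosen for this illustration only.

```python
# (versions kept, retention in years, months between backups), taken from the policies above
POLICIES = [
    (12, 1, 1),   # monthly backup
    (12, 2, 2),   # two-month backup
    (12, 3, 3),   # quarterly backup
    (16, 4, 3),
    (20, 5, 3),
    (40, 10, 3),
]

for versions, years, interval_months in POLICIES:
    covered_months = versions * interval_months
    print(f"{versions:>2} versions x {interval_months} month(s) = "
          f"{covered_months:>3} months, retention = {years * 12:>3} months")

# True if every policy keeps exactly as many versions as its retention period needs.
consistent = all(v * m == y * 12 for v, y, m in POLICIES)
print("all policies consistent:", consistent)
```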

However, since this procedure of course consumes significantly more resources than the normal TSM backup procedure, we can only offer this to a limited extent and after a precise cost/benefit analysis.

What we ask you to refrain from in any case is using the archive function to regularly save file system states; we reserve the right to take corrective steps if this happens. Since the archive function is not incremental, the complete file system is transferred and stored each time, causing immense and, above all, unnecessary costs. The capacity of the backup system is limited, and we cannot continue to offer our service free of charge once the capacity limit is reached, so this practice is also more than questionable out of fairness to the other users.
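A rough back-of-the-envelope calculation in Python shows the order of magnitude involved. All numbers are hypothetical (a 1000 GB file system in which 1% of the data changes per day, saved daily for 30 days), and the model ignores compression and expiration. The function names are made up for this sketch.

```python
def archive_volume_gb(filesystem_gb, runs):
    """Every archive run transfers the complete file system again."""
    return filesystem_gb * runs


def backup_volume_gb(filesystem_gb, runs, daily_change_percent):
    """The first backup run is a full copy; later runs only transfer changed files."""
    changed_per_run = filesystem_gb * daily_change_percent / 100
    return filesystem_gb + changed_per_run * (runs - 1)


FILESYSTEM_GB = 1000       # hypothetical file system size
DAILY_CHANGE_PERCENT = 1   # hypothetical share of data that changes per day
RUNS = 30                  # one run per day for 30 days

archived = archive_volume_gb(FILESYSTEM_GB, RUNS)
backed_up = backup_volume_gb(FILESYSTEM_GB, RUNS, DAILY_CHANGE_PERCENT)

print(f"archive runs: {archived} GB transferred")
print(f"backup runs:  {backed_up} GB transferred")
```

Under these assumptions, the daily archive runs transfer more than twenty times as much data as the incremental backup.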

If you have a need for the special solutions presented here (incremental archive or monthly/quarterly backup) or if you have special requirements that cannot be met with the solutions presented here, please contact us via the LRZ-Servicedesk. We will then try to find a suitable solution together.