Backing up large file collections

Challenges and possible solutions for backing up large data sets

More and more research fields face extremely large and, above all, very rapidly growing data volumes. In addition to subject-specific questions, such as how to derive new scientific findings from this data, researchers increasingly have to deal with questions about the underlying IT infrastructure. One topic that is often neglected in this context is protecting the data against loss. In this article, we take a closer look at the challenges of backing up large data sets and at possible solutions.

The classic strategy of weekly full and daily differential backups no longer scales with rapidly growing data volumes. For this reason, the LRZ has been using the Incremental Forever method of the Tivoli Storage Manager for decades. With this method, only the first backup is a full backup; all subsequent runs transfer only the changes since the previous backup. As a result, far less data has to be moved than with the traditional method. For restores, however, both approaches share the same problem: all data must first be copied back from the backup. Even with powerful systems and under optimal conditions, this means several hours of downtime for large data sets.
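
A rough back-of-the-envelope calculation illustrates the difference. The following Python sketch compares the data moved per week by the two schemes and estimates the time for a complete restore. The data set size, daily change rate, and restore throughput are purely illustrative assumptions, not LRZ figures, and the model ignores overlapping changes and compression.

```python
def weekly_transfer_classic(total_tb: float, daily_change: float) -> float:
    """Weekly full backup plus six daily differential backups."""
    full = total_tb
    # A differential on day d contains everything changed since the full
    # backup, approximated as d * daily_change * total_tb (no overlap).
    differentials = sum(d * daily_change * total_tb for d in range(1, 7))
    return full + differentials


def weekly_transfer_incremental_forever(total_tb: float, daily_change: float) -> float:
    """After the initial full backup, only the daily changes are transferred."""
    return 7 * daily_change * total_tb


def restore_hours(total_tb: float, throughput_gb_per_s: float) -> float:
    """Time to copy the complete data set back at a sustained throughput."""
    seconds = total_tb * 1000 / throughput_gb_per_s
    return seconds / 3600


# Hypothetical values: 500 TB data set, 1 % daily change, 10 GB/s restore rate.
classic = weekly_transfer_classic(500, 0.01)
incremental = weekly_transfer_incremental_forever(500, 0.01)
hours = restore_hours(500, 10)
print(f"classic: {classic:.0f} TB/week, incremental forever: {incremental:.0f} TB/week")
print(f"full restore: {hours:.1f} h")
```

With these assumed values, the classic scheme moves roughly 605 TB per week compared with roughly 35 TB for Incremental Forever, while a complete restore takes about 14 hours in either case.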

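Along with data volumes, user requirements for the recovery time objective (RTO, the maximum acceptable time until the data is available again) and the recovery point objective (RPO, the maximum acceptable age of the restored data) are also becoming stricter.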

In the past, it was often acceptable to fall back on a backup from last night or the night before in the event of data loss, and to wait until the next day or the day after for the restore. Today, IT managers are often faced with RPO and RTO requirements in the range of hours or even minutes. Traditional backup procedures can no longer meet such requirements.
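
The relationship between a backup schedule and these two objectives can be sketched in a few lines of Python. The `BackupPlan` class, the function name, and all numbers are hypothetical and serve only as an illustration; the check assumes the worst case, in which the failure happens just before the next backup point.

```python
from dataclasses import dataclass


@dataclass
class BackupPlan:
    interval_hours: float  # time between two backup points
    restore_hours: float   # time needed to bring the data back online


def meets_objectives(plan: BackupPlan, rpo_hours: float, rto_hours: float) -> bool:
    # Worst case: the failure occurs just before the next backup point,
    # so up to one full interval of changes is lost.
    worst_case_data_loss = plan.interval_hours
    return worst_case_data_loss <= rpo_hours and plan.restore_hours <= rto_hours


# Assumed example: nightly backup, 14 hours for a complete restore.
nightly = BackupPlan(interval_hours=24, restore_hours=14)
relaxed = meets_objectives(nightly, rpo_hours=24, rto_hours=48)
strict = meets_objectives(nightly, rpo_hours=1, rto_hours=0.5)
print(relaxed, strict)
```

Under these assumptions, the nightly plan satisfies the relaxed objectives of earlier years but fails an RPO of one hour and an RTO of thirty minutes.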

In response to these increased requirements, storage system manufacturers have developed new backup methods, which fall into the following categories:

  • Snapshots
  • Synchronous replication
  • Asynchronous replication
  • Replication of snapshots

Unfortunately, it is often overlooked that these methods, unlike traditional backup, cannot protect against all forms of data loss. Data loss scenarios can be roughly grouped into the following categories:

  1. Hardware defect
  2. Software defect
  3. Operating error or deliberate deletion (hacker/virus)
  4. Defect in the software of the storage system

The following table provides an overview of which backup procedures can protect against which types of data loss:

 

Procedure     Hardware   Software   User/Evil   Storage SW
Snapshots     -          X          X           -
Sync Repl     X          -          -           -
Async Repl    X          -          -           -
Snap + Repl   X          X          X           -

(X = protects, - = does not protect)

As the table shows, combining snapshots with their replication to a secondary system, especially one located far enough away from the primary system, provides good protection against many data loss scenarios. However, these methods are usually available only within the same storage system family, which means that the primary and secondary systems run the same software. If a bug occurs in the storage system software, it may well affect both systems and cause data loss on both sides.

Ultimately, this means that even modern backup concepts must still include a traditional backup copy on a different system and a different medium as a so-called "last line of defense".
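
The argument can be checked mechanically by encoding the protection table as a small data structure. In the Python sketch below, the first four rows are copied from the table; the "Traditional backup" row is an assumption derived from the statement that traditional backup, unlike the newer methods, can protect against all forms of data loss. The function name `uncovered` is hypothetical.

```python
SCENARIOS = ("Hardware", "Software", "User/Evil", "Storage SW")

# Which data loss scenarios each procedure protects against.
PROTECTS = {
    "Snapshots": {"Software", "User/Evil"},
    "Sync Repl": {"Hardware"},
    "Async Repl": {"Hardware"},
    "Snap + Repl": {"Hardware", "Software", "User/Evil"},
    # Assumption based on the text, not a row of the original table.
    "Traditional backup": set(SCENARIOS),
}


def uncovered(methods):
    """Return the scenarios that none of the given procedures protects against."""
    covered = set().union(*(PROTECTS[m] for m in methods))
    return [s for s in SCENARIOS if s not in covered]


print(uncovered(["Snap + Repl"]))
print(uncovered(["Snap + Repl", "Traditional backup"]))
```

Replicated snapshots alone leave the storage software scenario open; only the additional traditional backup copy closes that gap.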

For this purpose, the LRZ offers all chairs of TU and LMU free use of our backup system, which runs on IBM Tivoli Storage Manager. You can find out more under Backup und Archivierung.

Are you planning to purchase a large storage system for your department and want to back it up with our backup system? Contact the LRZ-Servicedesk as early as possible. We can then offer tips and advice during the planning phase, check the infrastructure operated by the LRZ for possible bottlenecks, and take appropriate measures.

Do you need a high-performance, highly reliable data storage system for your research, but would prefer not to worry about its operation and instead concentrate entirely on your research project? Then the LRZ Storage Cloud service might be of interest to you. You can find more information about this service under Online-Speicher (NAS).