DSS Understanding Data Science Storage Reliability and Service Level Objectives

With LRZ's Data Science Storage (DSS), we want to provide a cost-optimised solution that allows you to store vast amounts of data. However, cost optimisation always requires trade-offs. Below we discuss some of the trade-offs we made and how they affect the Service Level Objectives of DSS.

Recovery Point Objective

One trade-off we made for DSS is that we do not replicate data to a second hot or warm standby system that could take over immediately after a catastrophic failure. Instead, we rely on RAID technology to tolerate isolated disk failures and use traditional tape backup for data protection. However, given the sheer size of a Data Science Storage system, which is usually a petabyte or more, restoring a complete system from backup will probably take a month or even longer.
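To get a feel for why a full restore takes this long, the following minimal sketch estimates the restore time of one petabyte at a few sustained throughput rates. The rates are hypothetical illustrations chosen for the arithmetic, not measured or published LRZ figures, and the function name `restore_days` is our own.

```python
# Rough restore-time estimate. The throughput figures are hypothetical
# illustrations, not measured or published LRZ values.

SECONDS_PER_DAY = 24 * 60 * 60
PETABYTE = 10**15  # bytes, decimal definition
MB = 10**6         # bytes, decimal definition


def restore_days(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Return the days needed to restore size_bytes at a sustained throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / throughput_bytes_per_s / SECONDS_PER_DAY


for rate_mb_s in (200, 400, 1000):  # assumed sustained restore rates in MB/s
    days = restore_days(1 * PETABYTE, rate_mb_s * MB)
    print(f"1 PB at {rate_mb_s} MB/s: {days:.1f} days")
```

At an assumed sustained rate of 400 MB/s, one petabyte takes roughly 29 days, which matches the order of magnitude given above. The sketch only covers raw data transfer; any additional overhead in a real restore would lengthen it further.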

Availability

Another trade-off we made for DSS is that we use hardware and software that is optimised for high data throughput, infinite scaling and some other special features we rely on, but that is not primarily designed for ultra-high availability. Therefore, even though LRZ services are always delivered on a best-effort basis with no commitment (that is, without Service Level Agreements), our internal Service Level Objective is to meet or exceed an availability of 98-99% per month.
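To make the availability objective more tangible, the short sketch below converts a monthly availability percentage into the downtime it permits. It assumes a 30-day month and counts all downtime equally; how LRZ actually measures availability is not specified here, and the function name `allowed_downtime_hours` is our own.

```python
def allowed_downtime_hours(availability: float, days_in_month: int = 30) -> float:
    """Return the hours of downtime per month permitted by an availability target.

    availability is a fraction between 0 and 1, for example 0.98 for 98%.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    hours_in_month = days_in_month * 24
    return (1.0 - availability) * hours_in_month


for target in (0.98, 0.99):
    hours = allowed_downtime_hours(target)
    print(f"{target:.0%} over 30 days: up to {hours:.1f} hours of downtime")
```

For a 30-day month, 98% availability corresponds to at most 14.4 hours of downtime and 99% to at most 7.2 hours.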
