
02.02.2021 – 05.02.2021 Power Outage

UPDATED

All cloud components (and the VMs) will be shut down.

We expect the cloud to be fully operational again in the course of Friday, 5.2.2021.

The cloud has returned to operation.

Date: Thu, 28 Jan 2021 15:06:55 +0100
From: Niels Fallenbeck <Niels.Fallenbeck@lrz.de>
To: LRZ Compute Cloud Users <cloud-users@lists.lrz.de>
Subject: [Cloud Users] Service interruption shifted to: 02.02.21 16:00 -> 05.02.21

LRZ Compute Cloud Users!

Unfortunately, the power interruption has been postponed by one day due to
external delays. We apologize; this information only reached us today.

To keep the downtime as short as possible, we will shut down the cloud one day
later than mentioned in our previous mail, on
    Tuesday, 02.02.2021, afternoon at 16:00 CET.

We expect that the cloud will become available again on
    Friday, 05.02.2021.

Sorry for any inconvenience,
best regards,
Niels

LRZ Compute Cloud Users,
 
in the meantime the time window of the power interruption has been narrowed 
down to:
    Tuesday, 2.2.2021 - Wednesday, 3.2.2021
 
We will start shutting down the cloud system on Monday, 1.2.2021, afternoon 
around 17:00 (5 PM).
 
Thanks for your understanding and sorry for any inconvenience,
Niels
 
On 14.01.21 at 09:28, Niels Fallenbeck <niels@lrz.de> wrote:
> LRZ Compute Cloud Users,
> 
> In week 5 (February 1, 2021 - February 5, 2021), work will be carried out on
> the LRZ power infrastructure with regard to future computer systems. The LRZ
> Compute Cloud is affected by this measure. During this time, the cloud
> infrastructure will not be available for about 3 days. The entire cloud needs
> to be switched off, and running VMs will be shut down.
> 
> Unfortunately, I cannot tell you the exact time of the work at the moment, as
> it depends on preparations that will take place during the next week.
> 
> As soon as I know the exact time of the service interruption, I will inform
> you again in more detail.
> 
> Warm regards and sorry for any inconvenience,
> Niels
 
-- 
Dr. Niels Fallenbeck
Leibniz-Rechenzentrum (LRZ), IT-Infrastruktur Server und Dienste (ITS)
Boltzmannstr. 1, 85748 Garching, Germany
Phone: +49 89 35831-7860


26.05.2020 – 29.05.2020 Maintenance Window

All VMs will be shut down during this period.

=== English version below ===
 
Liebe Nutzer*innen der LRZ Compute Cloud,
 
seit dem Wartungfenster im letzten Juli haben sich wieder einige Arbeiten aufgestaut, die wir nicht im laufenden Betrieb transparent für die Nutzer durchführen können. Diese Arbeiten werden wir im Zeitraum vom Dienstag, 26.05.2020 ab 10 Uhr, bis Freitag, 29.05., durchführen.
 
In diesem Zeitraum werden wir nicht nur Firmwareupdates für die Server und Softwareupdates und -bugfixes für deren Betriebssysteme einspielen, sondern auch die Server teilweise neue verkabeln und weiteren Hauptspeicher in jene Server einbauen, die Ressourcen für virtuelle Maschinen ohne GPUs bereitstellen.
In den letzten Wochen sind wir mehrfach in die Situation gekommen, dass Nutzer*innen keine neuen VMs mehr starten konnten, da alle CPUs bereits vergeben waren: Die Cloud war (und ist!) voll. Um diese Situation zu entspannen, haben wir uns die Nutzung der Hardware angeschaut und sind zu dem Schluss gekommen, dass wir die CPUs überbuchen werden, um für virtuelle Maschinen mehr CPUs zur Verfügung stellen zu können, als tatsächlich vorhanden sind - die Compute Cloud wächst also virtuell. Da die reale CPU-Auslastung der Server deutlich unter 20% liegt, erwarten wir keine spürbare Auswirkung auf die Performance der VMs.
 
Sollten Sie Fragen haben, stehen wir selbstverständlich für Rückfragen zur Verfügung.
Herzlichen Dank und viele Grüße,
 
Niels Fallenbeck
 
=== English ===
 
Dear LRZ Compute Cloud users,
 
since the maintenance window last July, some work has accumulated that we cannot carry out transparently for users during normal operation. We will perform these maintenance tasks from Tuesday, May 26, 2020, 10 a.m., until Friday, May 29.
 
During this period, we will not only install firmware updates for the servers and software updates and bug fixes for their operating systems, but also recable some of the servers and install additional main memory in the servers that provide resources for the "normal" virtual machines without GPUs.
In the past few weeks, users were repeatedly unable to start new VMs because no free CPUs were left: the cloud was (and is!) full. To ease this situation, we looked at the real hardware utilization and decided to overbook the CPUs so that we can provide more CPUs for virtual machines than physically exist; the compute cloud will grow virtually. Since the real CPU load of the servers is well below 20%, we do not expect any noticeable impact on the performance of the VMs.
 
If you have any questions, we are of course happy to answer them.
Thank you very much and best regards,
 
Niels Fallenbeck
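
The overbooking argument in the announcement can be illustrated with a small calculation. This is only a sketch under stated assumptions: the function name, the core count, and the overcommit ratio below are hypothetical and not taken from the announcement. Only the 20% load figure comes from the text, and the sketch assumes that load grows linearly with the number of vCPUs handed out.

```python
def overcommit(physical_cores, ratio, measured_load):
    """Return (schedulable vCPUs, projected host CPU load).

    Assumption: per-vCPU demand stays unchanged after overbooking,
    so the host load scales linearly with the overcommit ratio.
    """
    if ratio < 1:
        raise ValueError("overcommit ratio must be >= 1")
    vcpus = int(physical_cores * ratio)
    # Load is a fraction of total physical capacity and cannot exceed 1.0.
    projected_load = min(1.0, measured_load * ratio)
    return vcpus, projected_load


if __name__ == "__main__":
    # Hypothetical numbers: 1000 physical cores, 2x overbooking,
    # and the upper bound of 20% measured load from the announcement.
    vcpus, load = overcommit(physical_cores=1000, ratio=2, measured_load=0.20)
    print(f"vCPUs available: {vcpus}, projected load: {load:.0%}")
```

Under these assumed numbers the projected load stays far from saturation, which matches the expectation that VM performance is not noticeably affected.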

09.07.2019 – 12.07.2019 Maintenance Window

  1. MTU issue: The default networks MWN and internet, which are available to all users, have been configured incorrectly. At the moment, the MTU of a network interface attached to a VM is set to 1450. Docker and Windows "do not like this". → We will migrate all current network interfaces to copies of the internet and MWN networks, which will run with an MTU of 1500.
  2. Impact on the users of the VMs → All MAC addresses inside the VM will change, but all IP addresses (internal/floating) will stay the same and keep working. → This is considered to be OK.
  3. All VMs will be rebooted due to a necessary firmware upgrade of the hardware and the need to partially recable the hardware.
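
To check the MTU from inside a Linux VM after the migration, one option is to read the values the kernel exposes under /sys/class/net. The following is a minimal sketch and not an LRZ tool: the function names are made up for illustration, and 1500 is simply the target value named in item 1.

```python
from pathlib import Path

EXPECTED_MTU = 1500  # target MTU named in the announcement


def read_mtus(sysfs_net="/sys/class/net"):
    """Return {interface_name: mtu} for every interface under sysfs_net."""
    mtus = {}
    root = Path(sysfs_net)
    if not root.is_dir():
        return mtus  # not a Linux system, or sysfs is not mounted
    for iface in sorted(root.iterdir()):
        try:
            mtus[iface.name] = int((iface / "mtu").read_text().strip())
        except (OSError, ValueError):
            # Skip entries without a readable numeric "mtu" file.
            continue
    return mtus


def interfaces_below(mtus, expected=EXPECTED_MTU, ignore=("lo",)):
    """List interfaces whose MTU is smaller than the expected value."""
    return sorted(
        name for name, mtu in mtus.items()
        if name not in ignore and mtu < expected
    )


if __name__ == "__main__":
    current = read_mtus()
    for name, mtu in current.items():
        print(f"{name}: MTU {mtu}")
    small = interfaces_below(current)
    if small:
        print(f"Interfaces still below {EXPECTED_MTU}: {', '.join(small)}")
```

An interface that still reports 1450 after the maintenance window would show up in the last line of the output.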





