Hello all,

John might delete this because it isn’t pretty, like his posts.

On Wed, Feb 21, the site experienced downtime. The cause was an infrastructure update that took certain resources offline.

At no time was there a risk to the contents of the site.

The affected resource (the ceph cluster) has been brought back online and the site, obviously, is now functioning properly.

Geek speak:

The base infrastructure is K8S. A ceph cluster provides backing storage for the RDBMS and for the assets. At around 0800, the K8S cluster was forcibly upgraded because of EOL issues.

This caused NAS volumes to become detached from the K8S ceph nodes. This is expected. Once the volumes were attached to the new K8S ceph nodes, the OSD processes had to be properly restarted.

Once this was completed, the ceph volumes became available to all the pods that needed them and the site was brought back up.
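
For the curious, the "are the OSDs back yet" check in the recovery above can be sketched in a few lines of Python. This is only an illustration, not the tooling actually used here: the sample data is made up, it is assumed to be shaped like the "nodes" list that `ceph osd tree --format json` prints, and the helper names `down_osds` and `storage_ready` are hypothetical.

```python
import json

# Hypothetical sample, assumed to resemble the "nodes" list printed by
# `ceph osd tree --format json`. Real output carries more fields, and
# these values are invented for illustration.
SAMPLE = """
{"nodes": [
  {"id": -1, "name": "default", "type": "root"},
  {"id": 0, "name": "osd.0", "type": "osd", "status": "up"},
  {"id": 1, "name": "osd.1", "type": "osd", "status": "down"},
  {"id": 2, "name": "osd.2", "type": "osd", "status": "down"}
]}
"""


def down_osds(tree_json):
    """Return the names of OSD entries whose status is not "up"."""
    nodes = json.loads(tree_json).get("nodes", [])
    return [
        node["name"]
        for node in nodes
        if node.get("type") == "osd" and node.get("status") != "up"
    ]


def storage_ready(tree_json):
    """True only if at least one OSD is listed and none of them are down."""
    nodes = json.loads(tree_json).get("nodes", [])
    has_osds = any(node.get("type") == "osd" for node in nodes)
    return has_osds and not down_osds(tree_json)


if __name__ == "__main__":
    print("OSDs still needing a restart:", down_osds(SAMPLE))
    print("Storage ready for pods:", storage_ready(SAMPLE))
```

In this sample two of the three OSDs are still down, so the storage would not yet be considered ready for the pods that depend on it.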

4 thoughts on “Site Down Report”
  1. Wonder if that fried my motherboard and hard drive compromise issues, just kidding, old computer decided to kick off Wednesday morning, New/used/refurbished Dell $600.00 just got plugged up/in, hope it lasts as long as the last one did, have to take time to get used to it, but, it is what it is. With my computer down and my phone network down, I felt “lost in space”. Good to be back in cyber-space.
