Sorry about the site being down for the last few hours.

Our cloud provider had a server stop responding. It was “up” enough that it didn’t register as down from the outside.

Our cloud hosting provider is Linode. I hope you can understand just how good they are.

In my initial contact with them, I reported that it looked like the load balancer was failing.

They did a thorough examination and determined that it wasn’t the load balancer, but something related to our configuration.

Given that our configuration is freaking stable, this did not make any sense.

This led me to determine that the database server was not responding. This is not good.

This in turn led me to discover that I had pods that were “stuck”. I attempted to manually stop one of the pods, and it just hung there.
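
A rough sketch of how stuck pods like these could be spotted automatically, not what was actually run during this outage. It assumes the cluster is Kubernetes and that the pod list comes from `kubectl get pods --all-namespaces -o json`. A pod that has been asked to go away carries a `deletionTimestamp` in its metadata; if that time is well in the past and the pod is still listed, it is probably hung. The function name `find_stuck_pods` and the five-minute `slack` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def find_stuck_pods(pod_list, now, slack=timedelta(minutes=5)):
    """Return (namespace, name) pairs for pods that should already be gone.

    pod_list is the parsed JSON from `kubectl get pods --all-namespaces -o json`.
    slack is an arbitrary allowance (an assumption) before calling a pod stuck.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        stamp = meta.get("deletionTimestamp")
        if stamp is None:
            continue  # nobody has asked for this pod to be deleted
        # Kubernetes writes these as RFC 3339 in UTC, e.g. 2024-01-01T12:00:00Z
        deadline = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        )
        if now - deadline > slack:
            stuck.append((meta.get("namespace", "default"), meta["name"]))
    return stuck
```

To try it, pipe the kubectl output into a small script that calls `find_stuck_pods(json.load(sys.stdin), datetime.now(timezone.utc))`. If the list keeps naming pods on the same node, the node itself is the likely suspect.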

The ticket was updated to “I can’t terminate pods.” The update I received was not helpful. It was talking about disks not attaching correctly, suggesting that I was attempting to attach the same drive to multiple machines.

This led me to attempt to log into the node, something I should never have to do.

The node was borked. I took the reset hammer to it. It rebooted. Things are working again.

Linode did a great job working with me to help resolve the issue.

3 thoughts on “Unplanned Downtime”
  1. Sometimes you have to take the reboot angle to remedy the situation. Make it all forget what’s happening now and start all the processes clean, in the order they should be run in for correct file handle linking. One hates to do it, but it is the best way to fix the unknown, unresponsive computers that are supposed to do our bidding, only faster. It isn’t a HAL 9000, so you can’t say “Hal, please fix the problem with the website server.” “Ok Dave, I am doing that now.” “Done”
