Unscheduled downtime - Thursday May 14th 2020

Rys

In the early hours of May 14th, the VM hosting the forum was restarted twice by the admin team. However, due to a package misconfiguration, the webserver didn't start up during the VM restart procedure.

That was my fault: a package upgrade during maintenance a couple of weeks earlier left FreeBSD's packaged version of nginx in place as the webserver, which I didn't spot at the time. The packaged version doesn't support HTTP/2, so I build nginx from source to add that ability and to remove nginx features we don't use, which reduces the surface area of code we rely on in that critical bit of infrastructure.

So when the VM restarted, the packaged nginx tested the config file, failed because the config asks for HTTP/2, and didn't start up. I've added a line item to my maintenance playbook to hopefully avoid that in the future.
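
For illustration, a check along those lines could be sketched as below. This is only an assumption about what such a check might look like, not the actual playbook item: the function names `has_http2` and `check_nginx` are hypothetical, and it assumes HTTP/2 support is compiled in statically, so that `--with-http_v2_module` shows up in the configure arguments that `nginx -V` prints.

```python
import shutil
import subprocess

# nginx configure option that compiles in HTTP/2 support
HTTP2_FLAG = "--with-http_v2_module"


def has_http2(version_output: str) -> bool:
    """Return True if `nginx -V` output lists the HTTP/2 module flag."""
    return HTTP2_FLAG in version_output.split()


def check_nginx(binary: str = "nginx") -> bool:
    """Hypothetical pre-reboot check: right binary installed, config parses."""
    path = shutil.which(binary)
    if path is None:
        return False  # no nginx on PATH at all

    # `nginx -V` writes its version and configure arguments to stderr,
    # so inspect both streams to be safe.
    info = subprocess.run([path, "-V"], capture_output=True, text=True)
    if not has_http2(info.stdout + info.stderr):
        return False  # probably the packaged build, not the source build

    # `nginx -t` parses the config and exits non-zero on any error.
    test = subprocess.run([path, "-t"], capture_output=True, text=True)
    return test.returncode == 0


if __name__ == "__main__":
    print("nginx ok" if check_nginx() else "nginx NOT ok, do not reboot")
```

Running something like this after a package upgrade, rather than at boot, would flag a swapped binary while there is still time to rebuild from source.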

So the great news is that the forum restart system I put in place, so that @BRiT and others can restart the forum VM in my absence, worked. The not so great news is that the VM didn't come up cleanly after that. The forum was down for around 7 hours as a result.

Please accept my endless love and (socially distant) hugs and kisses in apology :love:
 
I would just have blamed China, COVID-19 or President Trump - all are credible things for explaining anything going wrong currently. This honesty policy is just weird! :yep2:
 