Site and forum downtime - Thursday January 3rd 2013

Rys

It looks like one of the disks has died in the RAID array that backs the VM datastore on the machine doing most of the work for the site and forums. The degraded array is causing CPU load problems on every other guest whenever any one guest does a lot of disk I/O.

That appears to be the root cause of the site and forums' recent reduced performance and spotty availability. Today I increased the RAM available to the guest (sorry for the unscheduled maintenance; it was only brief) ahead of a visit to the DC next week, on the 3rd, to fix it.

I'm also installing a couple of other machines in the rack, and at least one will eventually be used to improve site and forum performance.

Rys
 
BTW, do you have RAID 6 or equivalent, or is it all toast if another drive fails? :)

Good luck. I hope you don't run into any big surprises and make it back intact. Beware of the dragons and pit traps.
 
It's RAID5, so another failure would be really bad. We have backups, of course, but there'd be extended downtime while I repaired things. That's one of the things I'm working to fix: adding a second machine and beefing up the main server's disk array so we're faster and more resilient to failure.

The work today didn't go ahead in the end because I've been incredibly busy (I'm moving house, and I'm the only engineer in my group at IMG not on holiday!), but I'll be heading there soon. Until I can rework the array, the virtual machine has effectively been given the run of the entire host to get us the best performance.
 