A big thunderstorm hit Austin this evening, and at 9:04 PM local time we
lost power to several of our most important infrastructure servers in
the lab. Almost everything seems to have booted back up on its own.
However, 2 of the 3 servers providing Ceph storage to the DeveloperCloud
failed to come back up on their own. Every VM in the DeveloperCloud is
backed by Ceph, so this has caused quite a bit of havoc. Additionally, the
main network node providing external access to the cloud failed to boot
back up properly.
I currently have the Ceph cluster recovering. However, it's looking like
it could be a couple of hours until it decides all its data is in the
proper state and can be used for write access.
The network node is still giving me lots of trouble. I'll give an update
once I have more information.