Short Technical Explanation:

Some of the hardware in SAN shelf 5 (of cluster ey02) failed around 05:00-06:00 PDT.  This hardware failure caused multiple disks to be marked as failed.  The loss of these disks meant we lost parity in two raid 5 sets, and required the shelf to be brought offline and swapped with a spare chassis.  Our SAN vendor helped our engineers rebuild the parity information and restore the failed disks. The majority of affected customers experienced down time limited to the 30 minute shelf swap and a reboot of their application.  In addition, a handful of customers experienced a number of hours of down time, due to the time required to rebuild the raid sets and check each file system.  In all cases no data was lost.

Our Message to You:

We recognize that a thorough analysis of today’s issue is very important to each of you. Many members of our staff have information to share; we are capturing and parsing this information, and will post here again tomorrow afternoon with additional detail and insight. Transparency remains one of our core values.  All customers affected by this issue will receive a service credit.

Contacting Us:

Please don’t hesitate to open a ticket or call your account manager — we are all eager and available to answer and help you in any way possible.

Thank you again for choosing to host with Engine Yard,

–Taylor Weibley
Director of Support

Post a Comment

*
*