A problem with a cloud backup meant that the only chance of saving months of work was to call a data recovery company.
R3 initially lost the job to a substantially cheaper, higher-risk data recovery and computer forensics firm. After further investigation, the client elected to retrieve the server from the competitor's HQ in Wales and drive it directly to Security House in Sheffield.
The 15-disc RAID 5 configuration included one hot spare, but two of the Seagate FC series Fibre Channel discs had failed.
The recovery was complicated on a number of levels, but in summary:
- A disc had failed
- The hot spare did not work
- Another disc failed
- The resync to the hot spare failed in some unexplained manner
The discs were formatted at 520 bytes per sector, which was common for enterprise-class SANs, but data recovery algorithms and software are most stable when working with 512-byte-per-sector formatting.
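As an illustration of the sector conversion mentioned above, here is a minimal sketch in Python. It is not the tool R3 used. It assumes each 520-byte sector holds 512 bytes of user data followed by 8 bytes of array metadata, that the input is a regular image file (or an in-memory buffer) whose length is a whole number of sectors, and the names `convert_520_to_512` and `SECTORS_PER_CHUNK` are hypothetical.

```python
PHYSICAL_SECTOR = 520      # bytes per sector as formatted on the array discs
LOGICAL_SECTOR = 512       # bytes per sector expected by most recovery tools
SECTORS_PER_CHUNK = 4096   # hypothetical batch size, chosen only to limit memory use


def convert_520_to_512(src, dst):
    """Copy src to dst, keeping the first 512 bytes of every 520-byte sector.

    Assumes the extra 8 bytes trail the user data in each sector.
    Returns the number of whole sectors converted.
    """
    sectors = 0
    while True:
        chunk = src.read(PHYSICAL_SECTOR * SECTORS_PER_CHUNK)
        if not chunk:
            break
        whole = len(chunk) // PHYSICAL_SECTOR
        for i in range(whole):
            start = i * PHYSICAL_SECTOR
            # Keep the 512-byte payload, drop the 8-byte trailer.
            dst.write(chunk[start:start + LOGICAL_SECTOR])
        sectors += whole
        if len(chunk) % PHYSICAL_SECTOR:
            # A trailing partial sector is ignored rather than guessed at.
            break
    return sectors
```

Used on files opened in binary mode, for example `convert_520_to_512(open("disc07_520.img", "rb"), open("disc07_512.img", "wb"))` with hypothetical file names, this yields an image that standard RAID and file system tools can read. In a real case the conversion would be run on an image of the disc, never on the original media.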
The client contacted a British / global data recovery company, who were unable to diagnose the problem or propose a solution, and R3 were called to take over the case.
On the face of it, solving the disc failures, the image conversion, VMFS and the recovery of specific data from within just one of dozens of VMs was complex enough.
But as the tasks progressed, something was wrong and it just did not make sense.
During any disaster or failure there are a number of factors. Each year I manage several of the larger disasters in the UK, along with dozens of recoveries that are relatively routine for the R3 team, involving disc arrays in pools of as few as 4, 6 or 12 discs, others with 24, 55 or 180, and recently a pair of 240 x 4TB SANs with varying RAID combinations and tables.
But this EMC RAID 5 was different: it not only had a very seriously degraded disc which had sustained a head crash, but the resync had written data rather than parity, which prevented the data on the RAID from being accessible.
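To show why data written where parity was expected is so damaging, here is a simplified single-stripe model of RAID 5 in Python. It is a sketch of the general XOR parity principle only, not the EMC on-disc layout or R3's recovery script, and the function names (`xor_blocks`, `rebuild_missing`, `stripe_is_consistent`) are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild the one missing block of a stripe from all the surviving blocks.

    In RAID 5 the parity block is the XOR of the data blocks, so XOR-ing
    every surviving block (data and parity) gives back the missing one.
    """
    return xor_blocks(surviving_blocks)


def stripe_is_consistent(all_blocks):
    """A healthy stripe XORs to all zeros across data and parity blocks."""
    return not any(xor_blocks(all_blocks))


# Three data blocks and their parity, standing in for one stripe.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

# Normal case: d1 is lost and rebuilt correctly from the other members.
assert rebuild_missing([d0, d2, parity]) == d1

# Failure case: unrelated data sits where parity was expected.
overwritten = b"DDDD"
assert not stripe_is_consistent([d0, d1, d2, overwritten])
assert rebuild_missing([d0, d2, overwritten]) != d1
```

The last two assertions are the point: once a member holds data where the array expects parity, every rebuild that relies on that member produces plausible-looking but wrong blocks, which is consistent with a volume that assembles yet whose contents appear corrupt.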
In fact it prevented a RAID recovery from being possible. R3 engineers were initially stumped because the last disc to fail was proving difficult to image in full; it took a few days in itself to recover and to convert its image to 512 bytes per sector.
The first disc to fail was initially ruled out as one of the discs needed, because it was out of sync by several weeks.
But on rebuilding the RAID 5 volume, the VMs were showing as corrupt, and all involved, including the developers of RAID recovery software and hardware, could not help and hit a dead end.
Later we realised just how unusual this EMC failure was: no one had ever identified a similar case.
EMC, now part of Dell, is "fairly good gear". R3 have worked on a number of EMC data recovery cases where multiple disc failures have caused the data to be inaccessible, but none that effectively wrote back to a RAID 5 disc member as if it were RAID 0.
None of those involved had ever seen anything like this on an EMC, or on any hardware for that matter.
After solving this, a bespoke system configuration was built and a script developed. All VMs were extracted in order of priority, and the data they contained was extracted and tested by the client.
Apart from the normal problems caused by a dirty server shutdown, which needed some database file repairs, it was a full recovery.