[linux-lvm] HDD failure - please help!

Stuart D Gathman stuart at bmsi.com
Thu Sep 2 15:22:19 UTC 2010

On 09/02/2010 07:50 AM, Patterson, James wrote:
>> If you wanted RAID5, your best bet on linux is the md driver.
>> Or else a hardware RAID controller.
> I don't/didn't want RAID.

Based on your expectations, I think you *do* want at least RAID1.  RAID1 
is simple to administer and understand.
>> your first step is getting a copy of the metadata.
>> There should be a copy at the beginning of each drive.
> Yes. How do I access it? None of the drives will mount. I am thinking here that I should create a special boot disk with the LVM tools on it (they are not present on the FC11 boot iso, afaik).
You don't mount the PVs.  Use the metadata extraction tool (I don't 
remember the name atm).
If this was your boot filesystem, then you will need a LiveCD or new 
install.  Since you will need a new disk anyway, I suggest you get a 
new disk that is *bigger* than the failing drive and install to it (but 
*not* overwriting the others) and leave a partition big enough to 
contain the PV from the failing drive.  Remove the failing drive, and 
access it via USB - even if you have another drive slot.  By taking 
steps to keep it as cold as possible during recovery, you can coax a few 
more sectors out of it.
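
As a rough illustration of looking at the metadata copy at the start of a 
PV without mounting anything, here is a minimal Python sketch.  It assumes 
the LVM2 text-format metadata sits within the first 1 MiB of the device, 
and it only reads, never writes.  The function name, the 1 MiB window and 
the minimum run length are my own choices for illustration, not part of 
any LVM tool.

```python
import re

def printable_runs(path, nbytes=1024 * 1024, min_len=16):
    """Return runs of printable ASCII found in the first nbytes of path.

    Assumption: the PV carries LVM2 text metadata near its start, so the
    volume group description shows up as one or more long ASCII runs.
    The file is opened read-only.
    """
    with open(path, "rb") as f:
        head = f.read(nbytes)
    # Tab, newline and the printable ASCII range, at least min_len long.
    pattern = rb"[\x09\x0a\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, head)]

# Hypothetical usage (the device name is a placeholder, run as root):
#   for run in printable_runs("/dev/sdX1"):
#       print(run)
```

Save whatever it prints to a file on a *different* disk before doing 
anything else with the failing drive.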
>> then you should look back a month or so in the archives
> I looked...could you please be a bit more specific? I didn't see anything.
This should get you started:  
> Well, truly, the only thing I've learned is never to use LVM if it's 
> going to cause me to lose data on all 5 drives when one goes down. The 
> logic behind it's use appears to be to just make life "easier"
With JBOD (which you likely have), the failure scenario is exactly the 
same whether you have 1 disk or 5.  Part of your filesystem gets 
trashed, and you have to use low level tools to recover what remains if 
you don't have backups.

What having 5 disks *does* do is make failure more likely.  Suppose the 
probability of one disk *not* failing in a given year is .999 (3 sigmas).  
With JBOD, the LV fails if *any* of the disks fails.  The probability 
that none of them fails in a given year would then be .999^5 ~= .995.  
Your array is less reliable than a single disk.
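
The arithmetic can be checked with a few lines of Python.  This is only a 
sketch: it assumes the disks fail independently, and the function name is 
made up for illustration.

```python
def prob_no_failure(p_disk_ok, n_disks):
    """Probability that none of n_disks fails, assuming independent disks.

    p_disk_ok is the chance that a single disk survives the period.
    """
    return p_disk_ok ** n_disks

# One disk versus a 5-disk JBOD, using the .999 figure from the text.
print(f"1 disk : {prob_no_failure(0.999, 1):.4f}")
print(f"5 disks: {prob_no_failure(0.999, 5):.4f}")
```

The second line prints 0.9950, i.e. the chance of losing the LV in a year 
is roughly five times that of a single disk.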

By using RAID, you can make the array more reliable.  RAID works by 
using multiple copies of data so that you don't lose anything on a 
single drive failure.
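
To put a rough number on that, here is a small Python sketch for a RAID1 
mirror.  It assumes the copies fail independently and that a failed drive 
is not replaced during the year, and it ignores correlated failures (same 
batch, same power supply), so treat it as an estimate only.  The function 
name is made up for illustration.

```python
def prob_mirror_survives(p_disk_ok, copies=2):
    """Probability that at least one copy in a mirror survives.

    Assumes independent failures: data is lost only if every copy fails.
    """
    p_disk_fails = 1.0 - p_disk_ok
    return 1.0 - p_disk_fails ** copies

# Two-way mirror with the same .999 per-disk figure used above.
print(f"{prob_mirror_survives(0.999):.6f}")
```

In practice you replace the dead drive as soon as it fails, which keeps 
the window in which a second failure can hurt you short.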
