When LVM Goes Bad

Andy Green andy at warmcat.com
Tue Jun 20 12:14:46 UTC 2006


Hi folks -

A story about LVM.  I believe LVM is now the default in Fedora's 
partitioning; at least, I never loved it enough to have selected it 
myself, and yet it is on all my boxes now.

LVM can make a lot of sense for large storage binding together multiple 
devices or raids into a single logical storage device, in fact I use it 
for that too.  However LVM makes less sense on, say, a laptop which has 
and will only ever have a single 2.5" HDD for storage that is 
permanently available with the laptop.

Now it doesn't matter too much when everything is working, because LVM 
is a fairly lightweight additional layer AFAIK.  However, on one box 
here the sole SATA drive went bad without warning: some dozens of 
sectors were goneski after a recent period of high temperature here. 
The resulting symptom was that the partition contents were no longer 
recognized as containing a logical volume or a volume group, and pvscan 
did not list it either, although pvdisplay could see it was a physical 
volume if pointed directly at the partition.

Recovery from LVM metadata corruption is not overburdened with tools to 
help out; in fact I couldn't find anything useful.  By using dd I probed 
the damaged region and found that it started 33214 512-byte blocks into 
the partition and ended 33336 512-byte blocks in, so it trashed 
something like 60Kbytes.  Touching this region spewed IO errors to the 
console.  Whether this damage explains the loss of LVMness, or whether 
some subsequent logical brain damage elsewhere did it, I don't know.
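
The size of the damaged region follows directly from the two block 
offsets.  A minimal sketch of that arithmetic, assuming (as the later dd 
steps imply) that block 33336 is the first readable block after the 
damage; the variable names are only illustrative:

```python
BLOCK = 512                 # dd block size used throughout the post
first_bad = 33214           # first unreadable block, as probed with dd
first_good = 33336          # assumed: first readable block after the damage

bad_blocks = first_good - first_bad    # number of damaged blocks
bad_bytes = bad_blocks * BLOCK         # same region in bytes

print(bad_blocks, bad_bytes, bad_bytes / 1024)   # 122 62464 61.0
```

That is 122 blocks, or 61 Kbytes, which matches the "something like 
60Kbytes" estimate.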

What I did was to add a new HDD, install FC5 on it and boot into it, 
with the old HDD attached as /dev/sdb.  I then used dd to copy the first 33214 
512-byte blocks to a file on the new drive, dd'ed 122 512-byte blocks 
from /dev/zero and appended that on the end of the first file, and then 
used dd with bs=512 skip=33336 to copy the remainder of the damaged 
partition to this file also.  So after this I had a copy of the 
partition as a file on the new HDD with everything in the right place 
and the damaged area zeroed out.
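
The three dd steps can also be expressed as one pass.  The following is 
only a sketch of the same idea in Python, not the commands actually 
used: the function name salvage and its parameters are hypothetical, 
and the source path would be whatever partition on /dev/sdb held the 
damaged volume.  It reads unbuffered so that Python itself never reads 
ahead into the bad blocks; the kernel may still do its own read-ahead 
on a real device.

```python
BLOCK = 512  # block size used in the post


def salvage(src_path, dst_path, first_bad, first_good, chunk=1 << 20):
    """Copy src to dst, writing zeros for blocks [first_bad, first_good)."""
    bad_start, bad_end = first_bad * BLOCK, first_good * BLOCK
    # buffering=0: no Python-level read-ahead past what we ask for
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        # 1. Copy everything before the damaged region.
        remaining = bad_start
        while remaining > 0:
            data = src.read(min(chunk, remaining))
            if not data:        # source is shorter than expected
                return
            dst.write(data)
            remaining -= len(data)
        # 2. Never read the damaged blocks; write zeros in their place.
        dst.write(bytes(bad_end - bad_start))
        # 3. Skip past the damage and copy the remainder.
        src.seek(bad_end)
        while True:
            data = src.read(chunk)
            if not data:
                break
            dst.write(data)


# Hypothetical usage, with the block numbers from the post:
# salvage("/dev/sdbN", "/mnt/new/partition.img", 33214, 33336)
```

Because the zeros occupy exactly the space of the skipped blocks, every 
byte after the damage keeps its original offset in the output file.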

Now naturally this file will not loop-mount because of the LVM; it is not 
a valid ext3 image.  I googled around some more and went on the LVM IRC 
channel and explained my problem.  No help, in fact no response.  There 
don't seem to be any tools or readily findable advice for recovering 
from this situation.

I created a new 10MB file with dd and used mkfs.ext3 on it, and examined 
the first part of it using hexdump.  With the help of Google I found 
that the ext3 magic is present at offset +0x438, and I noticed that the 
first 1Kbyte of the image is zeroed.  I then used hexdump and grep to 
search the copied LVM partition file for the same pattern, and found it 
with the magic at offset 0x30438.
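
The hexdump-and-grep search can be approximated in a few lines of 
Python.  This is a sketch under assumptions, not the method used above: 
it assumes the filesystem starts on a 512-byte boundary within the 
first few megabytes of the image, that its first 1Kbyte is zeroed as in 
the freshly made test image, and that the magic is the two bytes 53 EF 
(0xEF53 stored little-endian).  The name find_ext_candidates and its 
parameters are made up for illustration.

```python
ZERO_PREFIX = 1024          # the first 1Kbyte was observed to be zeroed
MAGIC_OFFSET = 0x438        # magic position relative to filesystem start
MAGIC = b"\x53\xef"         # 0xEF53 as it appears on disk


def find_ext_candidates(path, limit=16 << 20, step=512):
    """Return offsets in the first `limit` bytes where ext2/ext3 may start."""
    with open(path, "rb") as f:
        head = f.read(limit)
    hits = []
    for start in range(0, len(head) - MAGIC_OFFSET - 1, step):
        magic_at = start + MAGIC_OFFSET
        if (head[magic_at:magic_at + 2] == MAGIC
                and not any(head[start:start + ZERO_PREFIX])):
            hits.append(start)
    return hits
```

A two-byte magic can match by accident, so any offset this returns is 
only a candidate; in the case described here the expected hit would be 
0x30000.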

Since 0x30438 - 0x438 = 0x30000, the ext3 filesystem had to start 
0x30000 bytes into the image.  I decided to remove the first 0x30000 
bytes of my copied partition image, which took a while because the 
partition was 60GB; in fact the whole process was agonizingly slow.
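
Removing the leading 0x30000 bytes amounts to copying the image from 
that offset onward into a new file.  A minimal sketch, with a 
hypothetical function name and file paths:

```python
import shutil


def strip_prefix(src_path, dst_path, offset=0x30000, chunk=1 << 20):
    """Write src to dst, leaving out the first `offset` bytes."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(offset)                     # skip the bytes in front of the filesystem
        shutil.copyfileobj(src, dst, chunk)  # copy the rest in 1MB chunks


# Hypothetical usage:
# strip_prefix("partition.img", "ext3.img")
```

An untested alternative that might avoid the second 60GB copy is the 
offset= option of loop mounts (here offset=196608, which is 0x30000 in 
decimal); whether it would have worked on this image is an assumption.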

After this, I was able to mount the resulting file with -text3 -oloop 
successfully and I recovered my data.  The zeroed/damaged region trashed 
a small part of two directories whose contents were noncritical.  This 
story is offered in the hope that future Googlers will have better luck 
than I did.

I wouldn't say that LVM is evil from this, but I would suggest that you 
simply turn it off for partitioning actions where you know there will be 
no expansion, because the only thing it will ever do for you in that 
case is to stress you out when you least need it.

-Andy

More information about the fedora-list mailing list