[linux-lvm] recovering data from broken volume group

Tue May 8 12:17:36 UTC 2007

hi list

I am running a debian system with 2 raid arrays (+swap). md0 holds the
/ filesystem (about 500 MB). md1 (about 200 GB) contains an LVM2 volume group
with 5 volumes, where /home, /var, /usr, /tmp, and /home/vpopmail reside.
all filesystems are of type ext3.

this setup ran just fine for 2 years, until I upgraded from debian 3.1 to
4.0. the upgrade went smoothly, but when I rebooted I got:

[/sbin/fsck.ext3 (1) -- /var ] fsck.ext3 -a -C0 /dev/mapper/volg1-b
fsck.ext3: no such file or directory while trying to open
/dev/mapper/volg1-b
/dev/mapper/volg1-b:
The superblock could not be read or does not describe a correct ext2
filesystem. if the device is valid and it really contains an ext2 filesystem
(and not swap or ufs or something else), then the superblock is corrupt, and
you might try running e2fsck with an alternate superblock:
    e2fsck -b 8192 <device>

the same for all other lvs.
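
for reference, the block number in that hint is generic: where the backup
superblocks actually sit depends on the block size of the filesystem. a
rough sketch of where they would be, assuming mke2fs defaults (8 * block_size
blocks per group, sparse_super enabled so backups live in group 1 and in
groups that are powers of 3, 5 and 7). the 5 GiB volume size and the block
sizes used below are illustrative assumptions, not values read from my disks:

```python
def backup_superblocks(block_size, total_blocks):
    """Block numbers of backup superblocks, assuming mke2fs defaults."""
    # by default one block bitmap (8 bits per byte) covers exactly one group
    blocks_per_group = 8 * block_size
    # with 1 KiB blocks the first data block is block 1, otherwise block 0
    first_data_block = 1 if block_size == 1024 else 0
    # ceiling division to count the block groups
    n_groups = -(-(total_blocks - first_data_block) // blocks_per_group)

    groups = set()
    for base in (3, 5, 7):
        g = 1  # base**0, so group 1 is always included when it exists
        while g < n_groups:
            groups.add(g)
            g *= base
    return [g * blocks_per_group + first_data_block for g in sorted(groups)]


# hypothetical 5 GiB volume with 4 KiB blocks
print(backup_superblocks(4096, 5 * 2**30 // 4096))
# same size with 1 KiB blocks
print(backup_superblocks(1024, 5 * 2**30 // 1024)[:3])
```

any of these numbers could be passed to e2fsck -b, together with the matching
block size, if the primary superblock really were the problem.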

the first problem was that md1 did not get started, so I started it manually
and continued the boot process. then I got:

[/sbin/fsck.ext3 (1) -- /var] fsck.ext3 -a -C0 /dev/mapper/volg1-b
/var: recovering journal
/var contains a file system with errors, check forced.
/var:
Inode 184326 has illegal block(s)
/var: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY (i.e. without the -a or -o
options)

(again, for all lvs)

fsck died with exit status 4.
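
the exit status is a bitmask. a small sketch of how it decodes, assuming the
bit values listed in the fsck(8) man page (the function name is just made up
for illustration):

```python
# bit values as documented in the fsck(8) man page
FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Return the conditions encoded in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_EXIT_BITS.items() if status & bit]


print(decode_fsck_status(4))
```

so status 4 means fsck found errors and left them uncorrected, which matches
the "RUN fsck MANUALLY" message above.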

here I got dropped into a maintenance shell again, but forced the system to
continue to boot (probably not the wisest choice, in retrospect).

EXT3-fs warning: mounting fs with errors. running e2fsck is recommended
EXT3 FS on dm-4, internal journal
EXT3-FS: mounted filesystem with ordered data mode.

...  (same for dm-3, dm-2, dm-1, dm-0)

EXT3-fs error (device dm-1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device dm-1) in ext3_orphan_write: Journal has aborted
EXT3-fs error (device dm-1) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device dm-1) in ext3_truncate_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only

and finally I get tons of these:

dm-0: rw=9, want=6447188432, limit=10485760
attempt to access beyond end of device
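
to put those numbers in perspective, here is the arithmetic as a short
sketch. it assumes want and limit are counted in 512-byte sectors, which is
how the kernel block layer reports them as far as I know:

```python
SECTOR_SIZE = 512  # bytes; assumed unit of want= and limit=

want = 6447188432   # sector the request reached, from the log line
limit = 10485760    # size of dm-0 in sectors, from the log line


def sectors_to_gib(sectors):
    """Convert a count of 512-byte sectors to GiB."""
    return sectors * SECTOR_SIZE / 2**30


print(f"device size : {sectors_to_gib(limit):.1f} GiB")
print(f"requested at: {sectors_to_gib(want):.1f} GiB")
print(f"requested / size: {want / limit:.0f}")
```

the device is 5 GiB, while the request lands at roughly 3 TiB, far larger
than the whole 200 GB array. so the filesystem is following block pointers
that are garbage, not just slightly out of range.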

I can boot to a root shell specifying init=/bin/sh w/o any file system
related errors, and from here I can also start the volume group without
getting any errors.

when I mount one of the lvs read-only and look at the data, it seems as if
there is an 'offset' to the whole volume group: files show contents that
belong in other files, and directory listings suddenly show subdirectories
that really live in other directories. or I get 'attempt to access beyond
end of device' errors.
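
to illustrate what I mean by 'offset': for a linear lv, lvm maps the start of
the lv to pe_start + first_extent * extent_size on the physical volume. if
the start of the data area were read differently than when the filesystems
were created, every lv would shift by the same amount. the sketch below only
shows that arithmetic; the lv names, extent numbers and both pe_start values
are made up, and this is a guess at a mechanism, not a diagnosis:

```python
EXTENT_SIZE = 8192  # sectors per extent (4 MiB, the LVM2 default)

# hypothetical lv name -> first physical extent of its single linear segment
FIRST_PE = {"a": 0, "b": 1280, "c": 2560}


def lv_start_sector(pe_start, first_pe, extent_size=EXTENT_SIZE):
    """Sector on the pv where a linear lv segment begins."""
    return pe_start + first_pe * extent_size


def shifts(pe_start_expected, pe_start_actual):
    """Per-lv difference, in sectors, between the two layouts."""
    return {
        name: lv_start_sector(pe_start_actual, pe)
        - lv_start_sector(pe_start_expected, pe)
        for name, pe in FIRST_PE.items()
    }


# made-up values for where the data area begins, in sectors
print(shifts(384, 2048))
```

every lv comes out shifted by the same number of sectors, which would look
like what I am seeing: each filesystem starts in the wrong place, so its
block pointers land in unrelated data.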

because all 5 filesystems started failing simultaneously, this must be an
error in either the underlying LVM vg or the raid volume. since md0 works
just fine and lvm seems to find its metadata on md1, I don't think it's the
raid. so, it must be lvm...

I do think (and hopefully, this is not just wishful thinking) that most of
the data should still be *physically* intact on the disk. now, before I rip
out the disks and try to rescue the data from another system with e2salvage
or something like that, is there a way I could fix the broken LVM volume
group?

many thanks in advance,
- Dave.