[linux-lvm] Failed PV recovery
Lamont R. Peterson
peregrine at openbrainstem.net
Mon Jul 24 02:19:47 UTC 2006
Here's the setup: home file server has 3 drives, 4.3GB, 45GB, 120GB; all IDE.
The 4.3GB drive has a /boot/ partition and a small swap with the rest
allocated to an LVM partition which is the only member of the "system" VG.
The other two drives are single LVM partitions and comprise the "data" VG.
That's how it was configured for over a year.
A few months ago, I started seeing some unreadable sectors on the 45GB drive.
I purchased a 320GB SATA drive and a PCI controller (no SATA on this
motherboard) to replace the two drives (I'll get more SATA disks and convert
to LVM on RAID as I can afford them). Long story short, motherboard needed
BIOS flash and a little coaxing to recognize the PCI SATA controller, but
that's sorted out now.
I partitioned the 320GB drive with 1 LVM PV and added it to the data VG. I
ran "pvmove /dev/hde1 /dev/sda1" (120GB -> 320GB), which took about 75
minutes (the 120GB drive was almost completely full), with no issues.
At that point, I *should* have run "vgreduce data /dev/hde1" so that I
wouldn't have the 120GB drive in the VG anymore, but I didn't. 20/20 hindsight.
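For reference, the order of operations described above (move the extents, then drop the emptied PV from the VG) can be sketched as a small dry-run script. This is a minimal sketch, not a tested recovery procedure: the `run` wrapper, the `DRY_RUN` variable, and the `retire_pv` function are hypothetical names introduced for illustration, and the final `pvremove` step is an assumption about what one would normally do next rather than something described in this post. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to run them as root.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# retire_pv VG OLD_PV NEW_PV (hypothetical helper):
# migrate extents off OLD_PV, then take it out of the VG before touching anything else.
retire_pv() {
    vg=$1; old=$2; new=$3
    run pvmove "$old" "$new" || return 1   # move all allocated extents off the old PV
    run vgreduce "$vg" "$old" || return 1  # remove the now-empty PV from the VG
    run pvremove "$old"                    # assumption: also wipe the LVM label
}

retire_pv data /dev/hde1 /dev/sda1
```

Doing the `vgreduce` immediately after each `pvmove` keeps the emptied drive out of the VG, so a later failure on another PV cannot block its removal.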
Next I ran "pvmove /dev/hdg1 /dev/sda1" (45GB -> 320GB). About 45% of the way
through, it crashed:
/dev/hdg1: Moved: 45.0%
/dev/hdg1: read failed after 0 of 1024 at 4096: Input/output error
/dev/hdg1: read failed after 0 of 2048 at 0: Input/output error
Failed to read existing physical volume '/dev/hdg1'
Physical volume /dev/hdg1 not found
ABORTING: Can't reread PV /dev/hdg1
ABORTING: Can't reread VG for /dev/hdg1
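After an abort like the one above, the VG may still contain a temporary pvmove mapping. The following is a minimal sketch of how that state might be inspected and then either resumed or cancelled. It assumes the LVM2 reporting tools `pvs` and `lvs` are installed; `run`, `DRY_RUN`, `inspect_move`, and `finish_move` are hypothetical names for illustration. Whether resuming or aborting is the right call when the source PV has vanished depends on the situation, so the script only prints the commands by default.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to run them as root.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# inspect_move VG (hypothetical helper):
# list the PVs LVM can currently see, then show which devices back each LV,
# including hidden LVs such as a temporary pvmove volume.
inspect_move() {
    run pvs -o pv_name,vg_name,pv_size,pv_free
    run lvs -a -o +devices "$1"
}

# finish_move ACTION (hypothetical helper):
# "resume" continues any interrupted moves, "abort" cancels them.
finish_move() {
    case "$1" in
        resume) run pvmove ;;
        abort)  run pvmove --abort ;;
        *)      echo "usage: finish_move resume|abort" >&2; return 2 ;;
    esac
}

inspect_move data
finish_move abort
```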
The system was still running, but the /dev/hdg disk no longer showed up. In
the past, I could power down for an hour or so (let the drive cool down) and
then it would show up again. It looked like the mounted LVs which are on
data were fine (I could read & write), so I powered off. Rebooting, I get
kernel panics. I can bring the box up in "emergency" mode or with a rescue
Prior to this, only one LV was unusable. I was able to read every bit of the
rest of them just fine (I have backups of everything important). The one bad
LV (due to unreadable sectors on the 45GB drive) was for /var/spool/up2date
when I was running RHEL3, which I have obviously replaced since RHEL3
wouldn't support SATA (I have SUSE Linux 10.1 on there now).
If I had already removed the 120GB drive from the VG, I would try dd_rescue
and copy the entire 45GB drive over to the 120GB one. I can't get vgreduce
to run correctly and pull it out of the VG. When I run pvscan, I get:
NOTE: I just booted up the box to get the output, and the 45GB disk was
working. It hasn't been for about a week now. I have successfully removed
the 120GB drive from the data VG. Man, I gotta love having a little bit of
luck! Wow. :D
I could just blow it all away and recreate the data VG from scratch, reloading
from backups (and pulling down things like .iso images, etc.). I would like
to figure out some techniques to try to recover this from here. As I make my
living teaching over 1,000 people/year (newbies and experts alike) to use
Linux, I'd like to be able to use this experience to teach others how to
recover if they find themselves up the "Creek Who Should Not Be Named".
1. How can I take an unused PV out of a VG with another PV that's broken?
2. Once I have a copy of the entire bad drive's contents, how do I alter the
VG (hand edit?) so that it is using the copy instead of the original?
3. What am I not asking/seeing?
4. Are there better ways I could have handled this (other than the obvious
like RAID to start with, etc.)?
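On question 2, one possible approach can be sketched as follows. LVM2 identifies a PV by the UUID stored in its on-disk label rather than by device name, so a full copy of the failing disk should be picked up as the same PV without hand-editing the VG metadata, provided the original is disconnected first so that only one device carries that UUID. This is a minimal sketch under those assumptions: `run`, `DRY_RUN`, `clone_pv`, and `reactivate` are hypothetical names, `dd_rescue` is called with its plain "infile outfile" form, and the script only prints the commands by default.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to run them as root.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# clone_pv SRC DST (hypothetical helper):
# copy the failing disk onto a healthy one, continuing past unreadable sectors.
clone_pv() {
    run dd_rescue "$1" "$2"
}

# reactivate VG (hypothetical helper):
# run only after the failing original has been disconnected, so LVM sees
# a single device with the cloned PV UUID.
reactivate() {
    run pvscan               # rescan block devices for PV labels
    run vgchange -ay "$1"    # activate the LVs in the VG
}

clone_pv /dev/hdg /dev/hde
reactivate data
```

The destination here is the 120GB drive, which is why it has to be out of the data VG before the copy overwrites it.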
Lamont R. Peterson <peregrine at OpenBrainstem.net>
Founder [ http://blog.OpenBrainstem.net/peregrine/ ]
GPG Key fingerprint: 0E35 93C5 4249 49F0 EC7B 4DDD BE46 4732 6460 CCB5
OpenBrainstem: Intelligent Open Source Software Engineering
[ http://www.OpenBrainstem.net/ ]