[linux-lvm] Restore LVM after drive error
james.schatzman at futurelabusa.com
Sat Jul 7 19:06:14 UTC 2007
In my opinion, the docs for pvmove should include the warning "do not use pvmove to copy data from a bad disk". Once you start it, it seems that you are hosed. The original pvmove fails, any attempt to resume pvmove also fails, and pvmove --abort also fails. vgreduce also fails in this situation.
At this point, I recommend that you use one of two possible methods of partial recovery:
1) By hook or by crook, get the filesystems on the VG to mount. You may have to use dmsetup and the "-P" (partial) option to let LVM assemble the volume group while returning errors for the bad drive.
Unfortunately, it does seem that pvmove scrambles things so badly that I have not been able to recover from this situation. Using "-P" does not seem to work. Again, in my opinion, folks should be warned: DO NOT ATTEMPT TO USE PVMOVE TO RECOVER FROM A BAD DISK - YOU WILL NOT LIKE THE RESULT.
Once mounted, copy all the files you can off the old file system. Copy files in small groups. When you get errors, stop immediately and shift to another directory. You may have to power off while in operation if the drive seems to be looping and your system log is filling up with error messages.
Since you are getting the "couldn't read all logical volumes" message, it appears that Linux is unable to bring up one or more of your disk drives. Check the system log and find out why. If it is a "software reset failed", you may want to try powering down the whole thing (computer and drives) many times; maybe, eventually, the drives will come up. Once the drives are all up, the volume group should work and you should be able to mount the FS. Also, try turning the bad drive(s) over or standing them on end. Sometimes that works. The "-P" option might also work with "vgchange -Pay" and/or "lvchange -Pay" to let you use the VGs and LVs despite a bad drive.
If desperate, try replacing the PC board on the bad drives. See http://myharddrivedied.com/presentations.html for detailed instructions. If you have a spare, identical drive handy, this may help.
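For method 1, a minimal sketch of the partial activation and the copy-in-small-groups approach might look like the following. The VG name "system" comes from the quoted message; the LV name "root", the mount point /mnt/rescue, and all function names are hypothetical. The LVM and mount commands are only printed unless RUN=1 is set. This is untested against a VG with an interrupted pvmove, so treat it as a starting point rather than a known fix.

```shell
#!/bin/sh
# Sketch only: LV name, mount point and function names are assumptions.

# Print a command, or execute it when RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Try to activate the VG despite a missing/bad PV, then mount read-only.
activate_partial() {
    vg=$1; lv=$2; mnt=$3
    run vgchange -P -a y "$vg"
    run mount -o ro "/dev/$vg/$lv" "$mnt"
}

# Copy one top-level directory at a time, file by file. On the first
# failed file, note it in the log and move on to the next directory.
# Files sitting directly in $src (not in a subdirectory) are skipped,
# and file names containing newlines are not handled.
copy_in_groups() {
    src=$1; dst=$2; log=$3
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue
        dir=${dir%/}
        name=$(basename "$dir")
        find "$dir" -type f | while IFS= read -r f; do
            rel=${f#"$src"/}
            mkdir -p "$dst/$(dirname "$rel")"
            if ! cp -p "$f" "$dst/$rel" 2>>"$log"; then
                echo "gave up on $name at $rel" >>"$log"
                break
            fi
        done
    done
}

# Example (hypothetical names):
#   activate_partial system root /mnt/rescue
#   copy_in_groups /mnt/rescue /mnt/good-disk/rescued /tmp/rescue.log
```

Going one directory at a time keeps each attempt short, so if the drive starts looping on errors you lose little work and the log shows where to resume.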
2) I haven't tried this - it is theoretical: Acquire a working Linux system. Put your bad LVM drives in this system one at a time along with an identical new drive. Use dd on the raw device to copy from the bad drive to the new drive. Tell dd to ignore errors. This should mirror the old bad drives onto new good ones, to the extent that the old data could be read. I am not sure how to get LVM to recognize the new drive(s), but I am told that you can put the complete collection of old/good and new drives on one system and get LVM to reassemble them. Preferably copy the good files somewhere else. If you are brave, run fsck.
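The dd step in method 2 could be sketched as below. The function name and the device names in the usage comment are hypothetical. The conv=noerror,sync options tell dd to keep going after a read error and to pad the failed block with zeros, so that everything after it stays at the same offset on the new drive. Since a raw copy also carries over the LVM label and PV UUID, the clone should look to LVM like the original physical volume; it is probably wise not to have the bad original and its clone attached at the same time.

```shell
#!/bin/sh
# Sketch only: clone one disk onto another, ignoring read errors.
# $1 = source (bad) device, $2 = destination (new) device.
clone_disk() {
    src=$1; dst=$2
    # noerror: continue after read errors
    # sync:    pad short or failed reads with zeros up to the block size
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync
}

# Example (hypothetical device names; double-check them, dd will
# overwrite the destination without asking):
#   clone_disk /dev/sdX /dev/sdY
```

A small block size such as 4096 means each unreadable spot costs only that much data, at the price of a slower copy.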
Some general recommendations based on my experiences:
1) Never put anything you care about in a multi-disk JBOD volume group. One bad drive destroys the entire VG. Unreliable. Especially don't do this for your system partition. This is unfortunate, and seriously limits the usefulness of LVM, in my opinion. If LVM had better error recovery, then I would have a better opinion of it.
2) I have had a number of drives, from cheap SATA to very expensive SCSI, go belly up within one week. I suggest that you burn in your new drives for a week or two prior to relying on them.
3) Use RAID 5 or 6 if you can't stand to lose everything.
Again, it is unfortunate, but pvmove does bad things if either the old drive or the new drive fails. Don't use it unless you are confident that it will succeed. To check a drive beforehand, you could run "e2fsck -c", but it would be better to use badblocks directly.
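As a rough illustration of using badblocks directly, the sketch below wraps a read-only scan and a destructive write-mode test. The function names and device names are hypothetical, and the commands are only printed unless RUN=1 is set. Note that the write-mode test (-w) erases the drive, so it is only suitable for a new, empty disk.

```shell
#!/bin/sh
# Sketch only: function names and device names are assumptions.

# Print a command, or execute it when RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Read-only scan (-s progress, -v verbose): does not modify the disk.
scan_readonly() {
    run badblocks -sv "$1"
}

# Write-mode test (-w): DESTROYS all data on the disk.
# Intended for burning in a new drive before trusting it.
burn_in_new_drive() {
    run badblocks -wsv "$1"
}

# Example (hypothetical device names):
#   scan_readonly /dev/sdX
#   RUN=1 burn_in_new_drive /dev/sdY
```

A single pass is not a week of burn-in, but it does at least touch every block before any data is moved onto the drive.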
At 07:47 AM 7/7/2007, you wrote:
>I received a SMART warning for one of my SCSI disks that it might fail
>some day, so I added
>another disk to move the LVM data from the failing disk to another. But the
>"new" disk failed and
>the system crashed during the pvmove process. Now all lvm commands I
> "Couldn't find volume 'pvmove0' for segment 'start_extend' " and
>"Couldn't read all logical volumes for volume group system".
>Where 'system' is my VG. I tried pvmove --abort and vgreduce
>--removemissing but nothing worked.
>(Now I know that vgreduce --removemissing doesn't solve the pvmove
>problem.) My / directory is within the system VG, so I could not access
>the backup configuration of the LVM! Is there any way to undo the pvmove
>changes and rescue my data?!
>I have searched the net for solutions but found nothing that worked.
>Please help me!