[dm-devel] Re: raid failure and LVM volume group availability

hank peng pengxihan at gmail.com
Mon May 25 16:09:25 UTC 2009


2009/5/21 Tim Connors <tconnors at rather.puzzling.org>:
> I had a raid device (with LVM on top of it) that failed through the disks
> being disconnected in a long power failure that outlasted the UPS (the
> computer, being a laptop, had its own built-in UPS).
>
> While I could just reboot the computer, I don't particularly want to
> reboot it just yet.  Unfortunately, failing a raid device like that means
> that the volume group half disappears in a stream of I/O errors, but you
> can't stop the raid device because it still has something accessing it
> (LVM), but you can't make LVM stop accessing it by making the volume group
> unavailable because it is suffering from I/O errors:
>
>> mdadm -S /dev/md0
> mdadm: fail to stop array /dev/md0: Device or resource busy
> Perhaps a running process, mounted filesystem or active volume group?
>
>> vgchange -an
>  /dev/md0: read failed after 0 of 4096 at 0: Input/output error
>  /dev/dm-5: read failed after 0 of 4096 at 0: Input/output error
>  Can't deactivate volume group "500_lacie" with 2 open logical volume(s)
>  Can't deactivate volume group "laptop_250gb" with 3 open logical volume(s)
>
>> vgchange -an rotating_backup
>  /dev/md0: read failed after 0 of 4096 at 0: Input/output error
>  /dev/dm-5: read failed after 0 of 4096 at 0: Input/output error
>  /dev/md0: read failed after 0 of 4096 at 1000204664832: Input/output error
>  /dev/md0: read failed after 0 of 4096 at 1000204722176: Input/output error
>  /dev/md0: read failed after 0 of 4096 at 0: Input/output error
>  /dev/md0: read failed after 0 of 4096 at 4096: Input/output error
>  /dev/md0: read failed after 0 of 4096 at 0: Input/output error
>  /dev/dm-5: read failed after 0 of 4096 at 644245028864: Input/output error
>  /dev/dm-5: read failed after 0 of 4096 at 644245086208: Input/output error
>  /dev/dm-5: read failed after 0 of 4096 at 0: Input/output error
>  /dev/dm-5: read failed after 0 of 4096 at 4096: Input/output error
>  /dev/dm-5: read failed after 0 of 4096 at 0: Input/output error
>  Volume group "rotating_backup" not found
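Both `mdadm -S` ("Device or resource busy") and `vgchange -an` ("open logical volume(s)") refuse because something still holds the devices open. Below is a minimal, read-only sketch for seeing what sits on top of an md array. It assumes a Linux sysfs that exposes `/sys/block/<dev>/holders/` and, for device-mapper holders, `/sys/block/dm-N/dm/name`. The function name `md_holders` and the `SYSFS` override are hypothetical, added only so the sketch can be exercised against a fake tree:

```shell
#!/bin/sh
# md_holders DEV: print each block device that holds DEV (default md0) open,
# plus its device-mapper name when sysfs exposes one.
# Hypothetical helper; SYSFS defaults to /sys and is overridable for testing.
# The body runs in a subshell so its variables do not leak to the caller.
md_holders() (
    sysfs=${SYSFS:-/sys}
    dev=${1:-md0}
    for h in "$sysfs/block/$dev/holders/"*; do
        # If the glob matched nothing, the unexpanded pattern is left in $h.
        [ -e "$h" ] || continue
        holder=$(basename "$h")
        name_file="$sysfs/block/$holder/dm/name"
        if [ -r "$name_file" ]; then
            printf '%s %s\n' "$holder" "$(cat "$name_file")"
        else
            printf '%s\n' "$holder"
        fi
    done
)
```

On the machine above, `md_holders md0` would be expected to report `dm-5` (the 254:5 node behind `/dev/mapper/rotating_backup-rotating_backup`), but that is an inference from the listings, not captured output. The sketch only identifies the holder; it does not release it.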
>
> The LVM device file still exists:
>
>> ls -lA /dev/rotating_backup /dev/mapper/rotating_backup-rotating_backup
> brw-rw---- 1 root disk 254, 5 May 10 09:22 /dev/mapper/rotating_backup-rotating_backup
>
> /dev/rotating_backup:
> total 0
> lrwxrwxrwx 1 root root 43 May 10 09:22 rotating_backup -> /dev/mapper/rotating_backup-rotating_backup
>
> however lvdisplay, vgdisplay and pvdisplay can't access it:
>> vgdisplay
>  /dev/md0: read failed after 0 of 4096 at 0: Input/output error
>  /dev/dm-5: read failed after 0 of 4096 at 0: Input/output error
>  --- Volume group ---
>  VG Name               500_lacie
> ...
>
> but the raid device files don't exist (the drive I plugged back in later
> was given a new device name, /dev/sda1) and obviously raid is not very
> happy anymore:
>
>> cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 sdc1[0] sdb1[2](F)
>      976762432 blocks [2/1] [U_]
>      bitmap: 147/233 pages [588KB], 2048KB chunk
>> ls -lA /dev/sdc1 /dev/sdb1 /dev/md0
> ls: cannot access /dev/sdc1: No such file or directory
> ls: cannot access /dev/sdb1: No such file or directory
> brw-rw---- 1 root disk 9, 0 May 10 09:22 /dev/md0
>
>
> Does anyone know a way out of this, sans rebooting?
> I doubt I could just add /dev/sda1 back into the array because I'm
> sure LVM would still complain about IO errors even if raid would let me (I
> suspect raid itself will also fail to add the disk back because it is
> still trying to be active but has no live disks so would be completely
> inconsistent).
>
> Is it possible to force both LVM and md to give up on the device so I can
> re-add them without rebooting (since they shouldn't be any more
> corrupt than you'd expect from an unclean shutdown, because there's
> been no IO to them yet, so I should just be able to re-add them, mount and
> resync)?
>
Only one of the disks in this RAID1 failed, so it should continue to work
in a degraded state.
Why did LVM complain with I/O errors?
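The degraded-state reading rests on the `[2/1] [U_]` status line quoted above: 2 member slots configured, 1 active, with `U` marking an up slot and `_` a missing one. Below is a minimal sketch that extracts that count from `/proc/mdstat`-format text; the function name `md_health` is hypothetical. It only reflects md's own bookkeeping. In the listing above, neither `/dev/sdc1` nor `/dev/sdb1` still exists, even though `sdc1` is not marked `(F)`.

```shell
#!/bin/sh
# md_health: read /proc/mdstat-format text on stdin and print one line per
# array: NAME STATE ACTIVE/CONFIGURED (STATE is "degraded" or "ok").
# Hypothetical helper for illustration only.
md_health() {
    awk '
        # An array header line looks like "md0 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1; next }
        # The next line carrying "[configured/active]" belongs to that array
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            state = (n[2] + 0 < n[1] + 0) ? "degraded" : "ok"
            printf "%s %s %d/%d\n", name, state, n[2], n[1]
            name = ""
        }
    '
}
```

Fed the quoted status (`md_health < /proc/mdstat` on that machine), it would print `md0 degraded 1/2`, which is md's view and not proof that any member disk is still reachable.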
> --
> TimC






