[linux-lvm] Re: Found: workaround for crash on snapshot removal, and hopefully a good clue to the underlying bug
James G. Sack (jim)
jsack at tandbergdatacorp.com
Fri Dec 9 02:07:32 UTC 2005
More testing results..
A) The snapshot create/remove cycle with the suspend/resume calls around
the lvremove ran over 1500 passes before I stopped it -- all the while
with continuous i/o to the origin filesystem. Remember this is on a
patched 2.6.14-1.1637_FC4 (patches listed in previous message below).
B) I installed vanilla FC4 build 2.6.14-1.1644_FC4, and tried the same
test, but in this case, the suspend prevents lvremove from running -- I
guess an automatic suspend has been added to 1644 which was missing or
broken in 1637 (maybe?). Anyway, I took out the suspend/resume and the
crash came back! So maybe the patches had something to do with test A
succeeding?
C) I rebooted to vanilla 2.6.14-1.1637_FC4 and am now starting a test
with the suspend/resume calls around the lvremove. So far it looks like
it's passed a few dozen cycles. So maybe the patches are irrelevant.
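
For reference, the cycle used in tests A and C can be sketched roughly as
below. This is only a sketch under assumptions: VG, LVorigin and Snap are
the placeholder names from the quoted message, the RUN variable and the
function names are made up for illustration, and the device-mapper name
"VG-LVorigin" assumes neither name contains a hyphen (device-mapper doubles
hyphens inside VG/LV names). By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the snapshot create/remove cycle with suspend/resume around
# lvremove. All names are placeholders; adjust to the real VG and LVs.
VG=${VG:-VG}
ORIGIN=${ORIGIN:-LVorigin}
SNAP=${SNAP:-Snap}
# Dry run by default: commands are echoed, not executed.
# Run with RUN= (empty) as root to really execute them.
RUN=${RUN-echo}

snap_cycle() {
    # Create the snapshot of the origin volume.
    $RUN lvcreate -s -n "$SNAP" -L 10G "/dev/$VG/$ORIGIN" || return 1
    # Suspend the origin only around the removal step.
    $RUN dmsetup suspend "$VG-$ORIGIN" || return 1
    $RUN lvremove -f "/dev/$VG/$SNAP"
    rc=$?
    # Always resume, even if lvremove failed, so the origin is not
    # left suspended.
    $RUN dmsetup resume "$VG-$ORIGIN"
    return $rc
}

run_cycles() {
    # Repeat the cycle $1 times, stopping at the first failure.
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        snap_cycle || { echo "cycle $i failed" >&2; return 1; }
        i=$((i + 1))
    done
}

# One dry-run pass unless PASSES is set.
run_cycles "${PASSES:-1}"
```

Note that on 1644 (test B) the explicit suspend kept lvremove from running
at all, so this wrapper only matches what was run in A and C.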
Can anybody make any sense of this?
I'm logging 'level = 6' to lvm2.log -- would anybody be able to suggest
what to look for in there? Hmmm, maybe tomorrow, I should create a
simple log with a single failure to see if there are any locking
asymmetries or something like that.
Another context reminder: I'm running
lvm version
LVM version: 2.02.01-cvs (2005-11-10)
Library version: 1.02.01-cvs (2005-11-10)
Driver version: 4.4.0
Will let the test run overnight, and report tomorrow.
Regards,
..jim
On Thu, 2005-12-08 at 17:41 -0800, James G. Sack (jim) wrote:
> Hooray!
>
> I think I've found a definitive clue to a crash during lvremove of a
> snapshot. I have a reliably repeatable failure test and a workaround
> that seems to be passing.
>
> Here's the regression test:
> --------------------------
>
> 1. arrange to have some continuous i/o on an lvm volume
> I do it with a simple shell loop that copies a 1GB file to another name
> and then back (essentially: 'while :;do cp abcd wxyz;cp wxyz abcd;done')
>
> 2. while that's running, start a snapshot create/remove loop
> Such as 'while :;do lvcreate -snSnap -L10G LVorigin;
> lvremove -f /dev/VG/Snap;done'
>
> My experience is that a system crash always occurs upon executing the
> lvremove call. The first one!
>
> (On my most recent experiments, the system is locking hard,
> although earlier I was able to see a kcopyd oops and the
> keyboard scrollback worked.)
>
>
> Here's the workaround
> ---------------------
>
> In the snap-cycle test surround the lvremove command with suspend/resume
> dmsetup suspend VG-LVorigin
> lvremove -f /dev/VG/Snap
> dmsetup resume VG-LVorigin
>
> I am currently testing this workaround on a patched 2.6.14-1.1637_FC4
> kernel
> (using 4 patches suggested by agk on Tue, 15 Nov 2005 22:33:58 +0000)
>
> <excerpt from that prior message>
> ---------------------------------
> > > The kcopyd.c BUG at line 145 is triggered by the first lvremove
> > > following start of the i/o (copy loop).
>
> Try some kernel patches.
>
> http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/
>
> in particular these four:
>
> dm-snapshot-bio_list-fix.patch
> dm-snapshot-metadata-reading-separation.patch
> dm-snapshot-load-metadata-on-creation.patch
> dm-ioctl-reduce-pf-memalloc-usage.patch
> </excerpt>
>
>
> ==> BUT I suspect the lvremove problem is independent of those patches,
> as I was getting the same symptom before putting in the suspend/resume.
>
>
> I thought I had tried suspend/resume previously and found that they were
> unnecessary because the create automatically performed a suspend/resume
> -- so my current workaround is the result of a desperation-experiment of
> applying the suspend/resume wrapper ONLY to the lvremove step.
>
> ==> SO MAYBE this current success points to a bug in the lvremove code,
> eh?
>
>
> I plan on repeating my test on a vanilla kernel. In the meantime, I hope
> someone can look at the lvremove code (agk?..).
>
> Regards,
> ..jim
>
>