[linux-lvm] Regression with FALLOC_FL_PUNCH_HOLE in 3.5-rc kernel
snitzer at redhat.com
Mon Jul 2 13:41:04 UTC 2012
On Mon, Jul 02 2012 at 6:35am -0400,
Lukáš Czerner <lczerner at redhat.com> wrote:
> > So you're testing a rather old kernel, so you might be missing some
> > fixes there. Could you rerun the test with a recent kernel?
> > Also it appears that the bug here happens because dm requested a
> > destination page which is within the kernel space. It seems that
> > this has been initiated by the write request from the mirror target.
> > So I do not immediately see how punch hole (discard) is involved at
> > all. You may simply have hit a different bug.
> > Looking at the git log, this commit caught my attention and seems
> > related to this crash:
> > 0c535e0d6f463365c29623350dbd91642363c39b dm io: fix discard support
> > Please retest with a recent kernel.
Ah, you beat me to recommending that fix ;)
> So from the original backtrace for the problem Zdenek is seeing on 3.5.0-rc4
> (https://lkml.org/lkml/2012/6/30/98) I think that this is a
> problem in device mapper itself. I do not think it has anything
> to do with tmpfs or mm. Zdenek's bisection clearly shows that the
> problem appears when discard support is added to the loop
> device, so it is most likely related to dm's discard support.
What about using scsi_debug with the dm-mirror target?
Never say never: the dm-mirror and/or dm-io code could still have an issue,
but the commit referenced above did fix discard with the mirror target
back in 3.3.
> Anyway, the backtrace points to a NULL pointer dereference in
> dm_rh_region_context(), which is a simple function:
> void *dm_rh_region_context(struct dm_region *reg)
> {
> 	return reg->rh->context;
> }
> so either reg or reg->rh is NULL. The only place this is used is
> recovery_complete() in dm-raid1.c, so this is somewhat related
> to raid recovery. I am not familiar with the dm code; could
> someone from the dm team look at this?
I'll coordinate with Zdenek.
> But just to rule out the punch-hole path, Zdenek, can you
> run your tests on a "real" discard-capable device? Or at least on
> a device that does not convert discard requests into punched holes?
> You can use scsi_debug to create such a device:
> modprobe scsi_debug dev_size_mb=16 sector_size=512 num_tgts=1 lbpu=1
Great minds think alike ;)