[dm-devel] dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP

Fri Feb 27 18:42:09 UTC 2015

On Thu, Feb 26, 2015 at 02:56:20PM -0500, Mikulas Patocka wrote:
> 
> 
> On Fri, 13 Feb 2015, Mike Snitzer wrote:
> 
> > On Fri, Feb 13 2015 at  4:24am -0500,
> > Darrick J. Wong <darrick.wong at oracle.com> wrote:
> > 
> > > I created a dm-raid1 device backed by a device that supports DISCARD
> > > and another device that does NOT support DISCARD with the following
> > > dm configuration:
> > > 
> > > # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
> > > # lsblk -D
> > > NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> > > sda                 0        4K       1G         0
> > > `-moo (dm-0)        0        4K       1G         0
> > > sdb                 0        0B       0B         0
> > > `-moo (dm-0)        0        4K       1G         0
> > > 
> > > Notice that the mirror device /dev/mapper/moo advertises DISCARD
> > > support even though one of the mirror halves doesn't.
> > > 
> > > If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
> > > BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
> > > loop in do_region() when it tries to issue a DISCARD request to sdb.
> > > The problem is that when we call do_region() against sdb, num_sectors
> > > is set to zero because q->limits.max_discard_sectors is zero.
> > > Therefore, "remaining" never decreases and the loop never terminates.
> > > 
> > > Before entering the loop, check for the combination of REQ_DISCARD and
> > > no discard and return -EOPNOTSUPP to avoid hanging up the mirror
> > > device.  Fix the same problem with WRITE SAME while we're at it.
> > > 
> > > This bug was found by the unfortunate coincidence of pvmove and a
> > > discard operation in the RHEL 6.5 kernel; 3.19 is also affected.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong at oracle.com>
> > > Cc: "Martin K. Petersen" <martin.petersen at oracle.com>
> > > Cc: Srinivas Eeda <srinivas.eeda at oracle.com>
> > 
> > Your patch looks fine but it is laser focused on dm-io.  Again, that is
> > fine (fixes a real problem).  But I'm wondering how other targets will
> > respond in the face of partial discard support across the logical
> > address space of the DM device.
> > 
> > When I implemented dm_table_supports_discards() I consciously allowed a
> > DM table to contain a mix of discard support.  I'm now wondering where
> > it is we benefit from that?  Seems like more of a liability than
> > anything -- so a bigger hammer approach to fixing this would be to
> > require all targets and all devices in a DM table support discard.
> > Which amounts to changing dm_table_supports_discards() to be like
> > dm_table_supports_write_same().
> > 
> > BTW, given dm_table_supports_write_same(), your patch shouldn't need to
> > worry about WRITE SAME.  Did you experience issues with WRITE SAME too
> > or were you just being proactive?
> > 
> > Mike
> 
> I think that Darrick's patch is needed even for WRITE SAME.
> 
> Note that queue limits and flags can't reliably prevent bios from
> coming in.
> 
> For example:
> 
> 1. Some piece of code tests the queue limits, sees that
> max_write_same_sectors is non-zero, constructs a WRITE_SAME bio and sends
> it with submit_bio.
> 
> 2. Meanwhile, the device is reconfigured so that it doesn't support 
> WRITE_SAME. q->limits.max_write_same_sectors is set to zero.
> 
> 3. The bio submitted at step 1 can't be recalled, so it arrives at the
> device mapper even though the device now advertises that it doesn't
> support WRITE SAME, and it causes the lockup that Darrick observed.

I'd pondered patching the WRITE SAME case too.
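
For illustration, the up-front check for both request types amounts to
roughly the following.  This is a simplified userspace model, not the dm-io
code: struct model_limits, enum model_op and model_do_region() are made-up
names, and only the two limit field names mirror the real queue limits.

```c
#include <errno.h>

/* Hypothetical stand-in for the two queue limits discussed in this thread. */
struct model_limits {
	unsigned int max_discard_sectors;
	unsigned int max_write_same_sectors;
};

enum model_op { MODEL_WRITE, MODEL_DISCARD, MODEL_WRITE_SAME };

/*
 * Model of the per-region submit loop.  Returns the number of chunks
 * "issued", or -EOPNOTSUPP if the device cannot do the operation at all.
 */
static int model_do_region(enum model_op op, const struct model_limits *lim,
			   unsigned int remaining)
{
	unsigned int max_chunk, n;
	int issued = 0;

	if (op == MODEL_DISCARD)
		max_chunk = lim->max_discard_sectors;
	else if (op == MODEL_WRITE_SAME)
		max_chunk = lim->max_write_same_sectors;
	else
		max_chunk = 8;	/* arbitrary chunk size for ordinary writes */

	/* Without this check a zero limit means "remaining" never shrinks. */
	if (max_chunk == 0)
		return -EOPNOTSUPP;

	do {
		n = remaining < max_chunk ? remaining : max_chunk;
		remaining -= n;
		issued++;
	} while (remaining);

	return issued;
}
```

With the limits from the lsblk output above (sdb reports a zero discard
limit), a discard against sdb fails fast instead of spinning.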

> Another problem is that queue flags are not propagated up when you reload 
> a single device - someone could reload a mirror leg with a different dm 
> table that doesn't support write_same, and even after the reload, the 
> mirror keeps advertising that it does support WRITE_SAME.

Not sure how to deal with that -- I suppose we could save the 'last recorded q
limits' and watch for changes, but I suppose that depends on how resilient
callers are against DISCARD and WRITE SAME returning -EOPNOTSUPP.  So far as I
can tell the in-kernel caller handles it gracefully enough, and I'd hope that
any sane userland program would know to follow up with a regular write (if
applicable), but who knows...
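
A minimal sketch of the caller-side fallback I have in mind, assuming the
caller can substitute a regular write.  The function-pointer type and every
name here are hypothetical; this models the pattern and is not taken from
any existing caller.

```c
#include <errno.h>

/* Hypothetical I/O hook: returns 0 on success or a negative errno. */
typedef int (*model_io_fn)(unsigned long long start, unsigned long long len);

/* Stub that behaves like a device rejecting the offloaded request. */
static int model_offload_unsupported(unsigned long long start,
				     unsigned long long len)
{
	(void)start;
	(void)len;
	return -EOPNOTSUPP;
}

/* Stub that behaves like a successful request. */
static int model_io_ok(unsigned long long start, unsigned long long len)
{
	(void)start;
	(void)len;
	return 0;
}

/*
 * Try the offloaded operation first; if the device says it is not
 * supported, retry with a regular write.  Any other error is passed
 * back to the caller unchanged.
 */
static int model_zero_range(model_io_fn offload, model_io_fn plain_write,
			    unsigned long long start, unsigned long long len)
{
	int ret = offload(start, len);

	if (ret == -EOPNOTSUPP)
		ret = plain_write(start, len);
	return ret;
}
```

A DISCARD caller has no equivalent fallback, but it can usually treat
-EOPNOTSUPP as "nothing was discarded" and carry on.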

> That leads to another point: if the limits change while the do-while loop
> is in progress, even Darrick's original patch is wrong and fails to
> prevent the lockup. So we need to read the limits in advance, test them,
> and never re-read them.

Agreed.  I /think/ issuing discard/write same to a device that no longer
supports it will return an error... or at least in my crummy 5 minute test it
seemed to work.
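
The "read once, test, never re-read" idea can be sketched like this.  Again
a userspace model with hypothetical names (struct model_queue, the
after_chunk hook); the hook only simulates a table reload happening in the
middle of the loop.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue with one mutable limit. */
struct model_queue {
	unsigned int max_write_same_sectors;
};

/* Simulates a reload that drops WRITE SAME support mid-flight. */
static void model_reload_without_write_same(struct model_queue *q)
{
	q->max_write_same_sectors = 0;
}

/*
 * Returns the number of chunks issued, or -EOPNOTSUPP.  The limit is
 * copied into a local before the loop, so a concurrent change to the
 * queue cannot turn the chunk size into zero halfway through.
 */
static int model_issue_write_same(struct model_queue *q,
				  unsigned int remaining,
				  void (*after_chunk)(struct model_queue *))
{
	const unsigned int max_chunk = q->max_write_same_sectors;
	int issued = 0;

	if (max_chunk == 0)
		return -EOPNOTSUPP;

	while (remaining) {
		unsigned int n = remaining < max_chunk ? remaining : max_chunk;

		remaining -= n;
		issued++;
		if (after_chunk)
			after_chunk(q);	/* may zero the limit, as a reload would */
	}
	return issued;
}
```

Had the loop re-read q->max_write_same_sectors on each pass, the simulated
reload would leave it stuck with a zero chunk size, which is exactly the
lockup described above.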

--D

> 
> 
> Mikulas
> 