[dm-devel] dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP
Mikulas Patocka
mpatocka at redhat.com
Thu Feb 26 19:56:20 UTC 2015
On Fri, 13 Feb 2015, Mike Snitzer wrote:
> On Fri, Feb 13 2015 at 4:24am -0500,
> Darrick J. Wong <darrick.wong at oracle.com> wrote:
>
> > I created a dm-raid1 device backed by a device that supports DISCARD
> > and another device that does NOT support DISCARD with the following
> > dm configuration:
> >
> > # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
> > # lsblk -D
> > NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> > sda 0 4K 1G 0
> > `-moo (dm-0) 0 4K 1G 0
> > sdb 0 0B 0B 0
> > `-moo (dm-0) 0 4K 1G 0
> >
> > Notice that the mirror device /dev/mapper/moo advertises DISCARD
> > support even though one of the mirror halves doesn't.
> >
> > If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
> > BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
> > loop in do_region() when it tries to issue a DISCARD request to sdb.
> > The problem is that when we call do_region() against sdb, num_sectors
> > is set to zero because q->limits.max_discard_sectors is zero.
> > Therefore, "remaining" never decreases and the loop never terminates.
> >
> > Before entering the loop, check for the combination of REQ_DISCARD and
> > no discard and return -EOPNOTSUPP to avoid hanging up the mirror
> > device. Fix the same problem with WRITE SAME while we're at it.
> >
> > This bug was found by the unfortunate coincidence of pvmove and a
> > discard operation in the RHEL 6.5 kernel; 3.19 is also affected.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong at oracle.com>
> > Cc: "Martin K. Petersen" <martin.petersen at oracle.com>
> > Cc: Srinivas Eeda <srinivas.eeda at oracle.com>
>
> Your patch looks fine but it is laser focused on dm-io. Again, that is
> fine (fixes a real problem). But I'm wondering how other targets will
> respond in the face of partial discard support across the logical
> address space of the DM device.
>
> When I implemented dm_table_supports_discards() I consciously allowed a
> DM table to contain a mix of discard support. I'm now wondering where
> it is we benefit from that? Seems like more of a liability than
> anything -- so a bigger hammer approach to fixing this would be to
> require all targets and all devices in a DM table support discard.
> Which amounts to changing dm_table_supports_discards() to be like
> dm_table_supports_write_same().
>
> BTW, given dm_table_supports_write_same(), your patch shouldn't need to
> worry about WRITE SAME. Did you experience issues with WRITE SAME too
> or were you just being proactive?
>
> Mike

I think that Darrick's patch is needed even for WRITE SAME.

Note that queue limits and flags can't reliably prevent bios from
coming in.

For example:

1. Some piece of code tests the queue limits and sees that
max_write_same_sectors is non-zero, so it constructs a WRITE_SAME bio and
sends it with submit_bio.

2. Meanwhile, the device is reconfigured so that it doesn't support
WRITE_SAME. q->limits.max_write_same_sectors is set to zero.

3. The bio submitted at step 1 can't be reverted, so it arrives at the
device mapper even though the device now advertises that it doesn't
support WRITE SAME. At that point it causes the lockup that Darrick
observed.
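
To make step 3 concrete, here is a minimal user-space sketch of the loop shape Darrick described. It is not the dm-io code: struct model_limits, enum model_op, model_do_region() and MODEL_WOULD_HANG are hypothetical names made up for illustration, and the iteration cap only exists so the model can report a hang instead of spinning forever.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two queue limits discussed in this thread. */
struct model_limits {
	uint32_t max_discard_sectors;
	uint32_t max_write_same_sectors;
};

enum model_op { MODEL_WRITE, MODEL_DISCARD, MODEL_WRITE_SAME };

/* Returned when the model detects that the loop makes no progress. */
#define MODEL_WOULD_HANG 1

/*
 * Models a dispatch loop that issues at most "chunk_limit" sectors per pass
 * and subtracts what it issued from "remaining".
 * With guard == true, an unsupported op is rejected before the loop starts.
 * Returns 0 on success, -EOPNOTSUPP if rejected, MODEL_WOULD_HANG if the
 * iteration cap is hit (the real loop would spin forever instead).
 */
int model_do_region(const struct model_limits *lim, enum model_op op,
		    uint64_t remaining, bool guard, unsigned *passes)
{
	const unsigned cap = 1000;	/* arbitrary cap, model only */
	uint32_t chunk_limit;
	unsigned n = 0;

	switch (op) {
	case MODEL_DISCARD:
		chunk_limit = lim->max_discard_sectors;
		break;
	case MODEL_WRITE_SAME:
		chunk_limit = lim->max_write_same_sectors;
		break;
	default:
		chunk_limit = UINT32_MAX;
		break;
	}

	*passes = 0;
	if (guard && chunk_limit == 0)
		return -EOPNOTSUPP;	/* the check proposed in the patch */

	while (remaining) {
		/* A zero limit yields num == 0, so "remaining" never shrinks. */
		uint64_t num = remaining < chunk_limit ? remaining : chunk_limit;

		if (++n > cap) {
			*passes = n;
			return MODEL_WOULD_HANG;
		}
		remaining -= num;
	}

	*passes = n;
	return 0;
}
```

With both limits at zero (the sdb case), the unguarded call never makes progress, while the guarded call returns -EOPNOTSUPP immediately. Because the check sits at the point where the bio is processed, it still holds when the limit was changed after the submitter looked at it.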

Another problem is that queue flags are not propagated up when you reload
a single device - someone could reload a mirror leg with a different dm
table that doesn't support WRITE_SAME, and even after the reload, the
mirror keeps advertising that it does support WRITE_SAME.
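
The stale-advertisement problem can be modelled the same way. This is only a sketch under the assumption that the upper device computes its capability once, when its own table is loaded; struct model_dev, model_load_table() and model_reload_leg() are hypothetical names, not device-mapper functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a stacked device and its advertised capability. */
struct model_dev {
	bool write_same;		/* what this device currently advertises */
	struct model_dev *legs[2];	/* lower devices, if any */
	size_t nr_legs;
};

/*
 * Models loading the upper device's table: the capability is computed
 * once from the legs and then cached in the upper device.
 */
void model_load_table(struct model_dev *d)
{
	bool ok = true;
	size_t i;

	for (i = 0; i < d->nr_legs; i++)
		ok = ok && d->legs[i]->write_same;
	d->write_same = ok;
}

/*
 * Models reloading one leg with a different table: only the leg's own
 * flag changes, nothing is pushed up to devices stacked on top of it.
 */
void model_reload_leg(struct model_dev *leg, bool write_same)
{
	leg->write_same = write_same;
}
```

In this model, after model_reload_leg() clears the flag on one leg, the mirror's cached write_same stays true until model_load_table() is run on the mirror again, which is the window in which an unsupported bio can reach the lower device.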
Mikulas