[dm-devel] [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones

Junichi Nomura j-nomura at ce.jp.nec.com
Thu May 28 06:38:29 UTC 2015


On 05/27/15 18:50, Junichi Nomura wrote:
> On 05/27/15 17:21, Christoph Hellwig wrote:
>> On Tue, May 26, 2015 at 06:20:43AM +0000, Junichi Nomura wrote:
>>> Not completing bios is not sufficient.
>>> If you advance the bi_iter to the end, you need to somehow rewind it
>>> or the re-submission will be incomplete, that would end up as a data
>>> corruption...

This is less critical than the data corruption issue, but
I'm also worried about the partial completion case.
On a successful partial completion, the current code completes
each bio before the request is fully completed.
With your patch, bios are not completed until the request is
fully completed.

Another related concern is partial failure. In the case of a bad sector,
for example, the current code fails the I/O for that particular sector
while the other sectors in the request succeed.
If you make request completion an all-or-nothing model,
that will be a degradation for such a case.
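
To make the partial failure concern concrete, here is a minimal userspace
sketch. It is a toy model, not kernel code: ToyBio, complete_per_bio and
complete_all_or_nothing are hypothetical names, and it works at bio
granularity rather than per sector. It only compares what the submitter
sees under the two completion models when one bad sector is present.

```python
from dataclasses import dataclass


@dataclass
class ToyBio:
    """Hypothetical stand-in for a bio: sectors [start, start + count)."""
    start: int
    count: int


def complete_per_bio(bios, bad_sectors):
    """Each bio is completed on its own: only bios touching a bad sector fail."""
    return [
        not any(s in bad_sectors for s in range(b.start, b.start + b.count))
        for b in bios
    ]


def complete_all_or_nothing(bios, bad_sectors):
    """The whole request shares one status: one bad sector fails every bio."""
    ok = all(complete_per_bio(bios, bad_sectors))
    return [ok] * len(bios)


request = [ToyBio(0, 8), ToyBio(8, 8), ToyBio(16, 8)]
bad = {10}  # one bad sector, inside the second bio

per_bio = complete_per_bio(request, bad)
all_or_nothing = complete_all_or_nothing(request, bad)
print(per_bio)          # [True, False, True]
print(all_or_nothing)   # [False, False, False]
```

In this model the first and third bios succeed under per-bio completion
but are reported as failed under the all-or-nothing model.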

I'm not sure how much impact removing partial
completion has in the real world.
If partial completion is that negligible, I think it should be
handled that way in all cases, instead of special-casing
REQ_CLONE.

>> Can you explain which particular case you're worried about?
> 
> General path failure case.
> 
> On retrying, another clone is created but bios it points to
> are already advanced to the end with your patch.
> So they look like bios with no remaining segments.
> The lower driver may successfully complete such a resubmitted
> clone *without doing actual I/O*.
> Then written data will be lost / read data will be bogus.
> 
> Can you test this scenario with your patch?
>   1. Set up a multipath device with fail-over mode
>   2. Write something to the multipath device.
>      After the clone request is sent to the primary path
>      and before the data goes to the disk, 
>      down the primary path
>      (e.g. echo offline > /sys/block/sdXX/device/state)
>   3. (dm-mpath will retry from the secondary path and
>       the write will eventually succeed)
>   4. Verify if the written data is really on the disk
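
The failure mode in the quoted text can be sketched in userspace as
follows. This is a toy model under stated assumptions, not the kernel
data structures: ToyBio and submit_clone are hypothetical names, and the
"done" field only plays the role of bi_iter. It assumes the iterator is
advanced to the end on submission and that nothing rewinds it when the
path fails.

```python
class ToyBio:
    """Hypothetical stand-in for a bio; 'done' plays the role of bi_iter."""

    def __init__(self, size):
        self.size = size   # total bytes described by the bio
        self.done = 0      # bytes the iterator has already advanced past

    def remaining(self):
        return self.size - self.done


def submit_clone(bios, path_ok):
    """Pretend lower driver.

    The iterators are advanced to the end whether or not the path works,
    which models the concern that nothing rewinds them after a failure.
    Returns (status, bytes actually transferred).
    """
    todo = sum(b.remaining() for b in bios)
    for b in bios:
        b.done = b.size
    if not path_ok:
        return ("error", 0)
    return ("ok", todo)


bios = [ToyBio(512 * 1024)]

first = submit_clone(bios, path_ok=False)   # primary path goes down
retry = submit_clone(bios, path_ok=True)    # retried on the secondary path
print(first, retry)   # ('error', 0) ('ok', 0)

# Rewinding the iterators before the retry restores the full transfer.
for b in bios:
    b.done = 0
rewound = submit_clone(bios, path_ok=True)
print(rewound)        # ('ok', 524288)
```

The retry reports success with zero bytes transferred, which is the
"completed without doing actual I/O" case.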

I made a small script so that people can play with it.
The script sets up a tcm_loop multipath device and runs a fio verification
test while repeatedly bringing the paths up and down in quick succession.

When your patch is applied, fio reports a verification failure within
a minute, like this:

# ./stress-mp.sh
..
test1: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, iodepth=2
fio-2.2.8-16-g68d9
Starting 1 process
meta: verify failed at file /dev/mapper/mp offset 477626368, length 524288
       received data dumped as mp.477626368.received
       expected data dumped as mp.477626368.expected
fio: pid=13560, err=84/file:io_u.c:1866, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character

test1: (groupid=0, jobs=1): err=84 (file:io_u.c:1866, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character): pid=13560: Thu May 28 01:54:56 2015

-- 
Jun'ichi Nomura, NEC Corporation

-------------- next part --------------
A non-text attachment was scrubbed...
Name: stress-mp.sh
Type: application/x-sh
Size: 1677 bytes
Desc: stress-mp.sh
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20150528/975a371c/attachment.sh>

