[dm-devel] [PATCH 7/7] dm-mpath: Fix a race condition in __multipath_map()

Mike Snitzer snitzer at redhat.com
Wed Nov 23 00:48:03 UTC 2016


On Tue, Nov 22 2016 at  6:47pm -0500,
Bart Van Assche <bart.vanassche at sandisk.com> wrote:

> On 11/21/2016 04:34 PM, Mike Snitzer wrote:
> >But you WARN_ON_ONCE(clone && q->mq_ops) will trigger with sq-on-mq.
> 
> Hello Mike,
> 
> This behavior is indeed triggered by the sq-on-mq test. After having
> added the following code in __bind():
> 
> 	if (old_map &&
> 	    dm_table_all_blk_mq_devices(old_map) !=
> 	    dm_table_all_blk_mq_devices(t))
> 		pr_debug("%s: old all_blk_mq %d <> new all_blk_mq %d\n",
> 			 dm_device_name(md),
> 			 dm_table_all_blk_mq_devices(old_map),
> 			 dm_table_all_blk_mq_devices(t));
> 
> I see the following output appear frequently in the kernel log:
> 
> dm_mod:__bind: 254:0: old all_blk_mq 1 <> new all_blk_mq 0
> 
> Could these all_blk_mq state changes explain that WARN_ON_ONCE(clone
> && q->mq_ops) is triggered in __multipath_map()? Does this mean that
> the comment in patch http://marc.info/?l=dm-devel&m=147925314306752
> is correct?

Yes, looks like you're on to something.  dm_old_prep_tio() has:

        /*
         * Must clone a request if this .request_fn DM device
         * is stacked on .request_fn device(s).
         */
        if (!dm_table_all_blk_mq_devices(table)) { ...

So if your table->all_blk_mq is false (likely due to a temporary no
paths in multipath table scenario) a clone will be passed to
__multipath_map().  But what isn't clear is how __multipath_map() would
then go on to have any underlying paths available to issue IO to (given
the table would have been empty -- or so I would think).

Would be great if you could verify that the table is in fact empty.

It should be noted that dm_table_determine_type() has:

        if (list_empty(devices) && __table_type_request_based(live_md_type)) {
                /* inherit live MD type */
                t->type = live_md_type;
                return 0;
        }

So this explains how/why an empty table will inherit the
DM_TYPE_MQ_REQUEST_BASED, and it also explains why the new (empty) table
would not have ->all_blk_mq set to true.  But again, no IO is able to be
issued when there are no underlying paths.

And looking closer, __multipath_map() _should_ return early with either
DM_MAPIO_DELAY_REQUEUE or DM_MAPIO_REQUEUE when no paths are available.

Not seeing how you could have this scenario where you prepared a clone
(as if the clone request were to be issued to a .request_fn, aka "sq",
device) and then by the time you get into __multipath_map() there is an
underlying path available with q->mq_ops.

But regardless, what certainly needs fixing is the inconsistency of
inheriting DM_TYPE_MQ_REQUEST_BASED but not setting all_blk_mq to true
(all_blk_mq == true is implied by DM_TYPE_MQ_REQUEST_BASED).

I'm now questioning why we even need the 'all_blk_mq' state within the
table.  I'll revisit the "why?" on all that in my previous commits.
I'll likely get back with you on this point tomorrow.  And will
hopefully have a fix for you.

FYI: given all this I do think something like your 7/7 patch (which you
referenced with the above url) would be a possible added safety net to
guard against future inconsistencies/bug/regressions.




More information about the dm-devel mailing list