[dm-devel] [PATCH 2/7] multipathd: fix check_path errors with removed map

Hannes Reinecke hare at suse.de
Fri Jun 19 06:32:34 UTC 2020


On 6/19/20 1:17 AM, Benjamin Marzinski wrote:
> On Thu, Jun 18, 2020 at 07:34:38PM +0000, Martin Wilck wrote:
>> On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
>>> If a multipath device is removed during, or immediately before the
>>> call
>>> to check_path(), multipathd can behave incorrectly. A missing
>>> multpath
>>> device will cause update_multipath_strings() to fail, setting
>>> pp->dmstate to PSTATE_UNDEF.  If the path is up, this state will
>>> cause
>>> reinstate_path() to be called, which will also fail.  This will
>>> trigger
>>> a reload, restoring the recently removed device.
>>>
>>> If update_multipath_strings() fails because there is no multipath
>>> device, check_path should just quit, since the remove dmevent and
>>> uevent
>>> are likely already queued up. Also, I don't see any reason to reload
>>> the
>>> multipath device if reinstate fails. This code was added by
>>> fac68d7a99ef17d496079538a5c6836acd7911ab, which clamined that
>>> reinstate
>>> could fail if the path was disabled.  Looking through the current
>>> kernel
>>> code, I can't see any reason why a reinstate would fail, where a
>>> reload
>>> would help. If the path was missing from the multipath device,
>>> update_multipath_strings() would already catch that, and quit
>>> check_path() early, which make more sense to me than reloading does.
>>
>> fac68d7 is related to the famous "dm-multipath: Accept failed paths for
>> multipath maps" patch (e.g.
>> https://patchwork.kernel.org/patch/3368381/#7193001), which never made
>> it upstream. SUSE kernels have shipped this patch for a long time, but
>> we don't apply it any more in recent kernels.
>>
>> With this patch, The reinstate_path() failure would occur if multipathd
>> had created a table with a "disabled" device (one which would be
>> present in a dm map even though the actual block device didn't exist),
>> and then tried to reinstate such a path. It sounds unlikely, but it
>> might be possible if devices are coming and going in quick succession.
>> In that situation, and with the "accept failed path" patch applied, a
>> reload makes some sense, because reload (unlike reinstate) would not
>> fail (at least not for this reason) and would actually add that just-
>> reinstated path. OTOH, it's not likely that the reload would improve
>> matters, either. After all, multipathd is just trying to reinstate a
>> non-existing path. So, I'm fine with skipping the reload.
>>
It's actually _not_ unlikely, but a direct result of multipathd 
listening to uevents.

If you have a map with four paths, and all four of them are going down, 
you end up getting four events.
And multipath will be processing each event _individually_, causing it 
to generate a reload sequence like:

[a b c d]
[b c d]
[c d]
[d]
[]

Of which only the last is valid, as all the intermediate ones contain 
invalid paths.

And _that_ is the scenario I was referring to when creating the patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare at suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer





More information about the dm-devel mailing list