[dm-devel] [PATCH 4/5] multipathd: disable queueing for recreated map in uev_remove_map

Martin Wilck mwilck at suse.com
Tue Aug 18 19:23:51 UTC 2020


Hi Lixiaokeng,

On Tue, 2020-08-18 at 21:09 +0800, lixiaokeng wrote:
> There may be a race window here:
> 1. all paths gone, causing map flushed both from multipathd and
> kernel
> 2. paths regenerated, causing multipathd creating the map again.
> 
> 1 will generate a remove uevent which can be handled after 2, so we
> can
> disable queueing for the map created by 2 here temporarily and let
> the
> change uevent (generated by 2) calling uev_add_map->setup_multipath
> to set queueing again. This can prevent the deadlock in this race
> window.
> 
> The possible deadlock is: all udevd workers hangs in devices because
> of
> queue_if_no_path, so no udevd workers can handle new event and since
> multipathd will remove the map, the checkerloop cannot check this
> map's
> retry tick timeout and cancel the io hang which makes the udevd
> worker
> hang forever. multipathd cannot receive any uevent from udevd because
> all udevd workers hang there so the map cannot be recreated again
> which
> makes a deadlock.
> 
> Signed-off-by: Lixiaokeng at huawei.com

A map which is removed and not yet re-added again (as far as udev is
concerned) doesn't need to queue because it can't possibly be in use.
So I think the patch can't hurt in other scenarios, at it makes sense
in the situation you describe. However I have a few questions.

Have you observed this, or is it theory? I'm wondering: After 2) there
should be some paths again, so why would the udev workers hang? 
I guess this could happen if the regenerated paths all in failed /
standby state, is that what you mean?

Note also that we set DM_NOSCAN in the udev rules when there are no
usable paths, so udev workers would only hang if the last path fails /
is removed after the "multipath -U" check.

You've certainly hit a weak spot here, and you've nicely described a
potential problem scenario. The delayed processing of uevents that
multipathd triggered itself is a recurring cause of headache.

Regards,
Martin



> ---
>  multipathd/main.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index baa18183..d7e20a10 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -798,6 +798,7 @@ uev_remove_map (struct uevent * uev, struct
> vectors * vecs)
>  		goto out;
>  	}
> 
> +	dm_queue_if_no_path(alias, 0);
>  	remove_map_and_stop_waiter(mpp, vecs);
>  out:
>  	lock_cleanup_pop(vecs->lock);





More information about the dm-devel mailing list