[dm-devel] [PATCH 2/7] multipathd: fix check_path errors with removed map
Martin Wilck
Martin.Wilck at suse.com
Thu Jun 18 19:34:38 UTC 2020
On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> If a multipath device is removed during, or immediately before the
> call
> to check_path(), multipathd can behave incorrectly. A missing
> multpath
> device will cause update_multipath_strings() to fail, setting
> pp->dmstate to PSTATE_UNDEF. If the path is up, this state will
> cause
> reinstate_path() to be called, which will also fail. This will
> trigger
> a reload, restoring the recently removed device.
>
> If update_multipath_strings() fails because there is no multipath
> device, check_path should just quit, since the remove dmevent and
> uevent
> are likely already queued up. Also, I don't see any reason to reload
> the
> multipath device if reinstate fails. This code was added by
> fac68d7a99ef17d496079538a5c6836acd7911ab, which clamined that
> reinstate
> could fail if the path was disabled. Looking through the current
> kernel
> code, I can't see any reason why a reinstate would fail, where a
> reload
> would help. If the path was missing from the multipath device,
> update_multipath_strings() would already catch that, and quit
> check_path() early, which make more sense to me than reloading does.
fac68d7 is related to the famous "dm-multipath: Accept failed paths for
multipath maps" patch (e.g.
https://patchwork.kernel.org/patch/3368381/#7193001), which never made
it upstream. SUSE kernels have shipped this patch for a long time, but
we don't apply it any more in recent kernels.
With this patch, The reinstate_path() failure would occur if multipathd
had created a table with a "disabled" device (one which would be
present in a dm map even though the actual block device didn't exist),
and then tried to reinstate such a path. It sounds unlikely, but it
might be possible if devices are coming and going in quick succession.
In that situation, and with the "accept failed path" patch applied, a
reload makes some sense, because reload (unlike reinstate) would not
fail (at least not for this reason) and would actually add that just-
reinstated path. OTOH, it's not likely that the reload would improve
matters, either. After all, multipathd is just trying to reinstate a
non-existing path. So, I'm fine with skipping the reload.
>
> Signed-off-by: Benjamin Marzinski <bmarzins at redhat.com>
> ---
> multipathd/main.c | 36 ++++++++++++++----------------------
> 1 file changed, 14 insertions(+), 22 deletions(-)
>
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 6b7db2c0..8fb73922 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1611,22 +1611,18 @@ fail_path (struct path * pp, int del_active)
> /*
> * caller must have locked the path list before calling that
> function
> */
> -static int
> +static void
> reinstate_path (struct path * pp)
> {
> - int ret = 0;
> -
> if (!pp->mpp)
> - return 0;
> + return;
>
> - if (dm_reinstate_path(pp->mpp->alias, pp->dev_t)) {
> + if (dm_reinstate_path(pp->mpp->alias, pp->dev_t))
> condlog(0, "%s: reinstate failed", pp->dev_t);
> - ret = 1;
> - } else {
> + else {
> condlog(2, "%s: reinstated", pp->dev_t);
> update_queue_mode_add_path(pp->mpp);
> }
> - return ret;
> }
> static void
> @@ -2088,8 +2084,13 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
> * Synchronize with kernel state
> */
> if (update_multipath_strings(pp->mpp, vecs->pathvec, 1)) {
> + struct dm_info info;
> condlog(1, "%s: Could not synchronize with kernel
> state",
> pp->dev);
> + if (pp->mpp && pp->mpp->alias &&
> + do_dm_get_info(pp->mpp->alias, &info) == 0)
> + /* multipath device missing. Likely removed */
> + return 0;
> pp->dmstate = PSTATE_UNDEF;
It would be more elegant if we could distinguish different failure
modes from update_multipath_strings() directly, without having to call
do_dm_get_info() again.
> }
> /* if update_multipath_strings orphaned the path, quit early */
> @@ -2179,12 +2180,8 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
> /*
> * reinstate this path
> */
> - if (!disable_reinstate && reinstate_path(pp)) {
> - condlog(3, "%s: reload map", pp->dev);
> - ev_add_path(pp, vecs, 1);
> - pp->tick = 1;
> - return 0;
> - }
> + if (!disable_reinstate)
> + reinstate_path(pp);
> new_path_up = 1;
>
> if (oldchkrstate != PATH_UP && oldchkrstate !=
> PATH_GHOST)
> @@ -2200,15 +2197,10 @@ check_path (struct vectors * vecs, struct
> path * pp, unsigned int ticks)
> else if (newstate == PATH_UP || newstate == PATH_GHOST) {
> if ((pp->dmstate == PSTATE_FAILED ||
> pp->dmstate == PSTATE_UNDEF) &&
> - !disable_reinstate) {
> + !disable_reinstate)
> /* Clear IO errors */
> - if (reinstate_path(pp)) {
> - condlog(3, "%s: reload map", pp->dev);
> - ev_add_path(pp, vecs, 1);
> - pp->tick = 1;
> - return 0;
> - }
> - } else {
> + reinstate_path(pp);
> + else {
> LOG_MSG(4, verbosity, pp);
> if (pp->checkint != max_checkint) {
> /*
--
Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer
More information about the dm-devel
mailing list