[dm-devel] [QUESTION]: multipath device with wrong path lead to metadata err

Roger Heflin rogerheflin at gmail.com
Wed Jan 20 13:02:13 UTC 2021


> verify_paths() would detect this. We do call verify_paths() in
> coalesce_paths() before calling domap(), but not immediately before.
> Perhaps we should move the verify_paths() call down to immediately
> before the domap() call. That would at least minimize the time window
> for this race. It's hard to avoid it entirely. The way multipathd is
> written, the vecs lock is held all the time during coalesce_paths(),
> and thus no uevents can be processed. We could also consider calling
> verify_paths() before *and* after domap().
>
> Was this a map creation or a map reload? Was the map removed after the
> failure? Do you observe the message "ignoring map" or "removing map"?
>
> Do you observe a "remove" uevent for sdi?
>

>
> I wonder if you'd see the issue also if you run the same test without
> the "multipath -F; multipath -r" loop, or with just one. Ok, one
> multipath_query() loop simulates an admin working on the system, but 2
> parallel loops - 2 admins working in parallel, plus the intensive
> sequence of actions done in multipathd_query at the same time? The
> repeated "multipath -r" calls and multipathd commands will cause
> multipathd to spend a lot of time in reconfigure() and in cli_* calls
> holding the vecs lock, which makes it likely that uevents are missed or
> processed late.
>
> Don't get me wrong, I don't argue against tough testing. But we should
> be aware that there are always time intervals during which multipathd's
> picture of the present devices is different from what the kernel sees.
>
> There's definitely room for improvement in multipathd wrt locking and
> event processing in general, but that's a BIG piece of work.
>
>
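
On the idea of calling verify_paths() again right before (and after)
domap(): just to illustrate the pattern in general terms - this is a
made-up sketch with stand-in names, not the actual multipath-tools
functions - the point would be to re-check what each path is immediately
before committing the map and once more right after, so the window in
which a path can silently change identity shrinks to roughly the commit
itself:

def set_up_map(paths, read_wwid, commit_map):
    # read_wwid(path) and commit_map(paths) are hypothetical stand-ins
    # for "ask the kernel what this path currently is" and "push the
    # table to device-mapper"; neither is a real multipath-tools call.
    expected = {p: read_wwid(p) for p in paths}

    # re-verify immediately before the commit (the "verify_paths()
    # right before domap()" idea)
    for p in paths:
        if read_wwid(p) != expected[p]:
            raise RuntimeError("%s changed identity before the commit" % p)

    commit_map(paths)

    # ...and once more right after, to catch anything that slipped into
    # the (now much smaller) window during the commit itself
    for p in paths:
        if read_wwid(p) != expected[p]:
            raise RuntimeError("%s changed identity during the commit" % p)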

I don't know if this helps, or whether it is exactly what he is reproducing:

I debugged and verified a corruption issue a few years ago where the
following happened:

DiskA was presented at, say, sdh (via the SAN) and a multipath device was
created on top of its paths; then DiskA was unpresented and new disks
were put back in the same zone.
DiskB was then presented in the same slot (same zone + LUN id, so sdh
again) and inherited by the still-in-place multipath device/mapping. In
this case I don't believe there was ever a device-level event for sdh.

In our case we did not open a case with our vendor: this was never
supposed to happen, it seemed at the time to be how we expected things to
work, and during the unpresent a script was supposed to have been run to
clean up all of the dead paths before any new storage was presented.

You might have to verify that the path is still the same device any time
you are recovering from a path-failure-like event, since in the above
case the sdh device always existed but had the underlying storage/LUN
swapped out underneath it (a rough sketch of such a check is below). I am
not sure whether that is what is going on in his case, or whether the sdX
device he is using actually goes away in his test.
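
For what it is worth, here is a rough sketch of the kind of identity
check I mean, assuming a kernel that exposes the SCSI identifier under
/sys/block/<dev>/device/wwid (the device name and WWID on the command
line are just examples; the same information could also come from
scsi_id or VPD page 0x83 directly):

#!/usr/bin/env python3
# Rough sketch only: before reinstating a path, re-read the kernel's idea
# of its WWID from sysfs and compare it with the WWID the multipath map
# was originally built around.

from pathlib import Path
import sys

def current_wwid(dev):
    # e.g. /sys/block/sdh/device/wwid, derived from VPD page 0x83/0x80
    p = Path("/sys/block") / dev / "device" / "wwid"
    return p.read_text().strip() if p.exists() else None

def still_same_lun(dev, expected_wwid):
    wwid = current_wwid(dev)
    return wwid is not None and wwid == expected_wwid

if __name__ == "__main__":
    dev, expected = sys.argv[1], sys.argv[2]   # e.g. "sdh" and the map's WWID
    if still_same_lun(dev, expected):
        print("%s still matches %s" % (dev, expected))
    else:
        print("%s no longer matches %s - do not reinstate it" % (dev, expected))
        sys.exit(1)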



