[dm-devel] [QUESTION]: multipath device with wrong path lead to metadata err
lixiaokeng
lixiaokeng at huawei.com
Wed Jan 20 02:30:58 UTC 2021
Hi Martin:
Thanks for your reply.
> verify_paths() would detect this. We do call verify_paths() in
> coalesce_paths() before calling domap(), but not immediately before.
> Perhaps we should move the verify_paths() call down to immediately
> before the domap() call. That would at least minimize the time window
> for this race. It's hard to avoid it entirely. The way multipathd is
> written, the vecs lock is held all the time during coalesce_paths(),
> and thus no uevents can be processed. We could also consider calling
> verify_paths() before *and* after domap().
Would calling verify_paths() both before *and* after domap() solve this entirely?
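To think this through, here is a toy timeline model. It is not multipathd code: the names reload_outcome, t_change, t_before, t_domap and t_after are hypothetical, and it only assumes that a "verify" step compares the device currently behind a path name (e.g. sdi) with what the map expects. It suggests why a second check narrows the window without closing it, which matches your remark that it's hard to avoid entirely.

```python
# Toy timeline model (hypothetical, not multipathd code): at time t_change the
# device node behind a path name stops matching what the map expects.
# A check at time t notices the change only if t >= t_change.

def reload_outcome(t_change, t_before, t_domap, t_after):
    """Classify a reload that verifies paths before and after the table load."""
    assert t_before < t_domap < t_after

    def stale(t):
        return t >= t_change

    if stale(t_before):
        return "caught before domap, table not loaded"
    if stale(t_domap):
        # The pre-check passed, but the table that got loaded already refers
        # to the wrong device; the post-check only notices after it is live.
        return "wrong table loaded, caught after domap"
    if stale(t_after):
        return "correct table loaded, change caught after domap"
    # Nothing changed before the post-check; a later change is not seen here.
    return "not seen by either check (later changes rely on uevents)"


print(reload_outcome(1.5, 1, 2, 3))
```

In this model the second case is the one a post-check helps with, but only after the bad table was briefly active; the last case shows that any change after the final check still depends on uevent processing.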
> Was this a map creation or a map reload? Was the map removed after the
> failure? Do you observe the message "ignoring map" or "removing map"?
>
> Do you observe a "remove" uevent for sdi?
It was a map reload, but sdi was not in the old map. The "removing map"
message was observed. No "remove" uevent for sdi was observed here.
> I wonder if you'd see the issue also if you run the same test without
> the "multipath -F; multipath -r" loop, or with just one. Ok, one
> multipath_query() loop simulates an admin working on the system, but 2
> parallel loops - 2 admins working in parallel, plus the intensive
> sequence of actions done in multipathd_query at the same time? The
> repeated "multipath -r" calls and multipathd commands will cause
> multipathd to spend a lot of time in reconfigure() and in cli_* calls
> holding the vecs lock, which makes it likely that uevents are missed or
> processed late.
As you said, there were lots of cli_* calls, but no uevents were processed
when the error occurred. After those calls finished, hundreds of uevents
were handled at once (for example, "Forwarding 201 uevents" in the log).
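A toy model of the effect described above: events that arrive while a lock is held pile up and are then handled in one burst. This is only an illustration under stated assumptions; simulate, lock_held and arrivals are hypothetical names, and the model does not claim to reproduce how multipathd actually queues or forwards uevents.

```python
from collections import deque

def simulate(lock_held, arrivals):
    """Hypothetical model: lock_held[t] is True while a handler (e.g. a cli_*
    call or reconfigure) holds the lock at tick t; arrivals[t] is the number
    of events arriving at tick t. Returns (tick, batch_size, oldest_delay)
    for every tick at which queued events were handled."""
    if len(lock_held) != len(arrivals):
        raise ValueError("lock_held and arrivals must have the same length")
    queue = deque()  # arrival tick of each pending event
    batches = []
    for tick, (held, count) in enumerate(zip(lock_held, arrivals)):
        queue.extend([tick] * count)
        if not held and queue:
            # Lock is free: drain everything that accumulated so far.
            batches.append((tick, len(queue), tick - queue[0]))
            queue.clear()
    return batches


# One event per tick while the lock is held for five ticks.
print(simulate([True] * 5 + [False], [1, 1, 1, 1, 1, 0]))
```

Here the five events end up in a single batch at tick 5, and the oldest one waited five ticks, which is the kind of delay during which multipathd's view of the paths can differ from the kernel's.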
> Don't get me wrong, I don't argue against tough testing. But we should
> be aware that there are always time intervals during which multipathd's
> picture of the present devices is different from what the kernel sees.
What you said is very reasonable. When this problem was found, I thought
it would be difficult to solve entirely, although it rarely happens. I
will discuss with the testers whether the test scripts are reasonable.
> There's definitely room for improvement in multipathd wrt locking and
> event processing in general, but that's a BIG piece of work.
Thanks again!
Regards
Lixiaokeng