[dm-devel] [PATCH] 9/9: Reactivate paths that have come back to life.
Kevin Corry
kevcorry at us.ibm.com
Thu Jan 22 12:55:02 UTC 2004
Hi,
I've recently realized that this patch introduced a potential deadlock.
In multipath_suspend(), we iterate through the groups and paths while holding
the path_lock spinlock. For each path, we down that path's test_lock
semaphore in order to stop all path-testing while the device is suspended.
However, if a path is currently being tested when multipath_suspend() is
called, then multipath_suspend() has to wait on the test_lock semaphore,
while holding the path_lock spinlock. When the test bio completes,
test_endio() needs to take the path_lock spinlock in case it has to move a
path from the valid list to the invalid list (or vice-versa). Deadlock.
So I thought for a long time about how to prevent this, and couldn't come to
an easy answer. We need the test_lock to prevent the same test bio from being
submitted multiple times and to be able to stop testing altogether during a
suspend. But we also need the path_lock to protect places where we have to
move paths between the lists.
So this got me thinking a bit more...why do we actually need to separate the
valid and invalid paths into different lists? It would be just as simple to
have an atomic_t in the path struct to indicate if a path was valid or
invalid. Combining the valid and invalid lists would completely eliminate the
need for the path_lock, since the group list and path list would never have
to change once the mapping was constructed.
The only advantage I currently see for the separate lists is that
__find_priority_group() can easily determine if a group has any valid paths
to select. But this could also be accomplished with another atomic_t counter
in the priority_group struct to keep track of the number of valid paths.
So, unless somebody is terribly adverse to this idea, I'm going to start
working on a patch to combine the valid and invalid lists in the
priority_group into a single list, which will hopefully simplify the locking
quite a bit.
-Kevin
On Tuesday 13 January 2004 16:01, Kevin Corry wrote:
> If a test bio completes successfully and the path is currently invalid,
> reactivate the path.
> --- diff/drivers/md/dm-mpath.c 2004-01-13 15:21:51.000000000 -0600
> +++ source/drivers/md/dm-mpath.c 2004-01-13 15:44:45.000000000 -0600
> @@ -199,15 +199,32 @@
> path->pg->ps->type->set_path_state(path->pg->ps, path, 0);
> }
>
> +static void __recover_path(struct path *path)
> +{
> + if (!path->has_failed)
> + return;
> +
> + atomic_set(&path->fail_count, MPATH_FAIL_COUNT);
> + list_move(&path->list, &path->pg->valid_paths);
> + path->pg->ps->type->set_path_state(path->pg->ps, path, 1);
> + path->has_failed = 0;
> +}
> +
> static int test_endio(struct bio *bio, unsigned int done, int error)
> {
> struct path *path = (struct path *) bio->bi_private;
> + struct multipath *m = path->pg->m;
> + unsigned long flags;
>
> if (bio->bi_size)
> return 1;
>
> + spin_lock_irqsave(&m->path_lock, flags);
> if (error)
> __fail_path(path);
> + else
> + __recover_path(path);
> + spin_unlock_irqrestore(&m->path_lock, flags);
>
> up(&path->test_lock);
> return 0;
--
Kevin Corry
kevcorry at us.ibm.com
http://evms.sourceforge.net/
More information about the dm-devel
mailing list