[dm-devel] [patch] dm-mpath.c trigger_event race

Lars Marowsky-Bree lmb at suse.de
Tue Jun 14 09:13:19 UTC 2005


On 2005-06-13T19:28:51, Lars Marowsky-Bree <lmb at suse.de> wrote:

> Hi Alasdair, Ed, List,
> 
> this patch addresses the bug Ed pointed out. The trigger_event queue
> might still hold a reference while dm_destroy kills the current mapping.
> 
> However, this patch is unconvincing in the larger view of things. Yes,
> barring bugs, it should address the problem, but the real issue is -
> just like with still having pg_inits in-flight - that DM's reference
> counting isn't taking into account what we are doing internally, and
> that remains unsolved.
> 
> Comments?

Second (public) revision. The wait_on_work() approach was no good.
While it considerably reduced the window for the race, it didn't close
it entirely (the work's pending flag is cleared before the handler is
actually called).
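
For reference, the ordering in the workqueue run loop is what makes
that approach unfixable. Roughly (a simplified sketch of the 2.6
run_workqueue() loop, not the exact kernel code):

	while (!list_empty(&cwq->worklist)) {
		struct work_struct *work = list_entry(cwq->worklist.next,
						      struct work_struct, entry);
		void (*f) (void *) = work->func;
		void *data = work->data;

		list_del_init(cwq->worklist.next);
		clear_bit(0, &work->pending);	/* a wait_on_work() returns here... */
		f(data);			/* ...while the handler still runs */
	}

So anything waiting on the pending bit can return while trigger_event()
is still executing against the multipath that is about to be freed.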

My earlier comment still stands: the layering in DM is sub-optimal, if
not wrong. I tried exporting dm_(inc|dec)_pending(struct mapped_device
*md) from dm.c, so as to be able to bump the reference counter for the
dm device when queuing / completing an internal event, but this turns
into a mess: deep inside dm-mpath, I don't have a reference to that
struct at all.
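
To illustrate, this is roughly the interface I tried (hypothetical, it
is not exported from dm.c), and where it falls apart:

	/*
	 * Hypothetical sketch: let a target pin the mapped_device while
	 * internal work is outstanding, the same way pending bios do.
	 */
	void dm_inc_pending(struct mapped_device *md);	/* atomic_inc(&md->pending) */
	void dm_dec_pending(struct mapped_device *md);	/* atomic_dec(), wake_up() */

	/*
	 * The catch: down in dm-mpath we only hold a struct dm_target *
	 * (m->ti); there is no accessor from a target back to its owning
	 * struct mapped_device *, so there is nothing we could pass in.
	 */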

This is actually the same problem that makes sane _logging_ in dm-mpath
a nightmare: there is no reference to the name of the DM device, etc.
This all sucks(tm).

However, I needed a fix reasonably soon (as in, right now ;-) which is
somewhat easier to verify than revamping the full DM layering. Comments
on this approach are appreciated.

And yes, I know that the "looping" in free_multipath() sucks. See above
for the reason: it's easier to verify than the full answer.

A flush_workqueue(kmultipathd) is too large a hammer - given that other
tables might still be busy queuing events, re-queuing IO etc., it could
potentially block for much longer periods of time.
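
For comparison, the big hammer would look like this (sketch only):

	static void free_multipath(struct multipath *m)
	{
		/*
		 * Correct, but this drains the entire shared workqueue,
		 * i.e. it also waits for events and re-queued IO belonging
		 * to every other active multipath table, not just the one
		 * going away.
		 */
		flush_workqueue(kmultipathd);
		/* ... rest of the teardown as before ... */
	}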

From: Lars Marowsky-Bree <lmb at suse.de>
Subject: Panic in dm-mpath.c:trigger_event()
References: 88635

If a dm-mpath table was destroyed while events were still pending to be
delivered, the code could panic in trigger_event() trying to dereference
freed memory.
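
One possible interleaving (sketch):

  CPU 0 (table removal)                 CPU 1 (kmultipathd worker)
  ---------------------                 --------------------------
  fail_path()
    queue_work(kmultipathd,
               &m->trigger_event)
  multipath_dtr()
    free_multipath(m)
      ...
      kfree(m)
                                        trigger_event(m)
                                          dm_table_event(m->ti->table)
                                            -> dereferences freed memory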


Index: linux-2.6.5/drivers/md/dm-mpath.c
===================================================================
--- linux-2.6.5.orig/drivers/md/dm-mpath.c	2005-06-13 18:14:58.555830390 +0200
+++ linux-2.6.5/drivers/md/dm-mpath.c	2005-06-14 10:57:44.879248517 +0200
@@ -80,6 +80,8 @@ struct multipath {
 	unsigned queue_size;
 
 	struct work_struct trigger_event;
+	atomic_t trigger_event_pending; /* Flag: a trigger_event is
+					   still outstanding */
 
 	/*
 	 * We must use a mempool of mpath_io structs so that we
@@ -179,6 +180,7 @@ static struct multipath *alloc_multipath
 		m->queue_io = 1;
 		INIT_WORK(&m->process_queued_ios, process_queued_ios, m);
 		INIT_WORK(&m->trigger_event, trigger_event, m);
+		atomic_set(&m->trigger_event_pending, 0);
 		m->mpio_pool = mempool_create(MIN_IOS, mempool_alloc_slab,
 					      mempool_free_slab, _mpio_cache);
 		if (!m->mpio_pool) {
@@ -195,6 +197,14 @@ static void free_multipath(struct multip
 	struct priority_group *pg, *tmp;
 	struct hw_handler *hwh = &m->hw_handler;
 
+	/* TODO - Yes, I know:
+	 * It would be better if in fact pending trigger_events
+	 * were accounted for in the struct mapped_device * we belong
+	 * to. However, that is not exported to us, and that fix will
+	 * require slightly more rearchitecting than this bit. */
+	while (atomic_read(&m->trigger_event_pending))
+		schedule();
+
 	list_for_each_entry_safe (pg, tmp, &m->priority_groups, list) {
 		list_del(&pg->list);
 		free_priority_group(pg, m->ti);
@@ -421,6 +431,16 @@ static void trigger_event(void *data)
 	struct multipath *m = (struct multipath *) data;
 
 	dm_table_event(m->ti->table);
+
+	spin_lock(&m->lock);
+	atomic_set(&m->trigger_event_pending, 0);
+	spin_unlock(&m->lock);
+}
+
+static inline void queue_trigger_event(struct multipath *m)
+{
+	atomic_set(&m->trigger_event_pending, 1);
+	queue_work(kmultipathd, &m->trigger_event);
 }
 
 /*-----------------------------------------------------------------
@@ -818,7 +838,7 @@ static int fail_path(struct pgpa
 	if (pgpath == m->current_pgpath)
 		m->current_pgpath = NULL;
 
-	queue_work(kmultipathd, &m->trigger_event);
+	queue_trigger_event(m);
 
 out:
 	spin_unlock_irqrestore(&m->lock, flags);
@@ -857,7 +878,7 @@ static int reinstate_path(struct pgpath 
 	if (!m->nr_valid_paths++)
 		queue_work(kmultipathd, &m->process_queued_ios);
 
-	queue_work(kmultipathd, &m->trigger_event);
+	queue_trigger_event(m);
 
 out:
 	spin_unlock_irqrestore(&m->lock, flags);
@@ -900,9 +921,9 @@ static void bypass_pg(struct multipath *
 	m->current_pgpath = NULL;
 	m->current_pg = NULL;
 
+	queue_trigger_event(m);
+
 	spin_unlock_irqrestore(&m->lock, flags);
-
-	queue_work(kmultipathd, &m->trigger_event);
 }
 
 /*
@@ -930,9 +952,9 @@ static int switch_pg_num(struct multipat
 		m->current_pg = NULL;
 		m->next_pg = pg;
 	}
+	queue_trigger_event(m);
 	spin_unlock_irqrestore(&m->lock, flags);
 
-	queue_work(kmultipathd, &m->trigger_event);
 	return 0;
 }
 

Sincerely,
    Lars Marowsky-Brée <lmb at suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

"Ignorance more frequently begets confidence than does knowledge"
	-- Charles Darwin
