[dm-devel] [RFC] pathchecker

Mon Mar 1 15:02:06 UTC 2004

On Tue, Feb 24, 2004 at 06:04:33PM +0100, christophe.varoqui at free.fr wrote:
> * thread 1 : loop in {update failed path list; wait for DM event}
> * thread 2 : loop in {submit test IOs to failed paths; sleep}

	I've been looking at the mpath code, and I had a couple of
thoughts.  I'm coming from code reading and recent dm-devel discussion,
so I apologize if I missed the right doc when I searched Google or
missed the right BOF.
	A separate API for noting newly live (or dead) paths would be
preferable.  If you have different selector types, aren't you going to
have different pathcheckers?  Or is the intention to have one
pathchecker to rule them all?  I think that 'marking all live' causes
fail-storms in the presence of still-failed high-priority paths (this
was mentioned earlier).  Extending the existing API for liveness means
that you can have only one pathchecker, and it must understand all
paths, selectors, and groups for a given mpath.  I'm wondering if
that's the best plan.  Of course, if you allow multiple checkers for one
mpath device, you need a way to specify the associated group, selector,
and path.  That's a pain.
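	To make the "multiple checkers" case concrete, here is a rough
sketch of what the bookkeeping might look like.  It is illustrative
only: PathKey, CheckerRegistry, and the checker callables are all
hypothetical names, not anything in the mpath code or an existing
userspace tool.  Each checker is registered against a (group, selector,
path) triple, and a recheck reinstates only the paths whose own checker
passes, rather than marking everything live.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True, order=True)
class PathKey:
    """Hypothetical identifier for one path inside an mpath device."""
    group: int      # priority group index
    selector: str   # selector type name for that group
    path: str       # path (device) name


class CheckerRegistry:
    """Hypothetical registry: one checker per (group, selector, path)."""

    def __init__(self) -> None:
        self._checkers: Dict[PathKey, Callable[[str], bool]] = {}
        self.failed: Set[PathKey] = set()

    def register(self, key: PathKey, checker: Callable[[str], bool]) -> None:
        # The checker takes a path name and returns True if the path is usable.
        self._checkers[key] = checker

    def mark_failed(self, key: PathKey) -> None:
        if key not in self._checkers:
            raise KeyError(f"no checker registered for {key}")
        self.failed.add(key)

    def recheck(self) -> List[PathKey]:
        """Test each failed path with its own checker.

        Returns only the paths that passed; paths that are still failed
        stay in self.failed, so they are never blindly marked live.
        """
        revived = [k for k in sorted(self.failed) if self._checkers[k](k.path)]
        self.failed.difference_update(revived)
        return revived
```

	With a still-failed high-priority path and a recovered
low-priority one, recheck() returns only the latter, which is the
behaviour that avoids the fail-storm.  The cost is exactly the pain
mentioned above: every registration has to name its group, selector,
and path.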
	On the "wait for DM event" part: do we have an event yet?
2.6.4-udm1 doesn't seem to send any events to userspace on fail_path().
Are we thinking of an upcall, or perhaps polling the status?
	Why threads?  Just to make the wait semantics simpler?  If 'DM
event' is an upcall, the user helper can just connect to a single
process (thread2, above) that receives the notification and adds the
failed paths.
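	A minimal sketch of that single-process shape, assuming the
helper writes newline-separated path names to a pipe or socket that the
checker process holds open.  The wire format, the run_once() name, and
the test_io callback are all assumptions made for illustration; nothing
here reflects an existing DM interface.  The select() timeout plays the
role of the "sleep" in thread 2, and a readable descriptor plays the
role of the "DM event" in thread 1.

```python
import os
import select
from typing import Callable, List, Set


def run_once(notify_fd: int, failed: Set[str],
             test_io: Callable[[str], bool], timeout: float) -> List[str]:
    """One loop iteration: wait for a notification or a timeout, then retest.

    notify_fd -- descriptor the (hypothetical) upcall helper writes to
    failed    -- set of currently failed path names, updated in place
    test_io   -- returns True if a test IO to the named path succeeds
    timeout   -- seconds to wait before retesting anyway
    """
    readable, _, _ = select.select([notify_fd], [], [], timeout)
    if readable:
        # Assumed format: newline-separated path names.  A real reader
        # would also have to buffer partial lines across reads.
        data = os.read(notify_fd, 4096)
        failed.update(name for name in data.decode().split("\n") if name)

    # Retest every failed path; drop the ones that answer again.
    revived = sorted(p for p in failed if test_io(p))
    failed.difference_update(revived)
    return revived
```

	Calling run_once() in a loop gives both behaviours from one
process, with no locking around the failed-path list.  This relies on
select() working on the notification descriptor, which holds for pipes
and sockets on Unix.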

Joel

-- 

"Born under a bad sign.
 I been down since I began to crawl.
 If it wasn't for bad luck,
 I wouldn't have no luck at all."

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127