[dm-devel] [RFC] pathchecker

christophe varoqui christophe.varoqui at free.fr
Mon Mar 1 16:36:19 UTC 2004


Joe Thornber wrote:
> On Mon, Mar 01, 2004 at 12:03:34PM -0800, Joel Becker wrote:
> 
>>	The "wait for DM event" part.  Do we have an event yet?
>>2.6.4-udm1 doesn't seem to send any events to userspace on fail_path().
>>Are we thinking an upcall, or perhaps polling the status?
> 
> 
> The event handling should be working; fail_path() uses a work queue to
> schedule trigger_event() being called (we can't call it directly from
> interrupt context).
> 
> dm has a very simple model for events:
> 
> - Userland issues the wait for event ioctl, which blocks until an event occurs
> - A target (eg. mpath) triggers an event
> - Userland returns from the wait for event ioctl.  At this point it
>   should query the status of the device to work out what happened.
> 
> An event number is passed into 'wait for event' to indicate the last
> known event.  This way we can avoid missing events while previous
> events are being processed.  Only recent versions of dmsetup support
> this event number handling.
> 
Yes,

I confirm the current event notification scheme is usable for the
pathchecker. I have a prototype I'll post this week.
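
For reference, here is a minimal C model of the event-number handshake
Joe describes. It only simulates the kernel's counter in memory.
fake_dm_dev, fake_trigger_event() and fake_wait_event() are
hypothetical names, not the libdevmapper or ioctl interface. The point
it shows is that passing in the last known event number lets the waiter
notice events raised while it was busy, instead of sleeping through them.

```c
/* Illustrative model of the dm event-number handshake. All names here
 * are hypothetical, not kernel or libdevmapper functions. */

struct fake_dm_dev {
    unsigned int event_nr;   /* bumped on every triggered event */
};

/* Target side, e.g. mpath failing a path from its work queue. */
static void fake_trigger_event(struct fake_dm_dev *dev)
{
    dev->event_nr++;
}

/* Userland side: the caller passes the last event number it has seen.
 * Returns 1 if an event happened since then (the real ioctl would
 * return at once), 0 if the real ioctl would block. */
static int fake_wait_event(const struct fake_dm_dev *dev, unsigned int last_nr)
{
    return dev->event_nr != last_nr;
}
```

Two events raised during processing collapse into one wake-up here.
That is harmless, because the waiter re-reads the full status after
every wake-up anyway.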

Speaking of which, I'd like comments on the sanity of the following
general rule: what about having the multipath configuration tool
isolate failed paths in a fallback PG? They would be marked Active,
since no IO has gone through them yet, and so would be exercised if all
the high-priority paths fail. If they get hot-activated by the
controller (think of a controller handling LUN switchover), they will
work as-is. If they have really failed, they will simply be marked as
such.
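
A sketch of that grouping rule, assuming the config tool has already
probed each path once. struct path, probe_ok and group_paths() are
hypothetical, and turning the two groups into an actual dm table is left
out.

```c
#include <stddef.h>

/* Hypothetical per-path record as the config tool sees it after probing. */
enum path_state { PATH_ACTIVE, PATH_FAILED };

struct path {
    const char *dev;        /* e.g. "8:16", illustrative only */
    int probe_ok;           /* result of the tool's own probe */
    enum path_state state;  /* state handed to the kernel */
};

/* Split paths into a primary PG (probe succeeded) and a lower-priority
 * fallback PG (probe failed). Every path is handed over as Active, so a
 * fallback path only gets IO once all primary paths have failed. */
static void group_paths(struct path *paths, size_t n,
                        struct path **primary, size_t *n_primary,
                        struct path **fallback, size_t *n_fallback)
{
    *n_primary = 0;
    *n_fallback = 0;
    for (size_t i = 0; i < n; i++) {
        paths[i].state = PATH_ACTIVE;
        if (paths[i].probe_ok)
            primary[(*n_primary)++] = &paths[i];
        else
            fallback[(*n_fallback)++] = &paths[i];
    }
}
```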

Now, the pathchecking logic:
At initial MP configuration, all paths are marked Active, including the
failed ones grouped in a separate secondary PG ... so no pathchecking
is needed yet.

Waiter threads wait for events.

Now an exercised path fails. A waiter thread wakes, fetches the MP
status string, discovers the failed path and pushes it onto the
failedpaths list. The pathchecker thread now has this path to test.
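
Roughly, the waiter side could look like this. The status format parsed
here ("<major:minor> <A|F> ...") is a made-up simplification, not the
real dm-mpath status syntax, and struct failedpaths and
push_failed_paths() are hypothetical names. In a real daemon the list
would also need a mutex shared with the pathchecker thread.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATHS 64
#define DEVNAME_LEN 16

/* The failedpaths list shared with the pathchecker (locking omitted). */
struct failedpaths {
    char dev[MAX_PATHS][DEVNAME_LEN];
    size_t count;
};

static int failedpaths_contains(const struct failedpaths *fp, const char *dev)
{
    for (size_t i = 0; i < fp->count; i++)
        if (strcmp(fp->dev[i], dev) == 0)
            return 1;
    return 0;
}

/* Scan a status string in the HYPOTHETICAL format "<dev> <A|F> ..." and
 * push every failed path not already queued. Returns paths added. */
static size_t push_failed_paths(struct failedpaths *fp, const char *status)
{
    char dev[DEVNAME_LEN];
    char state;
    int used;
    size_t added = 0;

    while (sscanf(status, "%15s %c%n", dev, &state, &used) == 2) {
        status += used;
        if (state == 'F' && fp->count < MAX_PATHS &&
            !failedpaths_contains(fp, dev)) {
            strcpy(fp->dev[fp->count++], dev);
            added++;
        }
    }
    return added;
}
```

Skipping paths that are already queued keeps a second event for the
same failure from handing the pathchecker duplicate work.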

On the 1000th try, the pathchecker finds the path has come back up. It
fork'n execs the multipath config tool, which resets the MP target to
its initial state. The path pops out of the failedpaths list and
everybody goes back to sleep happy.
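
The pathchecker side, as a sketch. probe_fn and reconfig_fn are
hypothetical hooks, the latter standing in for the fork'n exec of the
config tool, and the mock hooks only reproduce the "up again on the
1000th try" scenario. Each call makes one pass over the failedpaths
list; a daemon would call it periodically with a sleep in between.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PATHS 64
#define DEVNAME_LEN 16

struct failedpaths {
    char dev[MAX_PATHS][DEVNAME_LEN];
    size_t count;
};

/* Hypothetical hooks: probe one path; reset the MP target. */
typedef int (*probe_fn)(const char *dev);
typedef void (*reconfig_fn)(void);

/* One pass: test every failed path once. Paths that answer again are
 * dropped from the list, and if any came back the config tool is run
 * once to reset the MP target. Returns the number of recovered paths. */
static size_t pathchecker_pass(struct failedpaths *fp, probe_fn probe,
                               reconfig_fn reconfig)
{
    size_t recovered = 0;
    size_t i = 0;

    while (i < fp->count) {
        if (probe(fp->dev[i])) {
            /* Remove entry i by moving the last entry into its slot. */
            fp->count--;
            if (i != fp->count)
                memcpy(fp->dev[i], fp->dev[fp->count], DEVNAME_LEN);
            recovered++;
        } else {
            i++;
        }
    }
    if (recovered > 0)
        reconfig();
    return recovered;
}

/* Mock hooks for illustration only: the path comes back on the 1000th
 * probe, and reconfiguration is just counted. */
static unsigned int probe_calls;
static unsigned int reconfig_calls;

static int mock_probe(const char *dev)
{
    (void)dev;
    return ++probe_calls >= 1000;
}

static void mock_reconfig(void)
{
    reconfig_calls++;
}
```

Running the config tool once per pass, rather than once per recovered
path, keeps a burst of recoveries from triggering several reloads.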

So, sane or insane?
