[dm-devel] Improving mirror fault handling.

Tue Jan 13 17:50:48 UTC 2009

On Jan 12, 2009, at 9:26 PM, malahal at us.ibm.com wrote:

>> 4) Transient fault handling
>> - Since we can't just assume "wait 5 seconds and then see if the failure
>> still exists", we are going to have to make this configurable.
>> Discussion should proceed on this in parallel with #2 and #3, since this
>> phase will take a long time for everyone to agree.  We have to determine
>> where the user specifies the configuration - lvm.conf?  CLI?  We also
>> have to determine /what/ their configuration will be based on - time?
>> percentage of mirror out-of-sync?
>
> Thank you Jonathan for the nice write up. Transient failures are
> generally recoverable after a period of time. The 'time' may vary from
> device to device though. lvm.conf based configuration is a good place to
> start. Do we really need LV or PV based configuration for this
> 'timeout'?
>
> The recovery itself doesn't depend on the % of out-of-sync regions, but
> that is a good place to start looking for re-allocating the regions if
> configured for re-allocation.
>
> Here are my thoughts:
> 	handle_mirror_transient_failure()
> 	{
> 		do {
> 			if (device-came-back-to-life()) {
> 				start-resynchronization();
> 				break;
> 			}
>
> 			if (reallocation-timeout exceeded or
> 			    re-allocation-too-much out-of-sync) {
> 				re-allocate();
> 				break;
> 			}
> 			if (some-other-timeout exceeded) {
> 				log a message and break;
> 			}
> 			sleep(for-few-seconds);
> 			timeout -= few-seconds;
> 		} while (1);
> 	}

If we put the configuration in lvm.conf, then it would globally apply  
to all volume groups and all logical volumes.  I might be willing to  
accept that for a while, but others may want a plan for something  
better going forward.  We don't want to pollute the conf file with new  
fields that will be useless shortly into the future.  If you look in  
LVM2/doc/example.conf and search for _fault_policy, you can see that  
there are already some configuration options there.  We might stick  
the new ones there as well.  (Although this somewhat confuses me,  
because they apply only to our default DSO, and you can change the DSO  
you want to use in a completely different section of the config  
file...  So now you have settings that are worthless because a custom  
DSO is being used.)

What I meant by "/what/ their configuration will be based on" is that  
the user may not care about the time they wait for a  
device to come back, but how far the mirror has gone out of sync while  
the device has been gone...  If one of the legs fails and the mirror  
is 75% out of sync before the device comes back, the user may just  
want the device removed and stop waiting.  If the user specifies "5  
minutes" wait time, but there have been no writes to the mirror in  
that time, then we could probably wait longer.  You see what I mean?   
A user may wish to use a combination of the two methods... "Wait 20  
minutes for the device to come back, but only if the mirror stays >  
95% in-sync".
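
To make the combined time-plus-sync idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the names TransientFaultPolicy, max_wait_seconds, min_in_sync_percent and decide are hypothetical and do not correspond to anything in lvm.conf or dmeventd today. It only decides whether to keep waiting; what "take action" means is left to the existing fault policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    KEEP_WAITING = "keep_waiting"
    TAKE_ACTION = "take_action"  # run whatever fault policy the user configured


@dataclass(frozen=True)
class TransientFaultPolicy:
    # Hypothetical settings; neither of these exists in lvm.conf.
    max_wait_seconds: Optional[float] = None     # None means no time limit
    min_in_sync_percent: Optional[float] = None  # None means no sync limit


def decide(policy, seconds_since_failure, in_sync_percent):
    """Return TAKE_ACTION as soon as either configured limit is crossed."""
    if (policy.max_wait_seconds is not None
            and seconds_since_failure >= policy.max_wait_seconds):
        return Decision.TAKE_ACTION
    # "Only if the mirror stays > 95% in-sync": stop waiting at or below the limit.
    if (policy.min_in_sync_percent is not None
            and in_sync_percent <= policy.min_in_sync_percent):
        return Decision.TAKE_ACTION
    return Decision.KEEP_WAITING


# "Wait 20 minutes for the device to come back, but only if the mirror
# stays > 95% in-sync."
policy = TransientFaultPolicy(max_wait_seconds=20 * 60, min_in_sync_percent=95.0)
```

With that policy, a leg that has been gone for one minute while the mirror is still 99% in-sync keeps waiting, while the same leg with the mirror at 75% in-sync triggers the action immediately. Leaving one field unset gives the pure time-based or pure sync-based behaviour.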

As for the pseudo-code...  I wouldn't use a 'while(1)' there... leave  
the thread free to continue.  We could use dmeventd's timer events to  
trigger the next check for the device coming back (I hope).  Your code  
seems to suggest that you understand my point in the preceding  
paragraph, but I am a bit confused by the use of '[re-]allocation'.   
In this piece of code, we are only concerned about whether or not to  
take action.  The action is user defined (see the example.conf  
mentioned above), so the space may or may not be reallocated.
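
Roughly what I have in mind instead of the loop, again as a Python sketch under stated assumptions: on_timer_event is a hypothetical handler that does one check and returns, and simulate is only a stand-in for whatever delivers the timer events (it is not dmeventd's API, which I am not reproducing here). The handler itself never sleeps, and it only reports that action should be taken; it does not decide what that action is.

```python
from enum import Enum


class Outcome(Enum):
    RESYNC = "resync"            # device came back; start resynchronization
    TAKE_ACTION = "take_action"  # hand over to the user-defined fault policy
    RECHECK = "recheck"          # nothing to do yet; wait for the next timer event


def on_timer_event(device_is_back, seconds_since_failure, max_wait_seconds):
    """Handle one timer event and return immediately; never sleeps or loops."""
    if device_is_back:
        return Outcome.RESYNC
    if seconds_since_failure >= max_wait_seconds:
        return Outcome.TAKE_ACTION
    return Outcome.RECHECK


def simulate(back_at, max_wait_seconds, interval=5):
    """Stand-in for the timer source: deliver an event every `interval` seconds.

    `back_at` is the time (in seconds) at which the device reappears,
    or None if it never does. Returns the final outcome and elapsed time.
    """
    elapsed = 0
    while True:
        elapsed += interval
        device_is_back = back_at is not None and elapsed >= back_at
        outcome = on_timer_event(device_is_back, elapsed, max_wait_seconds)
        if outcome is not Outcome.RECHECK:
            return outcome, elapsed
```

The loop lives only in the simulated timer source. In the real thing the thread would return after each call and be free to handle other events until the next timer fires.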

brassow