[dm-devel] Improving mirror fault handling.
Jonathan Brassow
jbrassow at redhat.com
Tue Jan 13 17:50:48 UTC 2009
On Jan 12, 2009, at 9:26 PM, malahal at us.ibm.com wrote:
>> 4) Transient fault handling
>> - Since we can't just assume "wait 5 seconds and then see if the
>>   failure still exists", we are going to have to make this
>>   configurable. Discussion should proceed on this in parallel with
>>   #2 and #3, since this phase will take a long time for everyone to
>>   agree. We have to determine where the user specifies the
>>   configuration - lvm.conf? CLI? We also have to determine /what/
>>   their configuration will be based on - time? percentage of mirror
>>   out-of-sync?
>
> Thank you Jonathan for the nice write up. Transient failures are
> generally recoverable after a period of time. The 'time' may vary from
> device to device though. lvm.conf based configuration is a good place
> to start. Do we really need LV or PV based configuration for this
> 'timeout'?
>
> The recovery itself doesn't depend on the %of out-of-sync regions, but
> that is a good place to start looking for re-allocating the regions if
> configured for re-allocation.
>
> Here are my thoughts:
> handle_mirror_transient_failure()
> {
>     do {
>         if (device-came-back-to-life()) {
>             start-resynchronization();
>             break;
>         }
>
>         if (reallocation-timeout exceeded or
>             re-allocation-too-much out-of-sync) {
>             re-allocate();
>             break;
>         }
>         if (some-other-timeout exceeded) {
>             log a message and break;
>         }
>         sleep(for-few-seconds);
>         timeout -= few-seconds;
>     } while (1)
> }

If we put the configuration in lvm.conf, then it would apply globally
to all volume groups and all logical volumes. I might be willing to
accept that for a while, but others may want a plan for something
better going forward. We don't want to pollute the conf file with new
fields that will become useless in the near future. If you look in
LVM2/doc/example.conf and search for _fault_policy, you can see that
there are already some configuration options there. We might stick
the new ones there as well. (Although this somewhat confuses me,
because they apply only to our default DSO, and you can change the DSO
you want to use in a completely different section of the config
file... So if a custom DSO is being used, you have settings that are
worthless.)

What I meant with regard to "/what/ their configuration will be based
on" is that the user may not care about how long they wait for a
device to come back, but rather how far the mirror has gone out of
sync while the device has been gone... If one of the legs fails and
the mirror is 75% out of sync before the device comes back, the user
may just want the device removed and the waiting stopped. If the user
specifies a "5 minutes" wait time, but there have been no writes to
the mirror in that time, then we could probably wait longer. You see
what I mean? A user may wish to use a combination of the two
methods... "Wait 20 minutes for the device to come back, but only if
the mirror stays > 95% in-sync".
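
To make the combined policy concrete, here is a rough sketch in Python
(only for illustration, nothing like this exists in LVM2 today). The
names TransientPolicy, max_wait_seconds, min_in_sync_percent and
decide() are all made up for this example and are not existing
lvm.conf fields or dmeventd calls. It only answers "keep waiting or
act?"; what the action is stays with the configured fault policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    KEEP_WAITING = "keep waiting"
    RESYNC = "resync"
    RUN_FAULT_POLICY = "run fault policy"


@dataclass
class TransientPolicy:
    # Both limits are hypothetical settings, not existing lvm.conf fields.
    max_wait_seconds: Optional[float] = None     # None: no time limit
    min_in_sync_percent: Optional[float] = None  # None: no sync limit


def decide(policy, device_is_back, seconds_waited, in_sync_percent):
    """Decide what to do about a failed mirror leg at one point in time."""
    if device_is_back:
        # The device returned: resynchronize instead of giving up on it.
        return Decision.RESYNC
    if (policy.max_wait_seconds is not None
            and seconds_waited >= policy.max_wait_seconds):
        # Waited as long as the user allowed.
        return Decision.RUN_FAULT_POLICY
    if (policy.min_in_sync_percent is not None
            and in_sync_percent <= policy.min_in_sync_percent):
        # The mirror has drifted further out of sync than the user tolerates.
        return Decision.RUN_FAULT_POLICY
    return Decision.KEEP_WAITING


# "Wait 20 minutes, but only if the mirror stays > 95% in-sync".
policy = TransientPolicy(max_wait_seconds=20 * 60, min_in_sync_percent=95.0)
print(decide(policy, False, 60, 99.0).value)
print(decide(policy, False, 60, 90.0).value)
```

Leaving either limit unset gives the time-only or the sync-only
behaviour, so one pair of settings would cover all three cases.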

As for the pseudo-code... I wouldn't use a 'while(1)' there... leave
the thread free to continue. We could use dmeventd's timer events to
trigger the next check for the device coming back (I hope). Your code
seems to suggest that you understand my point in the preceding
paragraph, but I am a bit confused by the use of '[re-]allocation'.
In this piece of code, we are only concerned with whether or not to
take action. The action is user defined (see the example.conf
mentioned above), so the space may or may not be reallocated.
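
Roughly what I have in mind instead of the loop, again as a Python
sketch and not real dmeventd code: each timer tick does one check and
returns, and the state lives in an object between ticks. The class
TransientFaultWatch and its callbacks are hypothetical names; how the
timer is actually armed is left to the caller, and the elapsed time is
simply counted in check intervals.

```python
from typing import Callable, Optional


class TransientFaultWatch:
    """State kept between timer ticks for one failed mirror leg (hypothetical)."""

    def __init__(self,
                 device_is_back: Callable[[], bool],
                 give_up: Callable[[float], bool],
                 on_resync: Callable[[], None],
                 on_fault_policy: Callable[[], None],
                 interval: float = 5.0):
        self.device_is_back = device_is_back
        self.give_up = give_up            # called with seconds waited so far
        self.on_resync = on_resync
        self.on_fault_policy = on_fault_policy  # the user-defined action
        self.interval = interval
        self.waited = 0.0
        self.done = False

    def on_timer(self) -> Optional[float]:
        """Run one check without blocking.

        Returns the number of seconds until the next check should be
        scheduled, or None once the watch has finished.
        """
        if self.done:
            return None
        if self.device_is_back():
            self.on_resync()
            self.done = True
            return None
        if self.give_up(self.waited):
            self.on_fault_policy()
            self.done = True
            return None
        # Nothing to do yet: ask the caller to re-arm the timer.
        self.waited += self.interval
        return self.interval
```

The caller re-arms its timer with whatever on_timer() returns and
stops when it gets None, so no thread ever sits in sleep().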

 brassow