[lvm-devel] dmeventd doesn't handle failures during mirror resync.

Jonathan Brassow jbrassow at redhat.com
Wed May 5 13:22:32 UTC 2010


On May 5, 2010, at 3:08 AM, Petr Rockai wrote:

> Neil Brown <neilb at suse.de> writes:
>
>> I was surprised to discover that while a normal write error is
>> handled properly - dmeventd runs 'lvconvert' to fix the array up,
>> this does not happen in response to a write error while syncing
>> the array.
>>
>> If I arrange for the new device to die, then
>>          lvconvert --repair --use-policies
>>
>> will fix it up as I would expect, but dmeventd never asks it to do
>> this.
>>
>> This seems to be a deliberate decision:  in _process_status_code
>> in dmeventd_mirror.c, a status of 'F' will cause lvconvert to be
>> run while 'S' and 'R' (sync and read errors) will not.
>>
>> Is there a reason for this?
> I think the rationale is that:
>
> For read errors, we should *not* strip the mirror leg, since we want to
> keep as much redundancy as possible in this scenario. The failure should
> be logged, but I think that's it.
>
> For sync, I am not sure. It may be that the reason for this is that sync
> is usually related to manual action and dmeventd intervention may be
> unexpected and unwanted in this case. But that case could be argued.
>
>> Can we change dmeventd to respond to sync (and read) errors in the same
>> way that it responds to write errors?
> I think it's a bad idea for read errors, unless maybe we could have a
> new feature for that -- one that'd upconvert the mirror first (if
> there's a hotspare) and only if that finishes OK, kill the bad leg. Just
> log the error if there are no hotspares.
>
> For sync errors, I am ambivalent. Any further opinions?

I think for sync errors, we should restart the sync.  This can be done
by a suspend/resume of the mirror device.  Effectively, we are
assuming a transient failure.  If we have tried to clear the fault a
couple of times and it persists, then we could remove the failed device.
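
As a rough illustration of that retry-then-remove idea, here is a minimal
Python sketch (not dmeventd code).  It assumes the restart is driven with
"dmsetup suspend" / "dmsetup resume", that the fallback is the
"lvconvert --repair --use-policies" call mentioned earlier in the thread,
and that "a couple" means two.  SyncRetryPolicy, dm_device_name and
MAX_SYNC_RETRIES are hypothetical names.  It only builds command lines and
runs nothing.

```python
from dataclasses import dataclass, field

# Assumption: "a couple of times" is taken to mean two restart attempts.
MAX_SYNC_RETRIES = 2


def dm_device_name(vg: str, lv: str) -> str:
    """Build the device-mapper name for an LV (hyphens in names are doubled)."""
    return vg.replace("-", "--") + "-" + lv.replace("-", "--")


@dataclass
class SyncRetryPolicy:
    """Hypothetical per-mirror retry counter for sync-error events."""

    max_retries: int = MAX_SYNC_RETRIES
    attempts: dict = field(default_factory=dict)

    def next_commands(self, vg: str, lv: str) -> list:
        """Return the command lines to run for one sync-error event."""
        key = (vg, lv)
        tried = self.attempts.get(key, 0)
        if tried < self.max_retries:
            # Treat the failure as transient: suspend/resume restarts the sync.
            self.attempts[key] = tried + 1
            name = dm_device_name(vg, lv)
            return [["dmsetup", "suspend", name], ["dmsetup", "resume", name]]
        # Retries exhausted: fall back to removing the failed device.
        self.attempts.pop(key, None)
        return [["lvconvert", "--repair", "--use-policies", f"{vg}/{lv}"]]
```

A real handler would also need to reset the counter once a sync completes,
so that an old transient failure does not count against a later one; that
is left out here.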

Read errors I would definitely leave alone.  Drives can often relocate  
bad sectors, but that is done on writes.  If the relocation fails, we  
will know about it when the write fails.
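
To summarise the difference between what Neil describes and what is being
suggested here, a small Python sketch of the two policies as lookup tables.
The status codes 'F', 'S' and 'R' are the ones named in this thread; the
action labels and the function names are made up for illustration and do
not correspond to anything in dmeventd_mirror.c.

```python
# Behaviour as described in the thread: only 'F' triggers lvconvert.
CURRENT_POLICY = {"F": "repair", "S": "no_repair", "R": "no_repair"}

# Suggested behaviour: restart the sync on 'S', still leave 'R' alone.
PROPOSED_POLICY = {"F": "repair", "S": "restart_sync", "R": "no_repair"}


def action_for(code: str, policy: dict) -> str:
    """Look up the action for one status code; unknown codes do nothing."""
    return policy.get(code, "no_repair")


def changed_codes(old: dict, new: dict) -> list:
    """List the status codes whose action differs between two policies."""
    return sorted(code for code in new if old.get(code) != new[code])
```

With these tables, changed_codes(CURRENT_POLICY, PROPOSED_POLICY) reports
only 'S', which is the whole of the proposed change.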

  brassow



