[lvm-devel] [PATCH 2 of 4] Handle transient secondary mirror leg failures

Fri Dec 18 18:25:38 UTC 2009

On 12/18/09 12:10, Jonathan Brassow wrote:
> 2) If you don't get a new table loaded, it will behave as a suspend/ 
> resume only.  Recent code changes in dm-raid1.c are causing  
> 'log_failure' and 'leg_failure' to not be reset in those cases.  IOW,  
> all these steps could be for nothing.  :(

I would like to know how effective the retry is. As Jon explained
above, recent upstream kernels block all write I/Os to NOSYNC regions.
This means that those write I/Os stay blocked for a long time.
For example, the mirror retry interval in your patch #4 is 30 seconds,
so the application or filesystem will have to wait 30 seconds (330
seconds if the retry count is 10). Can your application wait for more
than 5 minutes?
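
The 330-second figure is consistent with one initial 30-second wait
followed by 10 retries at the same interval. A minimal sketch of that
arithmetic, assuming this reading of the retry count (the function name
is hypothetical and is not part of the patch):

```python
def worst_case_block_seconds(retry_interval_s: int, retry_count: int) -> int:
    """Upper bound on how long a write may stay blocked.

    Assumption: one initial wait plus one further wait per retry,
    all of the same length.
    """
    return retry_interval_s * (1 + retry_count)


print(worst_case_block_seconds(30, 0))   # no retries: 30
print(worst_case_block_seconds(30, 10))  # 10 retries: 330
```

With a retry count of 10 the bound is 5.5 minutes, which is where the
"more than 5 minutes" question comes from.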

This behaviour will not be solved even if the kernel is fixed so that
log_failure and leg_failure are reset. The blocked write I/Os will
be re-queued in the kernel when the suspend/resume is done, but they
will be put back on the hold queue if the device failure is
permanent rather than transient.
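
The difference between the transient and the permanent case can be
illustrated with a toy model. This is not the dm-raid1 code: the names
`suspend_resume`, `hold_queue` and `device_ok` are hypothetical, and the
model only assumes what is described above, namely that held writes are
re-queued on suspend/resume and held again if the leg is still failed.

```python
from collections import deque


def suspend_resume(hold_queue: deque, device_ok: bool) -> list:
    """Toy model of one suspend/resume cycle.

    Every held write is re-queued. If the device has recovered the
    write completes; otherwise it goes back onto the hold queue.
    Returns the list of writes that completed in this cycle.
    """
    requeued = list(hold_queue)
    hold_queue.clear()
    completed = []
    for io in requeued:
        if device_ok:
            completed.append(io)
        else:
            hold_queue.append(io)  # still failed: held again
    return completed


# Permanent failure: every cycle leaves the writes held.
held = deque(["w1", "w2"])
for _ in range(3):
    assert suspend_resume(held, device_ok=False) == []
print(list(held))

# Transient failure: the first cycle after recovery drains the queue.
print(suspend_resume(held, device_ok=True), list(held))
```

In this model the retries only help if the device comes back before the
retry count is exhausted; otherwise the writes wait for the full retry
period and are still not completed.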

I would like to know the use case of this patch set.

Thanks,
Taka