[dm-devel] [PATCH 7/7] Hold all write bios when errors are handled

Wed Nov 25 23:20:10 UTC 2009

Takahiro Yasui [tyasui at redhat.com] wrote:
> I thought again about the scenario that Mikulas pointed out. It looks like a
> double failure (failures on two legs), so human intervention would be
> acceptable. However, how do we know if the second leg contains valid data?
> 
> There might be two cases.
> 
>   1) System crashed during write operations without any disk failures, and
>      the first leg fails at the next boot.
> 
>      We can use the secondary leg because data in the secondary leg is valid.
> 
>   2) System crashed after the secondary leg failed, and the first leg fails
>      and the secondary leg gets back online at the next boot.
> 
>      We can't use the secondary leg because data might be stale.
> 
> I haven't checked the contents of the log disk, but I guess we can't
> differentiate these cases from the log disk.

There were plans to add a new region state to make sure that all the
mirror legs have the same data after a "crash". Currently your best bet is a
complete resync after a crash! I am not sure if the state is written to
the log disk though. It may be possible to distinguish the above two cases
with this...

Or just have LVM metadata that records a device failure: suspend writes
[for any kind of leg], record the device failure in the LVM metadata, and
restart writes. This requires an LVM metadata change though!
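
To make the sequence concrete, here is a rough sketch in Python, purely as
an illustration. Every name in it (Mirror, MirrorMetadata, failed_legs, and
so on) is hypothetical; none of this is the real LVM or dm-raid1 interface,
and it assumes the metadata update reaches stable storage before writes are
restarted.

```python
from dataclasses import dataclass, field


@dataclass
class MirrorMetadata:
    """Hypothetical stand-in for persistent LVM metadata (not a real structure)."""
    failed_legs: set = field(default_factory=set)


@dataclass
class Mirror:
    """Toy model of a mirror; legs are just names, bios are opaque values."""
    legs: list
    metadata: MirrorMetadata = field(default_factory=MirrorMetadata)
    writes_suspended: bool = False
    held_writes: list = field(default_factory=list)
    completed_writes: list = field(default_factory=list)

    def write(self, bio):
        # While an error is being handled, hold the write instead of issuing it.
        if self.writes_suspended:
            self.held_writes.append(bio)
        else:
            self.completed_writes.append(bio)

    def handle_leg_failure(self, leg):
        # 1. Suspend writes so no write completes before the failure is recorded.
        self.writes_suspended = True
        # 2. Record the device failure in the (assumed persistent) metadata.
        self.metadata.failed_legs.add(leg)
        # 3. Restart writes, releasing the held ones in their original order.
        self.writes_suspended = False
        held, self.held_writes = self.held_writes, []
        for bio in held:
            self.write(bio)

    def usable_legs_at_boot(self, online_legs):
        # A leg recorded as failed may hold stale data even if it is back online.
        return [leg for leg in self.legs
                if leg in online_legs and leg not in self.metadata.failed_legs]
```

With this model, case 1 (no failure recorded, only the secondary leg online
at boot) reports the secondary leg as usable, while case 2 (secondary leg
recorded as failed, then back online alone) reports no usable leg, which is
where human intervention would come in.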

> Another possibility I thought was error messages.
> If any error messages for the secondary leg are recorded, we can judge that
> the secondary leg contains stale data, but I suspect that it is not a secure
> way because syslog might not be written to disk before the system crashes.

We should be able to fix it in the LVM metadata!