[dm-devel] DM-RAID1 data corruption
Takahiro Yasui
tyasui at redhat.com
Tue Apr 14 21:07:16 UTC 2009
Hi Mikulas,
I can confirm that this data corruption can happen. To reproduce the
condition easily, I stopped dmeventd and injected an error into
leg 0; the corruption then occurred in my environment.
The problem is that leg 0 is always chosen as the default mirror
without checking any state. Storing the information about which
leg is the default mirror might solve this issue.
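
To illustrate the idea, here is a minimal sketch written as a toy Python
model, not kernel code. The Mirror class and the names logged_default and
persist_default are hypothetical and exist only for this example. It
assumes that recovery copies a dirty region from whichever leg is treated
as the default, and that the log survives the crash.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    """Toy model of a two-leg mirror with a persistent log (not kernel code)."""
    legs: list = field(default_factory=lambda: [b"old", b"old"])
    dirty: bool = False            # region marked nosync in the log
    logged_default: int = 0        # hypothetical: default leg stored in the log
    persist_default: bool = False  # True = apply the proposed fix

    def write(self, data, failed_leg=None):
        """Write to every leg; a failed leg keeps its stale contents."""
        for i in range(len(self.legs)):
            if i != failed_leg:
                self.legs[i] = data
        if failed_leg is not None:
            # Models marking the region nosync after a leg failure.
            self.dirty = True
            if self.persist_default:
                # Proposed fix: record a leg that holds the new data.
                self.logged_default = next(
                    i for i in range(len(self.legs)) if i != failed_leg)
        return True  # the write is reported as successful either way

    def recover_after_crash(self):
        """Resync a dirty region from the default leg after reboot."""
        source = self.logged_default if self.persist_default else 0
        if self.dirty:
            self.legs = [self.legs[source]] * len(self.legs)
            self.dirty = False
        return self.legs[0]


def crash_scenario(persist_default):
    m = Mirror(persist_default=persist_default)
    m.write(b"new", failed_leg=0)   # acknowledged to the caller
    return m.recover_after_crash()  # crash before the userspace daemon runs


print(crash_scenario(False))  # leg 0 is blindly the default: b'old'
print(crash_scenario(True))   # default leg taken from the log: b'new'
```

In the first case the acknowledged write is lost because the stale leg 0
is copied over leg 1. In the second case the resync source is the leg
recorded in the log, so the new data survives.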
Thanks,
Taka
> Hi
>
> This is the scenario of data corruption that I was talking about:
>
> Mirror has two legs, 0 and 1 and a log. Disk 0 is the default.
>
> A write is propagated to both legs. The write fails on leg 0 and succeeds
> on leg 1.
>
> The function "write_callback" puts the bio on the "failure" list (if
> errors_handled was true). It also wakes userspace.
>
> do_failures pops the bios from ms->log_failure and calls dm_rh_mark_nosync
> on them to mark the region nosync. dm_rh_mark_nosync completes the bio
> with success.
>
> *the computer crashes* (before the userspace daemon had a chance to run)
>
> On next reboot, disk 0 is revived (suppose that it temporarily failed
> because of a loose cable, overheating, insufficient power or so, and the
> condition is repaired), raid1 sees set bit in the dirty bitmap and starts
> copying data from disk 0 to disk 1.
>
> The result: the write bio was completed with success, but the data was lost. For
> databases, this might have bad consequences - committed transactions being
> forgotten.
>
> -
>
> If the above scenario can't happen, pls. describe why.
>
> What would be a possible way to fix this?
>
> Delay all bios until the userspace code removes the failed mirror?
> Or store the number of the default mirror in the log?
>
> Mikulas
>
> --
> dm-devel mailing list
> dm-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel