[dm-devel] DM-RAID1 data corruption

Wed Jun 24 03:03:26 UTC 2009

On Tuesday June 23, tyasui at redhat.com wrote:
> >> MD-RAID1 solves this problem by having counters in superblocks on both 
> >> legs. If some leg dies, the counter on the other devices is increased. If 
> >> the dead disk comes online again, it is found that it has an old counter and 
> >> cannot be trusted.
> >>
> >> Would it be possible to extend a logical volume when converting it to a 
> >> raid1 and use the last area of the volume as a superblock?
> >>
> >> Mikulas
> > 
> > This is an old thread, I am just trying to revitalize it! :-) How about
> > dm-raid1 taking superblock storage as arguments in the command line,
> > just like the log device? The superblock storage is entirely managed by
> > the kernel, LVM just allocates it. Error handling can be instant this
> > way. LVM can auto convert the existing mirrors to this kind of mirrors if
> > space is available.
> 
> Interesting idea. Having the superblock storage managed by the kernel
> is really important for handling an error quickly inside the kernel.

I don't think that it is important to handle errors quickly - they
really shouldn't happen often enough that speed is an issue.  All you
need to do is handle errors correctly.

I would suggest that you simply get raid1 to block any write requests
until all drive failures have been acknowledged by userspace.
So you would need to differentiate between an acknowledged drive
failure and an unacknowledged failure.  Writes block whenever there
are unacknowledged failures.
Then you need a message that can be sent to the raid1 to acknowledge
the failure of a particular device.
'suspend' would need to fail if there are any unacknowledged failures,
as otherwise it would block.
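
To make the state handling concrete, here is a rough userspace model of
the scheme, not kernel code and not the dm-raid1 interface.  All names
in it (MirrorModel, LegState, WouldBlock, ack_failure) are made up for
illustration, and a blocked write is modelled by raising an exception
where a real target would presumably hold the request on a queue.

```python
from enum import Enum, auto


class LegState(Enum):
    ALIVE = auto()
    FAILED_UNACKED = auto()  # failure seen by the mirror, not yet by userspace
    FAILED_ACKED = auto()    # userspace has acknowledged the failure


class WouldBlock(Exception):
    """Stands in for 'the write is held until failures are acknowledged'."""


class MirrorModel:
    """Hypothetical model of a mirror with acknowledged/unacknowledged failures."""

    def __init__(self, legs):
        self.state = {leg: LegState.ALIVE for leg in legs}

    def has_unacked(self):
        return any(s is LegState.FAILED_UNACKED for s in self.state.values())

    def fail(self, leg):
        # A device error moves a live leg to the unacknowledged state.
        if self.state[leg] is LegState.ALIVE:
            self.state[leg] = LegState.FAILED_UNACKED

    def ack_failure(self, leg):
        # The message from userspace: "I have recorded that this leg is dead."
        if self.state[leg] is LegState.FAILED_UNACKED:
            self.state[leg] = LegState.FAILED_ACKED

    def write(self, data):
        # Writes block whenever there are unacknowledged failures.
        if self.has_unacked():
            raise WouldBlock("unacknowledged device failure")
        # Otherwise the write goes to every leg that is still alive.
        return [leg for leg, s in self.state.items() if s is LegState.ALIVE]

    def suspend(self):
        # Suspend must fail rather than wait behind blocked writes.
        return not self.has_unacked()
```

With two legs "a" and "b": after fail("b"), write() raises WouldBlock
and suspend() returns False; after ack_failure("b"), write() goes to
"a" only and suspend() succeeds.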

NeilBrown