[Linux-cluster] cmirror status

Thu Dec 22 04:45:08 UTC 2005

On Wed, 2005-12-21 at 23:25 +0100, Sylvain Coutant wrote:
> > cmirror won't be production ready until it gets some more testing.
> 
> I'm ready to test. Apart from testing needs, is there some known bugs? The CVS activity is quite small during last months.

OK, there aren't any bugs that I know of WRT the kernel pieces of the code.

> Is this part of the project dead or been left apart ?

It's just been waiting for the rest of the project to catch up.

> Will it have active support and dev in the future ?

Yes.

> > > Also I didn't found useful docs on how to make use of it ... The README
> > > could have a more lines to better explain how to interact with device
> > > mapper.
> > 
> > Yeah, sorry 'bout that.  New LVM packages are needed that support
> > mirroring.
> 
> Are they available somewhere ?

Not yet, soon.

> > LVM will "do the right thing" when it encounters volume
> > groups that are clustered vs non-clustered, and you should be able to
> > switch between the two w/o having to re-sync your volume.
> 
> I don't think I'll try that switch too much ;-) In fact, my target setup is a little bit complex and I won't play too much with those things.
> 
> 
> > If you'd like to use [cluster] mirroring 
> 
> I could give it a try using some test servers.
> 
> 
> > before LVM is ready,
> 
> You mean once LVM will be ready, the cluster mirroring will be integrated and somewhat "transparent" ? We will just setup a lv with several copies, that's right ?

Yes.  You will not have to specify whether the mirror is to be clustered
or not; LVM will just know from the "clustered" flag associated with the
volume group.

> > I'll give
> > you the table arguments.
> 
> Just explain them a little bit more (ie why the "2" after the cluster keyword ?) please. And this log file thing (local/shared, size, ...)
> 
> 

The "2" in that line refers to the number of arguments the log type
(cluster in this case) takes.  In that example, it took 2; but that is
out of date.  Now it looks more like:

<start> <length> mirror \
cluster <# log args> <disk | core> <uuid> [device] <region size> \
[[no]sync] [block_on_error] \
<N> <dev1> <offset> <dev2> <offset> ... <devN> <offset>

So an example might be (all one line):
0 4724028 mirror cluster 5 disk my_uuid /dev/sda 1024 block_on_error
2 /dev/sdb 0 /dev/sdc 0

Note that the uuid needs to be unique for each mirror device you are
creating.
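
For anyone scripting this, here is a minimal Python sketch that assembles
such a table line and counts the log arguments automatically, so the number
after "cluster" cannot drift out of step with what follows it.  The function
name and its parameters are made up for illustration; only the field order
comes from the syntax above.  It does not check whether a log device is
required for "disk" logs.

```python
def cluster_mirror_table(start, length, log_type, uuid, region_size,
                         legs, log_device=None, sync=None,
                         block_on_error=False):
    """Build a one-line device-mapper table for a cluster-logged mirror.

    legs is a list of (device, offset) pairs.
    sync is None, "sync" or "nosync".
    (Hypothetical helper; names are not part of any real tool.)
    """
    if log_type not in ("disk", "core"):
        raise ValueError("log_type must be 'disk' or 'core'")
    if sync not in (None, "sync", "nosync"):
        raise ValueError("sync must be None, 'sync' or 'nosync'")

    # Log arguments, in the order given in the syntax above.
    log_args = [log_type, uuid]
    if log_device is not None:
        log_args.append(log_device)
    log_args.append(str(region_size))
    if sync is not None:
        log_args.append(sync)
    if block_on_error:
        log_args.append("block_on_error")

    # <start> <length> mirror cluster <# log args> <log args...>
    fields = [str(start), str(length), "mirror", "cluster",
              str(len(log_args))]
    fields += log_args

    # <N> followed by one <dev> <offset> pair per mirror leg.
    fields.append(str(len(legs)))
    for dev, offset in legs:
        fields += [dev, str(offset)]
    return " ".join(fields)


# Rebuild the example line from above.
table = cluster_mirror_table(0, 4724028, "disk", "my_uuid", 1024,
                             [("/dev/sdb", 0), ("/dev/sdc", 0)],
                             log_device="/dev/sda", block_on_error=True)
print(table)
```

The resulting string is what you would hand to dmsetup as the table for the
new device.  Remember to pass a different uuid for each mirror.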

> >   Note that LVM has a big part in how recovery
> > happens.
> 
> Btw, how does it happens ? What about error detection ?

There are two types of errors in cluster mirroring: machine failure and
disk failure.

When a machine fails, whatever blocks it was working on are re-synced;
and if that machine was the log server, a new one is elected.

If a disk fails, I/O completions are frozen to prevent other machines in
the cluster from reading inconsistent data, and a device-mapper event is
raised (this is where LVM normally comes in).  The device must then be
suspended and a new table loaded that excludes the failed device.

So, if you are using dmsetup, there are a number of steps involved to
get your volume running again after a disk failure.  When LVM is ready,
it will handle all those steps for you.
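
As a rough illustration of those manual steps, the Python sketch below drops
a failed leg from a three-way mirror and prints the dmsetup commands to
suspend the device, load the replacement table and resume it.  The device
name "my_mirror", the third leg /dev/sdd and the helper names are
assumptions made up for the example, and the final resume step is my
reading of "get your volume running again" rather than something spelled
out above.  The exact dmsetup options needed on a real cluster may differ,
and what the table should look like when only one leg survives is not
covered here.

```python
import shlex


def surviving_legs(legs, failed_dev):
    """Return the (device, offset) pairs that do not use the failed device."""
    return [(dev, off) for dev, off in legs if dev != failed_dev]


def recovery_commands(dm_name, new_table):
    """dmsetup invocations for a suspend / load / resume sequence."""
    return [
        ["dmsetup", "suspend", dm_name],
        ["dmsetup", "load", dm_name, "--table", new_table],
        ["dmsetup", "resume", dm_name],
    ]


# Hypothetical three-leg mirror in which /dev/sdc has failed.
legs = [("/dev/sdb", 0), ("/dev/sdc", 0), ("/dev/sdd", 0)]
remaining = surviving_legs(legs, "/dev/sdc")

# Same log arguments as the earlier example; only the leg list changes.
leg_fields = " ".join(f"{dev} {off}" for dev, off in remaining)
new_table = ("0 4724028 mirror cluster 5 disk my_uuid /dev/sda 1024 "
             f"block_on_error {len(remaining)} {leg_fields}")

# Print the commands instead of running them, so they can be reviewed first.
for cmd in recovery_commands("my_mirror", new_table):
    print(shlex.join(cmd))
```

The sketch only prints the commands; in a cluster the new table would
presumably have to be loaded consistently on every node using the device.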

If you want to play around and keep it somewhat simple, you could test
cluster mirroring with a cluster file system (like GFS) and see what
happens when you kill machines.

 brassow