[Linux-cluster] Re: Linux-cluster Digest, Preventing LVM from concurrent access

Sun Aug 30 14:21:14 UTC 2009

Hi Edson,

On Thu, 27-08-2009 at 17:47 -0300, Edson Marquezani Filho wrote:
> >
> > Yes, you can bypass the exclusive flag by hand if you don't take it into
> > account when activating the exclusive LV on a second node.
> >
> > While the exclusive LV is mounted on another node, the command "lvchange
> > -aey XXX/YYY" should give you an error message. "lvchange -ay XXX/YYY"
> > will bypass it.
> >
> 
> Ok, but I was able to access that LV without changing any flag, while
> both nodes were joined to the cluster.
> 

This should not happen. If the exclusive flag is used correctly, the
Logical Volume should be active on only one node at a time.

> But now, I don't know why, it's working as it should be, and I can't
> use the LV to launch the VM while it is running on another node, under
> rgmanager control. OK. That's what I wanted.

Ok

> 
> I could move the service between servers, but sometimes it failed.
> I did this many times in sequence, from master to slave and
> vice versa, waiting just a few seconds between attempts, and it
> seems that sometimes the LV is not released. Anyway, such a test would
> not match a real situation, so I think this can be ignored.
> 

Maybe this is caused by your VM script. If the lvm-cluster resource
tries to release a device that is still in use, it will fail. Did
you check your logs for any error messages? I made the resource very
verbose.

> (I have been changing my cluster.conf a lot, so it's difficult to
> say what was wrong, if anything was. Forget about it. =/ )
> 
> >> I ran a test, disconnecting the heartbeat link and causing one server
> >> to be fenced, and the VM launched on the "winner" as expected. But when
> >> the "loser" server came back, still without the heartbeat link, it
> >> launched the same VM again, and the service appeared as running locally
> >> on both nodes. I guess lvm-cluster should prevent this, shouldn't it?
> >>
> >
> > This should not happen. Have you set "exclusive=yes" in the resource
> > definitions in cluster.conf? Can we have a copy of your current
> > cluster.conf?
> 
> Yes, and I can say for sure that this "problem" (if I can call it
> that) keeps happening to me. Remember that this happens when I
> bring back the server that was fenced, still without any connection
> to the other one, to which the VM was "moved".
> I can see that cman, clvmd, rgmanager, etc. are running normally, and
> each server thinks that it is alone.

> Let me try to make it easier to understand by listing the steps I did:
> 
> 1) Servers A and B are up, the VM service is started on the cluster
> and runs on server A;
> 2) The cable of the heartbeat interface is disconnected;
> 3) Server A is fenced by B, and server B starts the VM service;
> 4) The cable is kept disconnected;
> 5) Server A is turned on again, and after it has completed its
> initialization, I can see that the VM service is running on it as well
> as on server B, both accessing the same LV.
> 
> That's it, I hope you understand what I meant.

I'm pretty sure this is caused by an ineffective fencing configuration
and an incorrect LVM configuration. What I think is happening to you,
without any info from your logs, is:

1.- Primary node gets fenced out. VM is moved to the alive secondary node.
2.- Fenced node boots again.
3.- While comms are interrupted, cluster services are started again on
the restarted node. These services should be, at least, CMAN, CLVMD and
RGMANAGER.
4.- Restarted node thinks it is alone because comms are interrupted. It
applies fencing on the secondary node.
5a.- If fencing is OK, the secondary node gets shut down, and the primary
node gets the VM again.
5b.- If fencing is not working properly and the primary node believes the
secondary node has been fenced, it will try to get control of the LVM
and the VM. The exclusive flag protection should forbid this.
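
To make steps 4 to 5b concrete, here is a toy Python model. It is only a
sketch and not how rgmanager or any real fence agent is implemented; the
names Node, fence, recover_service and simulate are invented for the
illustration. It leaves out the exclusive flag check entirely and only
shows why a fence agent that reports success without actually acting is
dangerous.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    powered_on: bool = True
    runs_vm: bool = False


def fence(target: Node, agent_works: bool) -> bool:
    """Toy fence agent (hypothetical).

    It always reports success, but only a working agent really powers
    the target off. This mimics the suspected misconfiguration.
    """
    if agent_works:
        target.powered_on = False
        target.runs_vm = False
    return True  # the caller is told "success" in both cases


def recover_service(survivor: Node, peer: Node, agent_works: bool) -> None:
    """Assumed recovery rule: start the VM once fencing reports success."""
    if fence(peer, agent_works):
        survivor.runs_vm = True


def simulate(agent_works: bool) -> List[str]:
    """Return the names of the nodes running the VM after node A reboots."""
    a, b = Node("A"), Node("B")
    b.runs_vm = True  # after the first fencing, the VM lives on B
    # A boots with the heartbeat link still cut, sees no peer, fences B.
    recover_service(a, b, agent_works)
    return [n.name for n in (a, b) if n.runs_vm]


if __name__ == "__main__":
    print("working fence agent:", simulate(True))   # case 5a
    print("lying fence agent:  ", simulate(False))  # case 5b
```

With a working agent only A ends up running the VM (case 5a). With an
agent that just reports success, both A and B run it (case 5b), which
matches what you are seeing.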

Check your fencing devices. Maybe they always return a "success" exit
code even when the fencing did not actually happen.

On the other hand, regarding the LVM volume, please check this post on
how to set up your service:

https://www.redhat.com/archives/linux-cluster/2009-July/msg00259.html

> 
> The cluster.conf with which I ran those tests is here [1].
> 
> [1] http://pastebin.com/m6a23734a
> 

Cheers,

Rafael
-- 
Rafael Micó Miranda