From Mark.Vallevand at UNISYS.com Thu Oct 1 16:25:48 2015
From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K)
Date: Thu, 1 Oct 2015 16:25:48 +0000
Subject: [Linux-cluster] Resource placement after node comes online.
Message-ID: <06e4dceeda324a9f9fde6d7354c3799b@US-EXCH13-5.na.uis.unisys.com>

In a multiple node cluster with resources distributed across the nodes, is
there a way to automatically 'rebalance' resources when a node comes online?
If this has been asked and answered, I apologize. And a pointer to the
relevant information would be welcome.

Regards.
Mark K Vallevand   Mark.Vallevand at Unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.

From misch at schwartzkopff.org Thu Oct 1 16:39:38 2015
From: misch at schwartzkopff.org (Michael Schwartzkopff)
Date: Thu, 01 Oct 2015 18:39:38 +0200
Subject: [Linux-cluster] Resource placement after node comes online.
In-Reply-To: <06e4dceeda324a9f9fde6d7354c3799b@US-EXCH13-5.na.uis.unisys.com>
References: <06e4dceeda324a9f9fde6d7354c3799b@US-EXCH13-5.na.uis.unisys.com>
Message-ID: <4171197.EtMQoEnJ5x@nb003>

On Thursday, 1 October 2015 at 16:25:48, Vallevand, Mark K wrote:
> In a multiple node cluster with resources distributed across the nodes, is
> there a way to automatically 'rebalance' resources when a node comes
> online?

Pacemaker decides where to stop / start resources based on a score system
every time the cluster receives an event. A node joining the cluster is
definitely an event that triggers a recalculation of the scores that a
resource would collect on every node.

Based on constraints you can configure your cluster to re-distribute
resources in a very granular way.

--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0162) 1650044
Fax: (089) 620 304 13

From franchu.garcia at gmail.com Thu Oct 1 16:45:08 2015
From: franchu.garcia at gmail.com (Fran Garcia)
Date: Thu, 1 Oct 2015 18:45:08 +0200
Subject: [Linux-cluster] Resource placement after node comes online.
In-Reply-To: <06e4dceeda324a9f9fde6d7354c3799b@US-EXCH13-5.na.uis.unisys.com>
References: <06e4dceeda324a9f9fde6d7354c3799b@US-EXCH13-5.na.uis.unisys.com>
Message-ID:

On 1 October 2015 at 18:25, Vallevand, Mark K <> wrote:
> In a multiple node cluster with resources distributed across the nodes, is
> there a way to automatically 'rebalance' resources when a node comes online?

If you're using Red Hat Cluster:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/s1-config-failover-domain-conga-CA.html

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/ch-resourceconstraints-HAAR.html

HTH
Fran
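The two replies above can be turned into a concrete configuration. A minimal
sketch in crm shell syntax (node and resource names are hypothetical; this
assumes pacemaker 1.1.x with its utilization-based placement feature):

  # Ask the policy engine to balance resources by declared utilization
  # instead of plain allocation scores.
  crm configure property placement-strategy=balanced

  # Declare the capacity each node provides...
  crm configure node node1 utilization capacity=10
  crm configure node node2 utilization capacity=10
  crm configure node node3 utilization capacity=10

  # ...and the share of that capacity each resource consumes.
  crm configure primitive dummy1 ocf:pacemaker:Dummy utilization capacity=1

With something like this in place, a node joining the cluster triggers the
score recalculation described above, and placement then accounts for each
node's remaining capacity.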
From Mark.Vallevand at UNISYS.com Thu Oct 1 17:17:17 2015
From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K)
Date: Thu, 1 Oct 2015 17:17:17 +0000
Subject: [Linux-cluster] Resource placement after node comes online.
In-Reply-To: <4171197.EtMQoEnJ5x@nb003>
References: <06e4dceeda324a9f9fde6d7354c3799b@US-EXCH13-5.na.uis.unisys.com>
 <4171197.EtMQoEnJ5x@nb003>
Message-ID: <5fad8468d6d44ab4bbd3eb815d461f67@US-EXCH13-5.na.uis.unisys.com>

Thank you.

As a mind-experiment, consider this very simple example. Is my understanding
correct?

3 node cluster. All nodes equal with capacity 10. Total cluster capacity 30.
15 resources. All equal with utilization 1.
All nodes are off.
Node1 turns on. Joins cluster. 10 resources start.
Some time later, node2 turns on. Joins cluster. 5 resources start.
Resources are distributed how? 10 on node1 and 5 on node2? Will some
resources migrate to node2?
And even later, node3 turns on. Joins cluster. No more resources start.
Resources are distributed how? 10 on node1 and 5 on node2 and 0 on node3?
Or will some resources migrate to node3?

Our clustering has been very successful to this point. We are considering
future options.

Regards.
Mark K Vallevand   Mark.Vallevand at Unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Michael Schwartzkopff
Sent: Thursday, October 01, 2015 11:40 AM
To: linux clustering
Subject: Re: [Linux-cluster] Resource placement after node comes online.

On Thursday, 1 October 2015 at 16:25:48, Vallevand, Mark K wrote:
> In a multiple node cluster with resources distributed across the nodes, is
> there a way to automatically 'rebalance' resources when a node comes
> online?

Pacemaker decides where to stop / start resources based on a score system
every time the cluster receives an event. A node joining the cluster is
definitely an event that triggers a recalculation of the scores that a
resource would collect on every node.

Based on constraints you can configure your cluster to re-distribute
resources in a very granular way.

--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0162) 1650044
Fax: (089) 620 304 13

From Mark.Vallevand at UNISYS.com Thu Oct 1 18:34:28 2015
From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K)
Date: Thu, 1 Oct 2015 18:34:28 +0000
Subject: [Linux-cluster] Resource placement after node comes online.
In-Reply-To:
References: <06e4dceeda324a9f9fde6d7354c3799b@US-EXCH13-5.na.uis.unisys.com>
Message-ID: <16d24567f0df452c88270cec94f4d430@US-EXCH13-5.na.uis.unisys.com>

We are using pacemaker + cman + corosync.

Regards.
Mark K Vallevand   Mark.Vallevand at Unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Fran Garcia
Sent: Thursday, October 01, 2015 11:45 AM
To: linux clustering
Subject: Re: [Linux-cluster] Resource placement after node comes online.
On 1 October 2015 at 18:25, Vallevand, Mark K <> wrote:
> In a multiple node cluster with resources distributed across the nodes, is
> there a way to automatically 'rebalance' resources when a node comes online?

If you're using Red Hat Cluster:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/s1-config-failover-domain-conga-CA.html

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/ch-resourceconstraints-HAAR.html

HTH
Fran

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

From Mark.Vallevand at UNISYS.com Fri Oct 2 15:25:15 2015
From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K)
Date: Fri, 2 Oct 2015 15:25:15 +0000
Subject: [Linux-cluster] Resource placement after node comes online.
References: <06e4dceeda324a9f9fde6d7354c3799b@US-EXCH13-5.na.uis.unisys.com>
 <4171197.EtMQoEnJ5x@nb003>
Message-ID: <0f79ad36d4814b9b841d636807bcd94f@US-EXCH13-5.na.uis.unisys.com>

So, changing this from a mind-experiment to a real experiment, I set up a
simple cluster.

pacemaker 1.1.10
cman 3.1.7
corosync 1.4.6

There are 2 nodes. No utilization attributes are defined for the nodes.
There are 3 resources. There are no capacities defined for the resources.
These are simple 'Dummy' resources.

When the resources are created, I see them allocated to node1, node2, and
then node1. However, when I restart a node, the resources will always be
allocated to a single node - the node that was not restarted. The node that
restarts does not get any resources allocated to it. If I restart both
nodes at about the same time, I see resources allocated to both nodes. If I
restart a resource, I see it allocated to the node with the fewest
resources. Good. This is the behavior I expected to see.

I would like to know of a strategy to get resources 'rebalanced' when a
node restarts, so that all the nodes have resources. Is this possible?

I know that it might not be good to have resources automatically move. The
interruption of a resource probably isn't desirable. But, if I wanted to do
it, could I?
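One hedged sketch of how this is often attempted (crm shell; assumes
pacemaker 1.1.x, and the attribute names are illustrative):

  # Stickiness 0 leaves the policy engine free to move running
  # resources when scores change, e.g. when a node rejoins; any
  # positive value biases resources toward staying where they run.
  crm configure rsc_defaults resource-stickiness=0

  # Balance by declared utilization; with the default strategy,
  # equal scores tend to leave resources on their current node.
  crm configure property placement-strategy=balanced
  crm configure node node1 utilization capacity=10
  crm configure node node2 utilization capacity=10

Whether resources actually migrate on rejoin still depends on per-resource
stickiness and constraints, so this would need verifying on a test cluster.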
Regards.
Mark K Vallevand   Mark.Vallevand at Unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.

-----Original Message-----
From: Vallevand, Mark K
Sent: Thursday, October 01, 2015 12:17 PM
To: 'misch at schwartzkopff.org'; linux clustering
Subject: RE: [Linux-cluster] Resource placement after node comes online.

Thank you.

As a mind-experiment, consider this very simple example. Is my understanding
correct?

3 node cluster. All nodes equal with capacity 10. Total cluster capacity 30.
15 resources. All equal with utilization 1.
All nodes are off.
Node1 turns on. Joins cluster. 10 resources start.
Some time later, node2 turns on. Joins cluster. 5 resources start.
Resources are distributed how? 10 on node1 and 5 on node2? Will some
resources migrate to node2?
And even later, node3 turns on. Joins cluster. No more resources start.
Resources are distributed how? 10 on node1 and 5 on node2 and 0 on node3?
Or will some resources migrate to node3?

Our clustering has been very successful to this point. We are considering
future options.

Regards.
Mark K Vallevand   Mark.Vallevand at Unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Michael Schwartzkopff
Sent: Thursday, October 01, 2015 11:40 AM
To: linux clustering
Subject: Re: [Linux-cluster] Resource placement after node comes online.

On Thursday, 1 October 2015 at 16:25:48, Vallevand, Mark K wrote:
> In a multiple node cluster with resources distributed across the nodes, is
> there a way to automatically 'rebalance' resources when a node comes
> online?

Pacemaker decides where to stop / start resources based on a score system
every time the cluster receives an event. A node joining the cluster is
definitely an event that triggers a recalculation of the scores that a
resource would collect on every node.

Based on constraints you can configure your cluster to re-distribute
resources in a very granular way.

--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0162) 1650044
Fax: (089) 620 304 13

From Mark.Vallevand at UNISYS.com Thu Oct 15 17:18:39 2015
From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K)
Date: Thu, 15 Oct 2015 17:18:39 +0000
Subject: [Linux-cluster] Alternative to resource monitor polling?
Message-ID: <4b27f2dbfad444a8bb5778777d90af8f@US-EXCH13-5.na.uis.unisys.com>

Is there an alternative to resource monitor polling to detect a resource
failure?
If, for example, a resource failure is detected by our own software, could
it signal clustering that a resource has failed?

Regards.
Mark K Vallevand   Mark.Vallevand at Unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.

From Mark.Vallevand at UNISYS.com Thu Oct 15 19:42:25 2015
From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K)
Date: Thu, 15 Oct 2015 19:42:25 +0000
Subject: [Linux-cluster] Alternative to resource monitor polling?
In-Reply-To: <4b27f2dbfad444a8bb5778777d90af8f@US-EXCH13-5.na.uis.unisys.com>
References: <4b27f2dbfad444a8bb5778777d90af8f@US-EXCH13-5.na.uis.unisys.com>
Message-ID: <94bcaf055d324da3866c97585fd15795@US-EXCH13-5.na.uis.unisys.com>

Is this the correct forum for questions like this?

Ubuntu 12.04 LTS
pacemaker 1.1.10
cman 3.1.7
corosync 1.4.6

One more question:
If my cluster has no resources, it seems like it takes 20s for a stopped
node to be detected. Is the value really 20s and is it a parameter that can
be adjusted?

Thanks.

Regards.
Mark K Vallevand   Mark.Vallevand at Unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.
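On the 20s detection question above: in a cman/corosync stack, how quickly
a failed node is declared dead is governed mainly by the totem token (plus
consensus) timeouts, which cman reads from cluster.conf. A hedged sketch,
with a purely illustrative value in milliseconds:

  <!-- cluster.conf fragment; assumes cman honours a <totem> override.
       The rest of the cluster configuration is omitted. -->
  <cluster name="mycluster" config_version="2">
    <totem token="5000"/>
  </cluster>

The observed detection time is not the token timeout alone; consensus and
membership protocol rounds add to it, so measured values can be noticeably
larger than the configured token.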
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Vallevand, Mark K
Sent: Thursday, October 15, 2015 12:19 PM
To: linux clustering
Subject: [Linux-cluster] Alternative to resource monitor polling?

Is there an alternative to resource monitor polling to detect a resource
failure?
If, for example, a resource failure is detected by our own software, could
it signal clustering that a resource has failed?

Regards.
Mark K Vallevand   Mark.Vallevand at Unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.

From lists at alteeve.ca Thu Oct 15 19:51:37 2015
From: lists at alteeve.ca (Digimer)
Date: Thu, 15 Oct 2015 15:51:37 -0400
Subject: [Linux-cluster] Alternative to resource monitor polling?
In-Reply-To: <94bcaf055d324da3866c97585fd15795@US-EXCH13-5.na.uis.unisys.com>
References: <4b27f2dbfad444a8bb5778777d90af8f@US-EXCH13-5.na.uis.unisys.com>
 <94bcaf055d324da3866c97585fd15795@US-EXCH13-5.na.uis.unisys.com>
Message-ID: <562003C9.2080506@alteeve.ca>

I would ask on the Cluster Labs mailing lists; either Users or Developers.

digimer

On 15/10/15 03:42 PM, Vallevand, Mark K wrote:
> Is this the correct forum for questions like this?
>
> Ubuntu 12.04 LTS
> pacemaker 1.1.10
> cman 3.1.7
> corosync 1.4.6
>
> One more question:
> If my cluster has no resources, it seems like it takes 20s for a stopped
> node to be detected. Is the value really 20s and is it a parameter that
> can be adjusted?
>
> Thanks.
>
> Regards.
> Mark K Vallevand   Mark.Vallevand at Unisys.com
> Never try and teach a pig to sing: it's a waste of time, and it annoys
> the pig.
> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
> MATERIAL and is thus for use only by the intended recipient. If you
> received this in error, please contact the sender and delete the e-mail
> and its attachments from all computers.
>
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Vallevand, Mark K
> Sent: Thursday, October 15, 2015 12:19 PM
> To: linux clustering
> Subject: [Linux-cluster] Alternative to resource monitor polling?
>
> Is there an alternative to resource monitor polling to detect a resource
> failure?
> If, for example, a resource failure is detected by our own software,
> could it signal clustering that a resource has failed?
>
> Regards.
> Mark K Vallevand   Mark.Vallevand at Unisys.com
> Never try and teach a pig to sing: it's a waste of time, and it annoys
> the pig.
> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
> MATERIAL and is thus for use only by the intended recipient. If you
> received this in error, please contact the sender and delete the e-mail
> and its attachments from all computers.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
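On the original polling question, one avenue worth confirming on that list:
crm_resource has an advanced --fail option that lets external software tell
the cluster a resource has failed, rather than waiting for the next monitor
interval. A hedged sketch (resource and node names are hypothetical):

  # Inject a failure for resource my_rsc as observed on node1; the
  # cluster then reacts per the resource's configured failure policy.
  crm_resource --fail --resource my_rsc --node node1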
From Mark.Vallevand at UNISYS.com Thu Oct 15 20:51:50 2015
From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K)
Date: Thu, 15 Oct 2015 20:51:50 +0000
Subject: [Linux-cluster] Alternative to resource monitor polling?
In-Reply-To: <562003C9.2080506@alteeve.ca>
References: <4b27f2dbfad444a8bb5778777d90af8f@US-EXCH13-5.na.uis.unisys.com>
 <94bcaf055d324da3866c97585fd15795@US-EXCH13-5.na.uis.unisys.com>
 <562003C9.2080506@alteeve.ca>
Message-ID:

Thanks.

Regards.
Mark K Vallevand   Mark.Vallevand at Unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer
Sent: Thursday, October 15, 2015 02:52 PM
To: linux clustering
Subject: Re: [Linux-cluster] Alternative to resource monitor polling?

I would ask on the Cluster Labs mailing lists; either Users or Developers.

digimer

On 15/10/15 03:42 PM, Vallevand, Mark K wrote:
> Is this the correct forum for questions like this?
>
> Ubuntu 12.04 LTS
> pacemaker 1.1.10
> cman 3.1.7
> corosync 1.4.6
>
> One more question:
> If my cluster has no resources, it seems like it takes 20s for a stopped
> node to be detected. Is the value really 20s and is it a parameter that
> can be adjusted?
>
> Thanks.
>
> Regards.
> Mark K Vallevand   Mark.Vallevand at Unisys.com
> Never try and teach a pig to sing: it's a waste of time, and it annoys
> the pig.
> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
> MATERIAL and is thus for use only by the intended recipient. If you
> received this in error, please contact the sender and delete the e-mail
> and its attachments from all computers.
>
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Vallevand, Mark K
> Sent: Thursday, October 15, 2015 12:19 PM
> To: linux clustering
> Subject: [Linux-cluster] Alternative to resource monitor polling?
>
> Is there an alternative to resource monitor polling to detect a resource
> failure?
> If, for example, a resource failure is detected by our own software,
> could it signal clustering that a resource has failed?
>
> Regards.
> Mark K Vallevand   Mark.Vallevand at Unisys.com
> Never try and teach a pig to sing: it's a waste of time, and it annoys
> the pig.
> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
> MATERIAL and is thus for use only by the intended recipient. If you
> received this in error, please contact the sender and delete the e-mail
> and its attachments from all computers.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

From Mark.Vallevand at UNISYS.com Fri Oct 16 14:31:22 2015
From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K)
Date: Fri, 16 Oct 2015 14:31:22 +0000
Subject: [Linux-cluster] Alternative to resource monitor polling?
References: <4b27f2dbfad444a8bb5778777d90af8f@US-EXCH13-5.na.uis.unisys.com>
 <94bcaf055d324da3866c97585fd15795@US-EXCH13-5.na.uis.unisys.com>
 <562003C9.2080506@alteeve.ca>
Message-ID: <13ff227035f1469d89c7037e4c2ee3b3@US-EXCH13-5.na.uis.unisys.com>

Not seeing any response to my signup. Pretty sure I signed up earlier.
Maybe getting spam filtered.

Regards.
Mark K Vallevand   Mark.Vallevand at Unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.

-----Original Message-----
From: Vallevand, Mark K
Sent: Thursday, October 15, 2015 03:52 PM
To: linux clustering
Subject: RE: [Linux-cluster] Alternative to resource monitor polling?

Thanks.

Regards.
Mark K Vallevand   Mark.Vallevand at Unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer
Sent: Thursday, October 15, 2015 02:52 PM
To: linux clustering
Subject: Re: [Linux-cluster] Alternative to resource monitor polling?

I would ask on the Cluster Labs mailing lists; either Users or Developers.

digimer

On 15/10/15 03:42 PM, Vallevand, Mark K wrote:
> Is this the correct forum for questions like this?
>
> Ubuntu 12.04 LTS
> pacemaker 1.1.10
> cman 3.1.7
> corosync 1.4.6
>
> One more question:
> If my cluster has no resources, it seems like it takes 20s for a stopped
> node to be detected. Is the value really 20s and is it a parameter that
> can be adjusted?
>
> Thanks.
>
> Regards.
> Mark K Vallevand   Mark.Vallevand at Unisys.com
> Never try and teach a pig to sing: it's a waste of time, and it annoys
> the pig.
> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
> MATERIAL and is thus for use only by the intended recipient. If you
> received this in error, please contact the sender and delete the e-mail
> and its attachments from all computers.
>
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Vallevand, Mark K
> Sent: Thursday, October 15, 2015 12:19 PM
> To: linux clustering
> Subject: [Linux-cluster] Alternative to resource monitor polling?
>
> Is there an alternative to resource monitor polling to detect a resource
> failure?
> If, for example, a resource failure is detected by our own software,
> could it signal clustering that a resource has failed?
>
> Regards.
> Mark K Vallevand   Mark.Vallevand at Unisys.com
> Never try and teach a pig to sing: it's a waste of time, and it annoys
> the pig.
> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
> MATERIAL and is thus for use only by the intended recipient. If you
> received this in error, please contact the sender and delete the e-mail
> and its attachments from all computers.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

From vm at sykora.cz Fri Oct 30 14:04:34 2015
From: vm at sykora.cz (Vladimir Martinek)
Date: Fri, 30 Oct 2015 15:04:34 +0100
Subject: [Linux-cluster] Two cluster nodes hold exclusive POSIX lock on the same file
Message-ID: <563378F2.6090801@sykora.cz>

Hello,

I have a 3 node cluster and a fencing agent that takes about 30 seconds to
complete the fencing. In those 30 seconds it is possible for two nodes of
the cluster to get an exclusive POSIX lock on the same file.

Did I miss something here, or is this correct behaviour?

Also, when trying with BSD flock, it works as I would expect - the locks
are only released after the fencing completes and node 1 is confirmed to
be fenced.

Following is the output of the dlm_tool dump command. Watch for the line
"gfs2fs purged 1 plocks for 1" - the locks of failed node 1 are purged
long before the fencing is completed.

Thank you for any advice.

Vladimir Martinek

217 dlm:controld conf 2 0 1 memb 2 3 join left 1
217 dlm:controld left reason nodedown 1 procdown 0 leave 0
217 set_fence_actors for 1 low 2 count 2
217 daemon remove 1 nodedown need_fencing 1 low 2
217 fence work wait for cpg ringid
217 dlm:controld ring 2:1292 2 memb 2 3
217 fence work wait for cluster ringid
217 dlm:ls:gfs2fs conf 2 0 1 memb 2 3 join left 1
217 gfs2fs add_change cg 4 remove nodeid 1 reason nodedown
217 gfs2fs add_change cg 4 counts member 2 joined 0 remove 1 failed 1
217 gfs2fs stop_kernel cg 4
217 write "0" to "/sys/kernel/dlm/gfs2fs/control"
217 gfs2fs purged 1 plocks for 1
217 gfs2fs check_ringid wait cluster 1288 cpg 1:1288
217 dlm:ls:gfs2fs ring 2:1292 2 memb 2 3
217 gfs2fs check_ringid cluster 1288 cpg 2:1292
217 fence work wait for cluster ringid
217 gfs2fs check_ringid cluster 1288 cpg 2:1292
217 cluster quorum 1 seq 1292 nodes 2
217 cluster node 1 removed seq 1292
217 del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/1"
217 fence request 1 pos 0
217 fence request 1 pid 4046 nodedown time 1446211577 fence_all dlm_stonith
217 fence wait 1 pid 4046 running
217 gfs2fs check_ringid done cluster 1292 cpg 2:1292
217 gfs2fs check_fencing 1 wait start 30 fail 217
217 gfs2fs check_fencing wait_count 1
217 gfs2fs wait for fencing
218 fence wait 1 pid 4046 running
218 gfs2fs wait for fencing
219 fence wait 1 pid 4046 running
219 gfs2fs wait for fencing
220 fence wait 1 pid 4046 running
220 gfs2fs wait for fencing
221 fence wait 1 pid 4046 running
221 gfs2fs wait for fencing
222 fence wait 1 pid 4046 running
222 gfs2fs wait for fencing
223 fence wait 1 pid 4046 running
223 gfs2fs wait for fencing
224 fence wait 1 pid 4046 running
224 gfs2fs wait for fencing
225 fence wait 1 pid 4046 running
225 gfs2fs wait for fencing
226 fence wait 1 pid 4046 running
226 gfs2fs wait for fencing
227 fence wait 1 pid 4046 running
227 gfs2fs wait for fencing
228 fence wait 1 pid 4046 running
228 gfs2fs wait for fencing
229 fence wait 1 pid 4046 running
229 gfs2fs wait for fencing
230 fence wait 1 pid 4046 running
230 gfs2fs wait for fencing
231 fence wait 1 pid 4046 running
231 gfs2fs wait for fencing
232 fence wait 1 pid 4046 running
232 gfs2fs wait for fencing
233 fence wait 1 pid 4046 running
233 gfs2fs wait for fencing
234 fence wait 1 pid 4046 running
234 gfs2fs wait for fencing
235 fence wait 1 pid 4046 running
235 gfs2fs wait for fencing
236 fence wait 1 pid 4046 running
236 gfs2fs wait for fencing
237 fence wait 1 pid 4046 running
237 gfs2fs wait for fencing
238 fence wait 1 pid 4046 running
238 gfs2fs wait for fencing
239 fence wait 1 pid 4046 running
239 gfs2fs wait for fencing
240 fence wait 1 pid 4046 running
240 gfs2fs wait for fencing
241 fence wait 1 pid 4046 running
241 gfs2fs wait for fencing
242 fence wait 1 pid 4046 running
242 gfs2fs wait for fencing
243 fence wait 1 pid 4046 running
243 gfs2fs wait for fencing
244 fence wait 1 pid 4046 running
244 gfs2fs wait for fencing
245 fence wait 1 pid 4046 running
245 gfs2fs wait for fencing
246 fence wait 1 pid 4046 running
246 gfs2fs wait for fencing
247 fence wait 1 pid 4046 running
247 gfs2fs wait for fencing
248 fence result 1 pid 4046 result 0 exit status
248 fence wait 1 pid 4046 result 0
248 gfs2fs wait for fencing
248 fence status 1 receive 0 from 2 walltime 1446211608 local 248
248 gfs2fs check_fencing 1 done start 30 fail 217 fence 248
248 gfs2fs check_fencing done
248 gfs2fs send_start 2:4 counts 2 2 0 1 1
248 gfs2fs receive_start 2:4 len 80
248 gfs2fs match_change 2:4 matches cg 4
248 gfs2fs wait_messages cg 4 need 1 of 2
248 gfs2fs receive_start 3:2 len 80
248 gfs2fs match_change 3:2 matches cg 4
248 gfs2fs wait_messages cg 4 got all 2
248 gfs2fs start_kernel cg 4 member_count 2
248 dir_member 3
248 dir_member 2
248 dir_member 1
248 set_members rmdir "/sys/kernel/config/dlm/cluster/spaces/gfs2fs/nodes/1"
248 write "1" to "/sys/kernel/dlm/gfs2fs/control"
248 gfs2fs prepare_plocks
248 gfs2fs set_plock_data_node from 1 to 2
248 gfs2fs send_plocks_done 2:4 counts 2 2 0 1 1 plocks_data 1426592608
248 gfs2fs receive_plocks_done 2:4 flags 2 plocks_data 1426592608 need 0 save 0

From teigland at redhat.com Fri Oct 30 20:12:22 2015
From: teigland at redhat.com (David Teigland)
Date: Fri, 30 Oct 2015 15:12:22 -0500
Subject: [Linux-cluster] Two cluster nodes hold exclusive POSIX lock on the same file
In-Reply-To: <563378F2.6090801@sykora.cz>
References: <563378F2.6090801@sykora.cz>
Message-ID: <20151030201222.GD14890@redhat.com>

On Fri, Oct 30, 2015 at 03:04:34PM +0100, Vladimir Martinek wrote:
> Hello,
>
> I have a 3 node cluster and a fencing agent that takes about 30
> seconds to complete the fencing. In those 30 seconds it is possible
> for two nodes of the cluster to get an exclusive POSIX lock on the
> same file.
>
> Did I miss something here, or is this correct behaviour?
>
> Also, when trying with BSD flock, it works as I would expect - the
> locks are only released after the fencing completes and node 1 is
> confirmed to be fenced.
>
> Following is the output of the dlm_tool dump command. Watch for the
> line "gfs2fs purged 1 plocks for 1" - the locks of failed node 1 are
> purged long before the fencing is completed.
>
> Thank you for any advice.

It works as expected; recovery of posix locks does not need to wait for
fencing to complete.

Dave
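For anyone reproducing the flock half of the comparison above, util-linux
flock(1) can drive it from a shell. A hedged sketch (the GFS2 mount point
and file name are hypothetical):

  # On node A: take and hold an exclusive BSD lock on a shared file.
  flock -x /mnt/gfs2fs/locktest -c "sleep 600"

  # On node B: this blocks until node A releases the lock - or, if
  # node A fails, until its fencing completes and the lock is recovered.
  flock -x /mnt/gfs2fs/locktest -c "echo got lock"

The fcntl (POSIX) side behaves differently, as the dump above shows: plocks
held by the failed node are purged during recovery without waiting for
fencing.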