From Mark.Vallevand at UNISYS.com Thu Apr 3 20:58:30 2014 From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K) Date: Thu, 3 Apr 2014 15:58:30 -0500 Subject: [Linux-cluster] Simple data replication in a cluster Message-ID: <99C8B2929B39C24493377AC7A121E21FC5E47F3C5D@USEA-EXCH8.na.uis.unisys.com> I'm looking for a simple way to replicate data within a cluster. It looks like my resources will be self-configuring and may need to push changes they see to all nodes in the cluster. The idea being that when a node crashes, the resource will have its configuration present on the node on which it is restarted. We're talking about a few kb of data, probably in one file, probably text. A typical cluster would have multiple resources (more than two), one resource per node and one extra node. Ideas? Could I use the CIB directly to replicate data? Use cibadmin to update something and sync? How big can a resource parameter be? Could a resource modify its parameters so that they are replicated throughout the cluster? Is there a simple file replication Resource Agent? Drdb seems like overkill. Regards. Mark K Vallevand Mark.Vallevand at Unisys.com May you live in interesting times, may you come to the attention of important people and may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Thu Apr 3 21:17:40 2014 From: lists at alteeve.ca (Digimer) Date: Thu, 03 Apr 2014 17:17:40 -0400 Subject: [Linux-cluster] Simple data replication in a cluster In-Reply-To: <99C8B2929B39C24493377AC7A121E21FC5E47F3C5D@USEA-EXCH8.na.uis.unisys.com> References: <99C8B2929B39C24493377AC7A121E21FC5E47F3C5D@USEA-EXCH8.na.uis.unisys.com> Message-ID: <533DCFF4.2070404@alteeve.ca> On 03/04/14 04:58 PM, Vallevand, Mark K wrote: > I?m looking for a simple way to replicate data within a cluster. > > It looks like my resources will be self-configuring and may need to push > changes they see to all nodes in the cluster. The idea being that when > a node crashes, the resource will have its configuration present on the > node on which it is restarted. We?re talking about a few kb of data, > probably in one file, probably text. A typical cluster would have > multiple resources (more than two), one resource per node and one extra > node. > > Ideas? > > Could I use the CIB directly to replicate data? Use cibadmin to update > something and sync? > > How big can a resource parameter be? Could a resource modify its > parameters so that they are replicated throughout the cluster? > > Is there a simple file replication Resource Agent? > > Drdb seems like overkill. > > Regards. > Mark K Vallevand Mark.Vallevand at Unisys.com If you don't want to use DRBD + gfs2 (what I use), then you'll probably want to look at corosync directly for keeping the data in sync. Pacemaker itself is a cluster resource manager and I don't think the cib is well suited for general data sync'ing. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
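For the few-kilobyte case the original post asks about, a cluster-wide property is the usual way to stash data in the CIB, though as noted above the CIB is not really meant to be a data store. A minimal sketch, assuming Pacemaker's crm_attribute is available; the property name and file path are made up:

    # push the small text config into the CIB as a cluster property
    crm_attribute --type crm_config --name myapp-config \
        --update "$(base64 -w0 /etc/myapp/app.conf)"

    # on any node (e.g. where the resource gets restarted), pull it back out
    crm_attribute --type crm_config --name myapp-config --query --quiet \
        | base64 -d > /etc/myapp/app.conf

The property travels to every node with the rest of the CIB, but anything much larger than a few kilobytes will bloat the CIB and slow every configuration update, so treat this as a convenience rather than a replication mechanism.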
From morpheus.ibis at gmail.com Thu Apr 3 21:30:32 2014 From: morpheus.ibis at gmail.com (Pavel Herrmann) Date: Thu, 03 Apr 2014 23:30:32 +0200 Subject: [Linux-cluster] Simple data replication in a cluster In-Reply-To: <99C8B2929B39C24493377AC7A121E21FC5E47F3C5D@USEA-EXCH8.na.uis.unisys.com> References: <99C8B2929B39C24493377AC7A121E21FC5E47F3C5D@USEA-EXCH8.na.uis.unisys.com> Message-ID: <2878037.uhyBF5WAPy@bloomfield> Hi On Thursday 03 of April 2014 15:58:30 Vallevand, Mark K wrote: > I'm looking for a simple way to replicate data within a cluster. > > It looks like my resources will be self-configuring and may need to push > changes they see to all nodes in the cluster. The idea being that when a > node crashes, the resource will have its configuration present on the node > on which it is restarted. We're talking about a few kb of data, probably > in one file, probably text. A typical cluster would have multiple > resources (more than two), one resource per node and one extra node. I was facing a similar issue, but instead of going for full cluster stack (i didnt need it for the failover), I went for csync2 if you absolutely need to have the changes propagated instantly, you need to hook csync2 to inotify (google should give you options), or if you dont expect any changes right before crashes, running with cron every few minutes might suit your needs Regards Pavel Herrmann > > Ideas? > > Could I use the CIB directly to replicate data? Use cibadmin to update > something and sync? How big can a resource parameter be? Could a resource > modify its parameters so that they are replicated throughout the cluster? > Is there a simple file replication Resource Agent? > Drdb seems like overkill. > > Regards. > Mark K Vallevand > Mark.Vallevand at Unisys.com May you live in > interesting times, may you come to the attention of important people and > may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL > AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the > intended recipient. If you received this in error, please contact the > sender and delete the e-mail and its attachments from all computers. From ricks at alldigital.com Thu Apr 3 21:51:33 2014 From: ricks at alldigital.com (Rick Stevens) Date: Thu, 3 Apr 2014 14:51:33 -0700 Subject: [Linux-cluster] Simple data replication in a cluster In-Reply-To: <99C8B2929B39C24493377AC7A121E21FC5E47F3C5D@USEA-EXCH8.na.uis.unisys.com> References: <99C8B2929B39C24493377AC7A121E21FC5E47F3C5D@USEA-EXCH8.na.uis.unisys.com> Message-ID: <533DD7E5.2020709@alldigital.com> On 04/03/2014 01:58 PM, Vallevand, Mark K issued this missive: > I?m looking for a simple way to replicate data within a cluster. > > It looks like my resources will be self-configuring and may need to push > changes they see to all nodes in the cluster. The idea being that when > a node crashes, the resource will have its configuration present on the > node on which it is restarted. We?re talking about a few kb of data, > probably in one file, probably text. A typical cluster would have > multiple resources (more than two), one resource per node and one extra > node. > > Ideas? > > Could I use the CIB directly to replicate data? Use cibadmin to update > something and sync? > > How big can a resource parameter be? Could a resource modify its > parameters so that they are replicated throughout the cluster? > > Is there a simple file replication Resource Agent? > > Drdb seems like overkill. 
If you're OK with it and it's a small group of files/directories, why not use something like inotifywait and have it run a script that rsyncs the altered files to the other nodes when the files change? I've done it before and it works pretty well. -- ---------------------------------------------------------------------- - Rick Stevens, Systems Engineer, AllDigital ricks at alldigital.com - - AIM/Skype: therps2 ICQ: 22643734 Yahoo: origrps2 - - - - Never eat anything larger than your head - ---------------------------------------------------------------------- From Mark.Vallevand at UNISYS.com Fri Apr 4 14:44:26 2014 From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K) Date: Fri, 4 Apr 2014 09:44:26 -0500 Subject: [Linux-cluster] Simple data replication in a cluster In-Reply-To: <533DCFF4.2070404@alteeve.ca> References: <99C8B2929B39C24493377AC7A121E21FC5E47F3C5D@USEA-EXCH8.na.uis.unisys.com> <533DCFF4.2070404@alteeve.ca> Message-ID: <99C8B2929B39C24493377AC7A121E21FC5E48A8EE3@USEA-EXCH8.na.uis.unisys.com> Thanks. I'll check out corosync. Regards. Mark K Vallevand?? Mark.Vallevand at Unisys.com May you live in interesting times, may you come to the attention of important people and may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer Sent: Thursday, April 03, 2014 04:18 PM To: linux clustering Subject: Re: [Linux-cluster] Simple data replication in a cluster On 03/04/14 04:58 PM, Vallevand, Mark K wrote: > I'm looking for a simple way to replicate data within a cluster. > > It looks like my resources will be self-configuring and may need to push > changes they see to all nodes in the cluster. The idea being that when > a node crashes, the resource will have its configuration present on the > node on which it is restarted. We're talking about a few kb of data, > probably in one file, probably text. A typical cluster would have > multiple resources (more than two), one resource per node and one extra > node. > > Ideas? > > Could I use the CIB directly to replicate data? Use cibadmin to update > something and sync? > > How big can a resource parameter be? Could a resource modify its > parameters so that they are replicated throughout the cluster? > > Is there a simple file replication Resource Agent? > > Drdb seems like overkill. > > Regards. > Mark K Vallevand Mark.Vallevand at Unisys.com If you don't want to use DRBD + gfs2 (what I use), then you'll probably want to look at corosync directly for keeping the data in sync. Pacemaker itself is a cluster resource manager and I don't think the cib is well suited for general data sync'ing. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
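A minimal sketch of the inotifywait-plus-rsync approach suggested earlier in the thread, assuming the inotify-tools and rsync packages are installed and root ssh keys are set up between the nodes; the watched directory and peer names are placeholders:

    #!/bin/bash
    # re-sync /etc/myapp to the peer nodes whenever anything under it changes
    WATCH_DIR=/etc/myapp
    PEERS="node2 node3"

    # without -m, inotifywait exits after the first event, so the loop re-arms it
    while inotifywait -r -q -e modify,create,delete,move "$WATCH_DIR"; do
        for peer in $PEERS; do
            rsync -a --delete "$WATCH_DIR/" "$peer:$WATCH_DIR/"
        done
    done

Running this from an init script (or as a cluster resource itself) on the node that currently owns the data keeps the copies close enough to current for a few kilobytes of configuration.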
-- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From Mark.Vallevand at UNISYS.com Fri Apr 4 14:48:31 2014 From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K) Date: Fri, 4 Apr 2014 09:48:31 -0500 Subject: [Linux-cluster] Simple data replication in a cluster In-Reply-To: <2878037.uhyBF5WAPy@bloomfield> References: <99C8B2929B39C24493377AC7A121E21FC5E47F3C5D@USEA-EXCH8.na.uis.unisys.com> <2878037.uhyBF5WAPy@bloomfield> Message-ID: <99C8B2929B39C24493377AC7A121E21FC5E48A8EFD@USEA-EXCH8.na.uis.unisys.com> Thanks. A good idea. I've considered using csync2 (and similar things). I use unison for syncing in my current clusters. But that is done to simplify installation/configuration by the user. My scripts push the static application data around the cluster when the user is installing. Regards. Mark K Vallevand?? Mark.Vallevand at Unisys.com May you live in interesting times, may you come to the attention of important people and may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. -----Original Message----- From: Pavel Herrmann [mailto:morpheus.ibis at gmail.com] Sent: Thursday, April 03, 2014 04:31 PM To: linux-cluster at redhat.com Cc: Vallevand, Mark K Subject: Re: [Linux-cluster] Simple data replication in a cluster Hi On Thursday 03 of April 2014 15:58:30 Vallevand, Mark K wrote: > I'm looking for a simple way to replicate data within a cluster. > > It looks like my resources will be self-configuring and may need to push > changes they see to all nodes in the cluster. The idea being that when a > node crashes, the resource will have its configuration present on the node > on which it is restarted. We're talking about a few kb of data, probably > in one file, probably text. A typical cluster would have multiple > resources (more than two), one resource per node and one extra node. I was facing a similar issue, but instead of going for full cluster stack (i didnt need it for the failover), I went for csync2 if you absolutely need to have the changes propagated instantly, you need to hook csync2 to inotify (google should give you options), or if you dont expect any changes right before crashes, running with cron every few minutes might suit your needs Regards Pavel Herrmann > > Ideas? > > Could I use the CIB directly to replicate data? Use cibadmin to update > something and sync? How big can a resource parameter be? Could a resource > modify its parameters so that they are replicated throughout the cluster? > Is there a simple file replication Resource Agent? > Drdb seems like overkill. > > Regards. > Mark K Vallevand > Mark.Vallevand at Unisys.com May you live in > interesting times, may you come to the attention of important people and > may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL > AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the > intended recipient. If you received this in error, please contact the > sender and delete the e-mail and its attachments from all computers. 
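A minimal sketch of the csync2 setup described above, assuming csync2 is installed on every node and a shared key has been generated with csync2 -k; the group name, host names and path are placeholders:

    # /etc/csync2.cfg, identical on every node
    group mycluster {
        host node1;
        host node2;
        host node3;
        key /etc/csync2.key;
        include /etc/myapp;
    }

    # periodic propagation, e.g. a /etc/cron.d entry run every two minutes
    */2 * * * * root /usr/sbin/csync2 -x

For near-instant propagation, the same csync2 -x call can instead be triggered from an inotify loop like the rsync example earlier in the thread.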
From Mark.Vallevand at UNISYS.com Fri Apr 4 14:48:58 2014 From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K) Date: Fri, 4 Apr 2014 09:48:58 -0500 Subject: [Linux-cluster] Simple data replication in a cluster In-Reply-To: <533DD7E5.2020709@alldigital.com> References: <99C8B2929B39C24493377AC7A121E21FC5E47F3C5D@USEA-EXCH8.na.uis.unisys.com> <533DD7E5.2020709@alldigital.com> Message-ID: <99C8B2929B39C24493377AC7A121E21FC5E48A8F02@USEA-EXCH8.na.uis.unisys.com> Yup. I've considered similar. Thanks! Regards. Mark K Vallevand?? Mark.Vallevand at Unisys.com May you live in interesting times, may you come to the attention of important people and may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Rick Stevens Sent: Thursday, April 03, 2014 04:52 PM To: linux clustering Subject: Re: [Linux-cluster] Simple data replication in a cluster On 04/03/2014 01:58 PM, Vallevand, Mark K issued this missive: > I'm looking for a simple way to replicate data within a cluster. > > It looks like my resources will be self-configuring and may need to push > changes they see to all nodes in the cluster. The idea being that when > a node crashes, the resource will have its configuration present on the > node on which it is restarted. We're talking about a few kb of data, > probably in one file, probably text. A typical cluster would have > multiple resources (more than two), one resource per node and one extra > node. > > Ideas? > > Could I use the CIB directly to replicate data? Use cibadmin to update > something and sync? > > How big can a resource parameter be? Could a resource modify its > parameters so that they are replicated throughout the cluster? > > Is there a simple file replication Resource Agent? > > Drdb seems like overkill. If you're OK with it and it's a small group of files/directories, why not use something like inotifywait and have it run a script that rsyncs the altered files to the other nodes when the files change? I've done it before and it works pretty well. -- ---------------------------------------------------------------------- - Rick Stevens, Systems Engineer, AllDigital ricks at alldigital.com - - AIM/Skype: therps2 ICQ: 22643734 Yahoo: origrps2 - - - - Never eat anything larger than your head - ---------------------------------------------------------------------- -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From bjoern.teipel at internetbrands.com Mon Apr 7 07:26:43 2014 From: bjoern.teipel at internetbrands.com (Bjoern Teipel) Date: Mon, 7 Apr 2014 00:26:43 -0700 Subject: [Linux-cluster] DLM nodes disconnected issue Message-ID: H all, i did a dlm_tool leave clvmd on one node (node06) of a CMAN cluster with CLVMD Now I have the problem that clvmd is stuck and all nodes lost connections to DLM. For some reason dlm want's to fence member 8 I guess and that might stuck the whole dlm? All other stacks, cman, corosync look fine... 
Thanks, Bjoern Error: dlm: closing connection to node 2 dlm: closing connection to node 3 dlm: closing connection to node 4 dlm: closing connection to node 5 dlm: closing connection to node 6 dlm: closing connection to node 8 dlm: closing connection to node 9 dlm: closing connection to node 10 dlm: closing connection to node 2 dlm: closing connection to node 3 dlm: closing connection to node 4 dlm: closing connection to node 5 dlm: closing connection to node 6 dlm: closing connection to node 8 dlm: closing connection to node 9 dlm: closing connection to node 10 INFO: task dlm_tool:33699 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. dlm_tool D 0000000000000003 0 33699 33698 0x00000080 ffff88138905dcc0 0000000000000082 ffffffff81168043 ffff88138905dd18 ffff88138905dd08 ffff88305b30ccc0 ffff88304fa5c800 ffff883058e49900 ffff881857329058 ffff88138905dfd8 000000000000fb88 ffff881857329058 Call Trace: [] ? kmem_cache_alloc_trace+0x1a3/0x1b0 [] ? misc_open+0x1ca/0x320 [] rwsem_down_failed_common+0x95/0x1d0 [] ? chrdev_open+0x125/0x230 [] rwsem_down_read_failed+0x26/0x30 [] ? __dentry_open+0x23f/0x360 [] call_rwsem_down_read_failed+0x14/0x30 [] ? down_read+0x24/0x30 [] dlm_clear_proc_locks+0x3d/0x2a0 [dlm] [] ? generic_acl_chmod+0x46/0xd0 [] device_close+0x66/0xc0 [dlm] [] __fput+0xf5/0x210 [] fput+0x25/0x30 [] filp_close+0x5d/0x90 [] sys_close+0xa5/0x100 [] system_call_fastpath+0x16/0x1b Status: cman_tool nodes Node Sts Inc Joined Name 1 M 18908 2014-03-24 19:01:00 node01 2 M 18972 2014-04-06 22:47:57 node02 3 M 18972 2014-04-06 22:47:57 node03 4 M 18972 2014-04-06 22:47:57 node04 5 M 18972 2014-04-06 22:47:57 node05 6 X 18960 node06 7 X 18928 node07 8 M 18972 2014-04-06 22:47:57 node08 9 M 18972 2014-04-06 22:47:57 node09 10 M 18972 2014-04-06 22:47:57 node10 dlm lockspaces name clvmd id 0x4104eefa flags 0x00000004 kern_stop change member 8 joined 0 remove 1 failed 0 seq 11,11 members 1 2 3 4 5 8 9 10 new change member 8 joined 1 remove 0 failed 0 seq 12,41 new status wait_messages 0 wait_condition 1 fencing new members 1 2 3 4 5 8 9 10 DLM dump: 1396849677 cluster node 2 added seq 18972 1396849677 set_configfs_node 2 10.14.18.66 local 0 1396849677 cluster node 3 added seq 18972 1396849677 set_configfs_node 3 10.14.18.67 local 0 1396849677 cluster node 4 added seq 18972 1396849677 set_configfs_node 4 10.14.18.68 local 0 1396849677 cluster node 5 added seq 18972 1396849677 set_configfs_node 5 10.14.18.70 local 0 1396849677 cluster node 8 added seq 18972 1396849677 set_configfs_node 8 10.14.18.80 local 0 1396849677 cluster node 9 added seq 18972 1396849677 set_configfs_node 9 10.14.18.81 local 0 1396849677 cluster node 10 added seq 18972 1396849677 set_configfs_node 10 10.14.18.77 local 0 1396849677 dlm:ls:clvmd conf 2 1 0 memb 1 3 join 3 left 1396849677 clvmd add_change cg 35 joined nodeid 3 1396849677 clvmd add_change cg 35 counts member 2 joined 1 remove 0 failed 0 1396849677 dlm:ls:clvmd conf 3 1 0 memb 1 2 3 join 2 left 1396849677 clvmd add_change cg 36 joined nodeid 2 1396849677 clvmd add_change cg 36 counts member 3 joined 1 remove 0 failed 0 1396849677 dlm:ls:clvmd conf 4 1 0 memb 1 2 3 9 join 9 left 1396849677 clvmd add_change cg 37 joined nodeid 9 1396849677 clvmd add_change cg 37 counts member 4 joined 1 remove 0 failed 0 1396849677 dlm:ls:clvmd conf 5 1 0 memb 1 2 3 8 9 join 8 left 1396849677 clvmd add_change cg 38 joined nodeid 8 1396849677 clvmd add_change cg 38 counts member 5 joined 1 remove 0 failed 0 1396849677 
dlm:ls:clvmd conf 6 1 0 memb 1 2 3 8 9 10 join 10 left 1396849677 clvmd add_change cg 39 joined nodeid 10 1396849677 clvmd add_change cg 39 counts member 6 joined 1 remove 0 failed 0 1396849677 dlm:ls:clvmd conf 7 1 0 memb 1 2 3 5 8 9 10 join 5 left 1396849677 clvmd add_change cg 40 joined nodeid 5 1396849677 clvmd add_change cg 40 counts member 7 joined 1 remove 0 failed 0 1396849677 dlm:ls:clvmd conf 8 1 0 memb 1 2 3 4 5 8 9 10 join 4 left 1396849677 clvmd add_change cg 41 joined nodeid 4 1396849677 clvmd add_change cg 41 counts member 8 joined 1 remove 0 failed 0 1396849677 dlm:controld conf 2 1 0 memb 1 3 join 3 left 1396849677 dlm:controld conf 3 1 0 memb 1 2 3 join 2 left 1396849677 dlm:controld conf 4 1 0 memb 1 2 3 9 join 9 left 1396849677 dlm:controld conf 5 1 0 memb 1 2 3 8 9 join 8 left 1396849677 dlm:controld conf 6 1 0 memb 1 2 3 8 9 10 join 10 left 1396849677 dlm:controld conf 7 1 0 memb 1 2 3 5 8 9 10 join 5 left 1396849677 dlm:controld conf 8 1 0 memb 1 2 3 4 5 8 9 10 join 4 left From emi2fast at gmail.com Mon Apr 7 07:44:03 2014 From: emi2fast at gmail.com (emmanuel segura) Date: Mon, 7 Apr 2014 09:44:03 +0200 Subject: [Linux-cluster] DLM nodes disconnected issue In-Reply-To: References: Message-ID: your fencing is working ? because i see this from your dlm lockspace "new status wait_messages 0 wait_condition 1 fencing". 2014-04-07 9:26 GMT+02:00 Bjoern Teipel : > H all, > > i did a dlm_tool leave clvmd on one node (node06) of a CMAN cluster with > CLVMD > Now I have the problem that clvmd is stuck and all nodes lost > connections to DLM. > For some reason dlm want's to fence member 8 I guess and that might > stuck the whole dlm? > All other stacks, cman, corosync look fine... > > Thanks, > Bjoern > > Error: > > dlm: closing connection to node 2 > dlm: closing connection to node 3 > dlm: closing connection to node 4 > dlm: closing connection to node 5 > dlm: closing connection to node 6 > dlm: closing connection to node 8 > dlm: closing connection to node 9 > dlm: closing connection to node 10 > dlm: closing connection to node 2 > dlm: closing connection to node 3 > dlm: closing connection to node 4 > dlm: closing connection to node 5 > dlm: closing connection to node 6 > dlm: closing connection to node 8 > dlm: closing connection to node 9 > dlm: closing connection to node 10 > INFO: task dlm_tool:33699 blocked for more than 120 seconds. > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > dlm_tool D 0000000000000003 0 33699 33698 0x00000080 > ffff88138905dcc0 0000000000000082 ffffffff81168043 ffff88138905dd18 > ffff88138905dd08 ffff88305b30ccc0 ffff88304fa5c800 ffff883058e49900 > ffff881857329058 ffff88138905dfd8 000000000000fb88 ffff881857329058 > Call Trace: > [] ? kmem_cache_alloc_trace+0x1a3/0x1b0 > [] ? misc_open+0x1ca/0x320 > [] rwsem_down_failed_common+0x95/0x1d0 > [] ? chrdev_open+0x125/0x230 > [] rwsem_down_read_failed+0x26/0x30 > [] ? __dentry_open+0x23f/0x360 > [] call_rwsem_down_read_failed+0x14/0x30 > [] ? down_read+0x24/0x30 > [] dlm_clear_proc_locks+0x3d/0x2a0 [dlm] > [] ? 
generic_acl_chmod+0x46/0xd0 > [] device_close+0x66/0xc0 [dlm] > [] __fput+0xf5/0x210 > [] fput+0x25/0x30 > [] filp_close+0x5d/0x90 > [] sys_close+0xa5/0x100 > [] system_call_fastpath+0x16/0x1b > > > > Status: > > cman_tool nodes > Node Sts Inc Joined Name > 1 M 18908 2014-03-24 19:01:00 node01 > 2 M 18972 2014-04-06 22:47:57 node02 > 3 M 18972 2014-04-06 22:47:57 node03 > 4 M 18972 2014-04-06 22:47:57 node04 > 5 M 18972 2014-04-06 22:47:57 node05 > 6 X 18960 node06 > 7 X 18928 node07 > 8 M 18972 2014-04-06 22:47:57 node08 > 9 M 18972 2014-04-06 22:47:57 node09 > 10 M 18972 2014-04-06 22:47:57 node10 > > dlm lockspaces > name clvmd > id 0x4104eefa > flags 0x00000004 kern_stop > change member 8 joined 0 remove 1 failed 0 seq 11,11 > members 1 2 3 4 5 8 9 10 > new change member 8 joined 1 remove 0 failed 0 seq 12,41 > new status wait_messages 0 wait_condition 1 fencing > new members 1 2 3 4 5 8 9 10 > > > > DLM dump: > 1396849677 cluster node 2 added seq 18972 > 1396849677 set_configfs_node 2 10.14.18.66 local 0 > 1396849677 cluster node 3 added seq 18972 > 1396849677 set_configfs_node 3 10.14.18.67 local 0 > 1396849677 cluster node 4 added seq 18972 > 1396849677 set_configfs_node 4 10.14.18.68 local 0 > 1396849677 cluster node 5 added seq 18972 > 1396849677 set_configfs_node 5 10.14.18.70 local 0 > 1396849677 cluster node 8 added seq 18972 > 1396849677 set_configfs_node 8 10.14.18.80 local 0 > 1396849677 cluster node 9 added seq 18972 > 1396849677 set_configfs_node 9 10.14.18.81 local 0 > 1396849677 cluster node 10 added seq 18972 > 1396849677 set_configfs_node 10 10.14.18.77 local 0 > 1396849677 dlm:ls:clvmd conf 2 1 0 memb 1 3 join 3 left > 1396849677 clvmd add_change cg 35 joined nodeid 3 > 1396849677 clvmd add_change cg 35 counts member 2 joined 1 remove 0 failed > 0 > 1396849677 dlm:ls:clvmd conf 3 1 0 memb 1 2 3 join 2 left > 1396849677 clvmd add_change cg 36 joined nodeid 2 > 1396849677 clvmd add_change cg 36 counts member 3 joined 1 remove 0 failed > 0 > 1396849677 dlm:ls:clvmd conf 4 1 0 memb 1 2 3 9 join 9 left > 1396849677 clvmd add_change cg 37 joined nodeid 9 > 1396849677 clvmd add_change cg 37 counts member 4 joined 1 remove 0 failed > 0 > 1396849677 dlm:ls:clvmd conf 5 1 0 memb 1 2 3 8 9 join 8 left > 1396849677 clvmd add_change cg 38 joined nodeid 8 > 1396849677 clvmd add_change cg 38 counts member 5 joined 1 remove 0 failed > 0 > 1396849677 dlm:ls:clvmd conf 6 1 0 memb 1 2 3 8 9 10 join 10 left > 1396849677 clvmd add_change cg 39 joined nodeid 10 > 1396849677 clvmd add_change cg 39 counts member 6 joined 1 remove 0 failed > 0 > 1396849677 dlm:ls:clvmd conf 7 1 0 memb 1 2 3 5 8 9 10 join 5 left > 1396849677 clvmd add_change cg 40 joined nodeid 5 > 1396849677 clvmd add_change cg 40 counts member 7 joined 1 remove 0 failed > 0 > 1396849677 dlm:ls:clvmd conf 8 1 0 memb 1 2 3 4 5 8 9 10 join 4 left > 1396849677 clvmd add_change cg 41 joined nodeid 4 > 1396849677 clvmd add_change cg 41 counts member 8 joined 1 remove 0 failed > 0 > 1396849677 dlm:controld conf 2 1 0 memb 1 3 join 3 left > 1396849677 dlm:controld conf 3 1 0 memb 1 2 3 join 2 left > 1396849677 dlm:controld conf 4 1 0 memb 1 2 3 9 join 9 left > 1396849677 dlm:controld conf 5 1 0 memb 1 2 3 8 9 join 8 left > 1396849677 dlm:controld conf 6 1 0 memb 1 2 3 8 9 10 join 10 left > 1396849677 dlm:controld conf 7 1 0 memb 1 2 3 5 8 9 10 join 5 left > 1396849677 dlm:controld conf 8 1 0 memb 1 2 3 4 5 8 9 10 join 4 left > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > 
https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From mgrac at redhat.com Mon Apr 7 12:39:22 2014 From: mgrac at redhat.com (Marek Grac) Date: Mon, 07 Apr 2014 14:39:22 +0200 Subject: [Linux-cluster] fence-agents-4.0.8 stable release Message-ID: <53429C7A.9050905@redhat.com> Welcome to the fence-agents 4.0.8 release. This release includes new fence agent for Raritan and several bugfixes: * fence_wti respects delay option in telnet connections * fixed problem when using identity file for login via ssh * correct values in manual pages for symlinks * allow SSl connection to fallback to SSL 3.0 (--notls) used for HP iLO2 The new source tarball can be downloaded here: https://fedorahosted.org/releases/f/e/fence-agents/fence-agents-4.0.8.tar.xz To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Thanks/congratulations to all people that contributed to achieve this great milestone. m, From lists at alteeve.ca Tue Apr 15 21:03:55 2014 From: lists at alteeve.ca (Digimer) Date: Tue, 15 Apr 2014 17:03:55 -0400 Subject: [Linux-cluster] KVM Live migration when node's FS is read-only Message-ID: <534D9EBB.80200@alteeve.ca> Hi all, So I hit a weird issue last week... (EL6 + cman + rgamanager + drbd) For reasons unknown, a client thought they could start yanking and replacing hard drives on a running node. Obviously, that did not end well. The VMs that had been running on the node continues to operate fine and they just started using the peer's storage. The problem came when I tried to live-migrate the VMs over to the still-good node. Obviously, the old host couldn't write to logs, and the live-migration failed. Once failed, rgmanager also stopped working once the migration failed. In the end, I had to manually fence the node (corosync never failed, so it didn't get automatically fenced). This obviously caused the VMs running on the node to reboot, causing a ~40 second outage. It strikes me that the system *should* have been able to migrate, had it not tried to write to the logs. Is there a way, or can there be made a way, to migrate VMs off of a node whose underlying FS is read-only/corrupt/destroyed, so long as the programs in memory are still working? I am sure this is part a part rgmanager, part KVM/qemu question. Thanks for any feedback! -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From david.l.henley at hp.com Thu Apr 17 13:20:11 2014 From: david.l.henley at hp.com (Henley, David (Solutions Architect Chicago)) Date: Thu, 17 Apr 2014 13:20:11 +0000 Subject: [Linux-cluster] KVM availability groups Message-ID: <9195F18F518EC7428E0397625DF6E1AE1890FF01@G2W2431.americas.hpqcorp.net> I have 8 to 10 Rack mount Servers running Red Hat KVM. I need to create 2 availability zones and a backup zone. 1. What tools do you use to create these? Is it always scripted or is there an open source interface similar to say Vcenter. 2. Are there KVM tools that monitor the zones? Thanks Dave David Henley Solutions Architect Hewlett-Packard Company +1 815 341 2463 dhenley at hp.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From morpheus.ibis at gmail.com Thu Apr 17 19:16:22 2014 From: morpheus.ibis at gmail.com (Pavel Herrmann) Date: Thu, 17 Apr 2014 21:16:22 +0200 Subject: [Linux-cluster] KVM availability groups In-Reply-To: <9195F18F518EC7428E0397625DF6E1AE1890FF01@G2W2431.americas.hpqcorp.net> References: <9195F18F518EC7428E0397625DF6E1AE1890FF01@G2W2431.americas.hpqcorp.net> Message-ID: <2179081.39Bc0pasea@bloomfield> Hi, I am not an expert in this, but as far as i understand it works like this On Thursday 17 of April 2014 13:20:11 Henley, David wrote: > I have 8 to 10 Rack mount Servers running Red Hat KVM. > I need to create 2 availability zones and a backup zone. > > > 1. What tools do you use to create these? Is it always scripted or is > there an open source interface similar to say Vcenter. There are vcenter-like interfaces, but I'm not sure how they handle HA, have a look at ganeti and/or openstack this list is rather more concerned about the low level workings of clustered systems, with tools such as cman or pacemaker (depending on your OS version, I think all current RHEL versions use cman) to monitor and manage availability of your services (a VM is a service in this context), and corosync to keep your cluster in a consistent state. if you are looking for a vsphere replacement, you might have better luck with openstack than tinkering with linux clustering directly, in my opinion. > 2. Are there KVM tools that monitor the zones? You would probably use libvirt interface to manipulate with your KVM instances regards, Pavel Herrmann From mgalan at ujaen.es Fri Apr 18 17:44:57 2014 From: mgalan at ujaen.es (=?ISO-8859-1?Q?Manuel_Gal=E1n_=28UJA=29?=) Date: Fri, 18 Apr 2014 19:44:57 +0200 Subject: [Linux-cluster] Linux-cluster Digest, Vol 120, Issue 5 Message-ID: Hello all,? ? ? What about ovirt? visit ovirt.org Good weekend... Enviado de Samsung Mobile -------- Mensaje original -------- De: linux-cluster-request at redhat.com Fecha: 18/04/2014 18:00 (GMT+01:00) Para: linux-cluster at redhat.com Asunto: Linux-cluster Digest, Vol 120, Issue 5 Send Linux-cluster mailing list submissions to linux-cluster at redhat.com To subscribe or unsubscribe via the World Wide Web, visit https://www.redhat.com/mailman/listinfo/linux-cluster or, via email, send a message with subject or body 'help' to linux-cluster-request at redhat.com You can reach the person managing the list at linux-cluster-owner at redhat.com When replying, please edit your Subject line so it is more specific than "Re: Contents of Linux-cluster digest..." Today's Topics: ?? 1. Re: KVM availability groups (Pavel Herrmann) ---------------------------------------------------------------------- Message: 1 Date: Thu, 17 Apr 2014 21:16:22 +0200 From: Pavel Herrmann To: linux-cluster at redhat.com Cc: "Henley, David \(Solutions Architect Chicago\)" Subject: Re: [Linux-cluster] KVM availability groups Message-ID: <2179081.39Bc0pasea at bloomfield> Content-Type: text/plain; charset="us-ascii" Hi, I am not an expert in this, but as far as i understand it works like this On Thursday 17 of April 2014 13:20:11 Henley, David wrote: > I have 8 to 10 Rack mount Servers running Red Hat KVM. > I need to create 2 availability zones and a backup zone. > > > 1.?????? What tools do you use to create these? Is it always scripted or is > there an open source interface similar to say Vcenter. 
There are vcenter-like interfaces, but I'm not sure how they handle HA, have a look at ganeti and/or openstack this list is rather more concerned about the low level workings of clustered systems, with tools such as cman or pacemaker (depending on your OS version, I think all current RHEL versions use cman) to monitor and manage availability of your services (a VM is a service in this context), and corosync to keep your cluster in a consistent state. if you are looking for a vsphere replacement, you might have better luck with openstack than tinkering with linux clustering directly, in my opinion. > 2.?????? Are there KVM tools that monitor the zones? You would probably use libvirt interface to manipulate with your KVM instances regards, Pavel Herrmann ------------------------------ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster End of Linux-cluster Digest, Vol 120, Issue 5 ********************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Fri Apr 25 11:42:59 2014 From: swhiteho at redhat.com (Steven Whitehouse) Date: Fri, 25 Apr 2014 12:42:59 +0100 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: <53593BD9.4050602@ucl.ac.uk> References: <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> <5335F2B4.6080605@mssl.ucl.ac.uk> <1396179266.2659.30.camel@menhir> <53593BD9.4050602@ucl.ac.uk> Message-ID: <535A4A43.8000005@redhat.com> Hi, On 24/04/14 17:29, Alan Brown wrote: > On 30/03/14 12:34, Steven Whitehouse wrote: > >> Well that is not entirely true. We have done a great deal of >> investigation into this issue. We do test quotas (among many other >> things) on each release to ensure that they are working. Our tests have >> all passed correctly, and to date you have provided the only report of >> this particular issue via our support team. So it is certainly not >> something that lots of people are hitting. > > Someone else reported it on this list (on centos), so we're not an > isolated case. > >> We do now have a good idea of where the issue is. However it is clear >> that simply exceeding quotas is not enough to trigger it. Instead quotas >> need to be exceeded in a particular way. > > My suspicion is that it's some kind of interaction between quotas and > NFS, but it'd be good if you could provide a fuller explanation. > Yes, thats what we thought to start with... however that turned out to be a bit of a red herring. Or at least the issue has nothing specifically to do with NFS. The problem was related to when quota was exceeded, and specifically what operation was in progress. You could write to files as often as you wanted to, and exceeding quota would be handled correctly. The problem was a specific code path within the inode creation code, if it didn't result in quota being exceeded on that one specific code path, then everything would work as expected. Also, quite often when the problem did appear, it did not actually trigger a problem until later, making it difficult to track down. You are correct that someone else reported the issue on the list, however I'm not aware of any other reports beyond yours and theirs. Also, this was specific to certain versions of GFS2, and not something that relates to all versions. 
The upstream patch is here: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/gfs2?id=059788039f1e6343f34f46d202f8d9f2158c2783 It should be available in RHEL shortly - please ping support via the ticket for updates, Steve. >> Returning to the original point however, it is certainly not recommended >> to have mixed RHEL or CentOS versions running in the same cluster. It is >> much better to keep everything the same, even though the GFS2 on-disk >> format has not changed between the versions. > > More specfically (for those who are curious): Whilst the on-disk > format has not changed between EL5 and EL6, the way that RH cluster > members communicate with each other has. > > I ran a quick test some time back and the 2 different OS cluster > versions didn't see each other for LAN heartbeating. > > > From Micah.Schaefer at jhuapl.edu Fri Apr 25 14:05:37 2014 From: Micah.Schaefer at jhuapl.edu (Schaefer, Micah) Date: Fri, 25 Apr 2014 10:05:37 -0400 Subject: [Linux-cluster] iSCSI GFS2 CMIRRORD Message-ID: Hello All, I have been successfully running a cluster for about a year. I have a question about best practice for my storage setup. Currently, I have 2 front end nodes and two back end nodes. The front end nodes are part of the cluster, run all the services, etc. The back end nodes are only exporting raw block devices via iSCSI and are not cluster aware. The front end import the raw block and use GFS2 with LVM for storage. At this time, I am only using the block devices from one of the back end nodes. I would like the LVMs to be mirrored across the two iSCSI devices, creating redundancy at the block level. The last time I tried this, when creating the LVM, it basically sat for 2 days making no progress. I now have 10GB network connections at my front end and back end nodes (was 1GB only before). Also, on topology, these 4 nodes are across 2 buildings, 1 front end and 1 back end in each building. There are switches in each building that have layer 2 connectivity (10GB) to each other. I also have 2 each 10GB connections per node, and multiple 1GB connections per node. I have come up with the following scenarios, and am looking for advise on which of these methods to use (or none). 1: * Connect all nodes to the 10GB switches. * Use 1 10GB for iSCSI only and 1 for other ip traffic 2: * Connect each back end node to each from end node via 10GB * Use 1GB for other ip traffic 3: * Connect the front end nodes to each other via 10GB * Connect front end and back end nodes to 10GB switch for Ip traffic I am also willing to use device mapper multi path if needed. Thanks in advance for any assistance. Regards, ------- Micah Schaefer JHU/ APL -------------- next part -------------- An HTML attachment was scrubbed... URL: From emi2fast at gmail.com Fri Apr 25 16:12:36 2014 From: emi2fast at gmail.com (emmanuel segura) Date: Fri, 25 Apr 2014 18:12:36 +0200 Subject: [Linux-cluster] iSCSI GFS2 CMIRRORD In-Reply-To: References: Message-ID: you can use multipath when the system see a lun from more than one path, but you your case are importing two differents devices from your backend servers in your frontend server, sou you can use lvm mirror with cmirror in your fronted cluster 2014-04-25 16:05 GMT+02:00 Schaefer, Micah : > Hello All, > I have been successfully running a cluster for about a year. I have a > question about best practice for my storage setup. > > Currently, I have 2 front end nodes and two back end nodes. 
The front end > nodes are part of the cluster, run all the services, etc. The back end > nodes are only exporting raw block devices via iSCSI and are not cluster > aware. The front end import the raw block and use GFS2 with LVM for > storage. At this time, I am only using the block devices from one of the > back end nodes. > > I would like the LVMs to be mirrored across the two iSCSI devices, > creating redundancy at the block level. The last time I tried this, when > creating the LVM, it basically sat for 2 days making no progress. I now > have 10GB network connections at my front end and back end nodes (was 1GB > only before). > > Also, on topology, these 4 nodes are across 2 buildings, 1 front end and 1 > back end in each building. There are switches in each building that have > layer 2 connectivity (10GB) to each other. I also have 2 each 10GB > connections per node, and multiple 1GB connections per node. > > I have come up with the following scenarios, and am looking for advise on > which of these methods to use (or none). > > 1: > > - Connect all nodes to the 10GB switches. > - Use 1 10GB for iSCSI only and 1 for other ip traffic > > 2: > > - Connect each back end node to each from end node via 10GB > - Use 1GB for other ip traffic > > 3: > > - Connect the front end nodes to each other via 10GB > - Connect front end and back end nodes to 10GB switch for Ip traffic > > I am also willing to use device mapper multi path if needed. > > Thanks in advance for any assistance. > > Regards, > ------- > Micah Schaefer > JHU/ APL > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From neale at sinenomine.net Fri Apr 25 19:13:33 2014 From: neale at sinenomine.net (Neale Ferguson) Date: Fri, 25 Apr 2014 19:13:33 +0000 Subject: [Linux-cluster] luci question Message-ID: <8C5B1F22-0A70-4705-9DC2-749230BEF1D0@sinenomine.net> Hi, One of the guys created a simple configuration and was attempting to use luci to administer the cluster. It comes up fine but the links "Admin ... Logout" at the top left of the window that usually appears is not appearing. Looking at the code in the header html I see the following:
[luci header template markup scrubbed by the list archive; the menu items shown were Admin, Preferences, Logout and Login]
  • What affects (or effects) the tg.auth_stack_enabled value? I assume its some browser setting but really have no clue. Neale From vinh.cao at hp.com Fri Apr 25 21:37:30 2014 From: vinh.cao at hp.com (Cao, Vinh) Date: Fri, 25 Apr 2014 21:37:30 +0000 Subject: [Linux-cluster] luci question In-Reply-To: <8C5B1F22-0A70-4705-9DC2-749230BEF1D0@sinenomine.net> References: <8C5B1F22-0A70-4705-9DC2-749230BEF1D0@sinenomine.net> Message-ID: What type of the browser are you using? I have the same issue with IE. But if I use Firefox. It's there for me. I'm hoping that is it what you are looking for. Vinh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Neale Ferguson Sent: Friday, April 25, 2014 3:14 PM To: linux clustering Subject: [Linux-cluster] luci question Hi, One of the guys created a simple configuration and was attempting to use luci to administer the cluster. It comes up fine but the links "Admin ... Logout" at the top left of the window that usually appears is not appearing. Looking at the code in the header html I see the following:
[quoted luci header template markup scrubbed by the list archive; the menu items were Admin, Preferences, Logout and Login]
  • What affects (or effects) the tg.auth_stack_enabled value? I assume its some browser setting but really have no clue. Neale -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 6168 bytes Desc: not available URL: From morpheus.ibis at gmail.com Sat Apr 26 00:06:51 2014 From: morpheus.ibis at gmail.com (Pavel Herrmann) Date: Sat, 26 Apr 2014 02:06:51 +0200 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: <535A4A43.8000005@redhat.com> References: <12440.1396024637@localhost> <53593BD9.4050602@ucl.ac.uk> <535A4A43.8000005@redhat.com> Message-ID: <1787478.6mil9TdHqg@bloomfield> Hi, On Friday 25 of April 2014 12:42:59 Steven Whitehouse wrote: > Hi, > > On 24/04/14 17:29, Alan Brown wrote: > > On 30/03/14 12:34, Steven Whitehouse wrote: > >> Well that is not entirely true. We have done a great deal of > >> investigation into this issue. We do test quotas (among many other > >> things) on each release to ensure that they are working. Our tests have > >> all passed correctly, and to date you have provided the only report of > >> this particular issue via our support team. So it is certainly not > >> something that lots of people are hitting. > > > > Someone else reported it on this list (on centos), so we're not an > > isolated case. > > > >> We do now have a good idea of where the issue is. However it is clear > >> that simply exceeding quotas is not enough to trigger it. Instead quotas > >> need to be exceeded in a particular way. > > > > My suspicion is that it's some kind of interaction between quotas and > > NFS, but it'd be good if you could provide a fuller explanation. > > Yes, thats what we thought to start with... however that turned out to > be a bit of a red herring. Or at least the issue has nothing > specifically to do with NFS. The problem was related to when quota was > exceeded, and specifically what operation was in progress. You could > write to files as often as you wanted to, and exceeding quota would be > handled correctly. The problem was a specific code path within the inode > creation code, if it didn't result in quota being exceeded on that one > specific code path, then everything would work as expected. could you please provide a (somewhat reliable) test case to reproduce this bug? I have looked at the patch, and found nothing obviously related to quotas (it seems the patch only changes the fail-path of posix_acl_create() call, which doesn't appear to have nothing to do with quotas) I have been facing a possibly quota-related oops in GFS2 for some time, which I am unable to reproduce without switching my cluster to production use (which means potentialy facing the anger of my users, which I'd rather not do without at least a chance of the issue being fixed). sadly, I don't have RedHat support subscription (nor do I use RHEL or derivates), my kernel is mostly upstream. thanks Pavel Herrmann > > Also, quite often when the problem did appear, it did not actually > trigger a problem until later, making it difficult to track down. > > You are correct that someone else reported the issue on the list, > however I'm not aware of any other reports beyond yours and theirs. > Also, this was specific to certain versions of GFS2, and not something > that relates to all versions. 
> > The upstream patch is here: > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/gfs > 2?id=059788039f1e6343f34f46d202f8d9f2158c2783 > > It should be available in RHEL shortly - please ping support via the > ticket for updates, > > Steve. > > >> Returning to the original point however, it is certainly not recommended > >> to have mixed RHEL or CentOS versions running in the same cluster. It is > >> much better to keep everything the same, even though the GFS2 on-disk > >> format has not changed between the versions. > > > > More specfically (for those who are curious): Whilst the on-disk > > format has not changed between EL5 and EL6, the way that RH cluster > > members communicate with each other has. > > > > I ran a quick test some time back and the 2 different OS cluster > > versions didn't see each other for LAN heartbeating. From neale at sinenomine.net Tue Apr 29 14:01:18 2014 From: neale at sinenomine.net (Neale Ferguson) Date: Tue, 29 Apr 2014 14:01:18 +0000 Subject: [Linux-cluster] luci question In-Reply-To: References: <8C5B1F22-0A70-4705-9DC2-749230BEF1D0@sinenomine.net> Message-ID: <62696CB1-2791-4F09-95D5-03AF87F096AD@sinenomine.net> Thanks Vinh. He is using IE8 (company policy!!). I've tried it with IE8, IE10, Chrome, and Safari and all worked fine. He has cookies enabled so I'm at a loss as to how that auth_stack_enabled setting is set/updated/cleared. Neale On Apr 25, 2014, at 5:37 PM, Cao, Vinh wrote: > What type of the browser are you using? > I have the same issue with IE. But if I use Firefox. It's there for me. > I'm hoping that is it what you are looking for. > > Vinh > -----Original Message----- > Hi, > One of the guys created a simple configuration and was attempting to use > luci to administer the cluster. It comes up fine but the links "Admin ... > Logout" at the top left of the window that usually appears is not appearing. > Looking at the code in the header html I see the following: > > > >
  • class="${('', 'active')[defined('page') and > page==page=='admin']}">Admin
  • >
  • class="${('', 'active')[defined('page') and > page==page=='prefs']}">Preferences
  • >
  • href="${tg.url('/logout_handler')}">Logout
  • >
    >
  • href="${tg.url('/login')}">Login
  • >
    > > What affects (or effects) the tg.auth_stack_enabled value? I assume its some > browser setting but really have no clue. > > Neale From jpokorny at redhat.com Tue Apr 29 15:18:21 2014 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Tue, 29 Apr 2014 17:18:21 +0200 Subject: [Linux-cluster] luci question In-Reply-To: <8C5B1F22-0A70-4705-9DC2-749230BEF1D0@sinenomine.net> References: <8C5B1F22-0A70-4705-9DC2-749230BEF1D0@sinenomine.net> Message-ID: <20140429151821.GA4367@redhat.com> Hello Neal, On 25/04/14 19:13 +0000, Neale Ferguson wrote: > One of the guys created a simple configuration and was attempting > to use luci to administer the cluster. It comes up fine but the > links "Admin ... Logout" at the top left of the window that usually > appears is not appearing. Looking at the code in the header html I > see the following: > > > >
> [quoted luci header template markup scrubbed by the list archive; the menu items were Admin, Preferences, Logout and Login]
    > > What affects (or effects) the tg.auth_stack_enabled value? I assume > its some browser setting but really have no clue. could you be more specific as to which versions of luci, TurboGears and repoze.who? In RHEL-like distros, the latter map to TurboGears2 and python-repoze-who packages. I don't recall any issue like what you described. -- Jan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From neale at sinenomine.net Tue Apr 29 16:02:17 2014 From: neale at sinenomine.net (Neale Ferguson) Date: Tue, 29 Apr 2014 16:02:17 +0000 Subject: [Linux-cluster] luci question In-Reply-To: <20140429151821.GA4367@redhat.com> References: <8C5B1F22-0A70-4705-9DC2-749230BEF1D0@sinenomine.net> <20140429151821.GA4367@redhat.com> Message-ID: luci-0.26.0-48 (tried -13 as well) TurboGears2-2.0.3-4. kernel-2.6.32-358.2.1 python-repoze-who-1.0.18-1 (I believe - am verifying) On Apr 29, 2014, at 11:18 AM, Jan Pokorn? wrote: > could you be more specific as to which versions of luci, TurboGears > and repoze.who? In RHEL-like distros, the latter map to TurboGears2 > and python-repoze-who packages. > > I don't recall any issue like what you described. > > -- > Jan > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: Message signed with OpenPGP using GPGMail URL: From jpokorny at redhat.com Tue Apr 29 17:17:40 2014 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Tue, 29 Apr 2014 19:17:40 +0200 Subject: [Linux-cluster] luci question In-Reply-To: References: <8C5B1F22-0A70-4705-9DC2-749230BEF1D0@sinenomine.net> <20140429151821.GA4367@redhat.com> Message-ID: <20140429171740.GB4367@redhat.com> On 29/04/14 16:02 +0000, Neale Ferguson wrote: > luci-0.26.0-48 (tried -13 as well) > TurboGears2-2.0.3-4. > kernel-2.6.32-358.2.1 > python-repoze-who-1.0.18-1 (I believe - am verifying) Thanks, this looks sane. Actually there used to be an issue with Genshi generating strict XML by default, notably baffling IE, but it should be sufficiently solved for a long time (https://bugzilla.redhat.com/663103), definitely in the version you used. Just to be sure could you provide also your python-genshi version? Now I am thinking about another thing: > One of the guys created a simple configuration seems to be pretty generic expression, and makes me confused. Does it mean mere luci installation/deployment (along the lines of what specfile does, or better yet, directly from package), or (also) some configuration files tweaking (having especially /var/lib/luci/etc/luci.ini in mind, modifying file analogous to /etc/sysconfig/luci should be fine)? Because if the latter, chances are that admittedly a bit fragile start up process involving hierarchical configuration (via two stated files) and intentional run-time substitutions of middleware initialization routines (cf. luci.initwrappers) could suffer from that. On the other hand, if logging in works as expected, even when reproduced with cookies previously cleared, the issue remains a mystery. [If it meant a _cluster_ configuration in luci, I don't think this has any relevance to the issue.] Also to this point: > the links "Admin ... Logout" at the top left of the window that > usually appears is not appearing. 
not even "Login" is shown at that very position, right? Further debugging pointers: - inspect source code of the generated page, best when static original is preserved (some code inspectors tend to rather work with live, dynamically modified DOM serialization), wget/curl output when pretending being logged in via cookies might also help - last and promise-less attempt: try enabling verbose logging in /var/lib/luci/etc/luci.ini or equivalent (substitute to fit): # sed -i.old ':0;/\[logger_root\]/b1;p;d;:1;n;s|\(level[ \t]*=[ \t]*\).*|\1DEBUG|;t0;b1' \ /var/lib/luci/etc/luci.ini followed by luci restart and accessing the page in question; there may be something suspicious in the log (usually /var/log/luci/luci.log) but expectedly amongst plenty of other/worthless messages :-/ > On Apr 29, 2014, at 11:18 AM, Jan Pokorn? wrote: > >> could you be more specific as to which versions of luci, TurboGears >> and repoze.who? In RHEL-like distros, the latter map to TurboGears2 >> and python-repoze-who packages. >> >> I don't recall any issue like what you described. -- Jan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From neale at sinenomine.net Tue Apr 29 17:32:58 2014 From: neale at sinenomine.net (Neale Ferguson) Date: Tue, 29 Apr 2014 17:32:58 +0000 Subject: [Linux-cluster] luci question In-Reply-To: <20140429171740.GB4367@redhat.com> References: <8C5B1F22-0A70-4705-9DC2-749230BEF1D0@sinenomine.net> <20140429151821.GA4367@redhat.com> <20140429171740.GB4367@redhat.com> Message-ID: <7A8D29DD-99C9-4A56-9193-8CE045EF5289@sinenomine.net> He installed luci and then pointed his browser at the host:8084. He gets the login panel, logs in as root, gets the homebase screen but it doesn't have those links at the top right. No changes to any of the files that luci installs. It was a clean RHEL install so I'm guessing that genshi is up to date but I'll ask. Neale On Apr 29, 2014, at 1:17 PM, Jan Pokorn? wrote: > On 29/04/14 16:02 +0000, Neale Ferguson wrote: >> luci-0.26.0-48 (tried -13 as well) >> TurboGears2-2.0.3-4. >> kernel-2.6.32-358.2.1 >> python-repoze-who-1.0.18-1 (I believe - am verifying) > > Thanks, this looks sane. > > Actually there used to be an issue with Genshi generating strict > XML by default, notably baffling IE, but it should be sufficiently > solved for a long time (https://bugzilla.redhat.com/663103), > definitely in the version you used. > > Just to be sure could you provide also your python-genshi version? > > > Now I am thinking about another thing: > >> One of the guys created a simple configuration > > seems to be pretty generic expression, and makes me confused. Does it > mean mere luci installation/deployment (along the lines of what > specfile does, or better yet, directly from package), or (also) some > configuration files tweaking (having especially > /var/lib/luci/etc/luci.ini in mind, modifying file analogous to > /etc/sysconfig/luci should be fine)? > > Because if the latter, chances are that admittedly a bit fragile > start up process involving hierarchical configuration (via two stated > files) and intentional run-time substitutions of middleware > initialization routines (cf. luci.initwrappers) could suffer from > that. On the other hand, if logging in works as expected, even when > reproduced with cookies previously cleared, the issue remains > a mystery. 
[If it meant a _cluster_ configuration in luci, I don't > think this has any relevance to the issue.] -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: Message signed with OpenPGP using GPGMail URL: From neale at sinenomine.net Tue Apr 29 17:56:49 2014 From: neale at sinenomine.net (Neale Ferguson) Date: Tue, 29 Apr 2014 17:56:49 +0000 Subject: [Linux-cluster] luci question In-Reply-To: <20140429171740.GB4367@redhat.com> References: <8C5B1F22-0A70-4705-9DC2-749230BEF1D0@sinenomine.net> <20140429151821.GA4367@redhat.com> <20140429171740.GB4367@redhat.com> Message-ID: Name : python-genshi Arch : s390x Version : 0.5.1 Release : 7.1.el6 On Apr 29, 2014, at 1:17 PM, Jan Pokorn? wrote: > > Just to be sure could you provide also your python-genshi version? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: Message signed with OpenPGP using GPGMail URL: From jpokorny at redhat.com Tue Apr 29 18:27:48 2014 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Tue, 29 Apr 2014 20:27:48 +0200 Subject: [Linux-cluster] luci question In-Reply-To: References: <8C5B1F22-0A70-4705-9DC2-749230BEF1D0@sinenomine.net> <20140429151821.GA4367@redhat.com> <20140429171740.GB4367@redhat.com> Message-ID: <20140429182748.GA9564@redhat.com> On 29/04/14 17:56 +0000, Neale Ferguson wrote: > Name : python-genshi > Arch : s390x > Version : 0.5.1 > Release : 7.1.el6 Thanks again, but I have to admit I am short of ideas. Please see my other post wrt. next possible pointers, notably inspecting a page dump (e.g., via save page as) because there could be also some weird styling issue despite the demanded content reached the browser. Probably last item to check that I recalled is checking that no interfering EPEL or other non-RH package is installed, perhaps by running: rpm -q --qf "%{NEVRA}; %{VENDOR}\n" -- luci TurboGears2 pyOpenSSL \ python python-babel python-beaker python-cheetah python-decorator \ python-decoratortools python-formencode python-genshi python-mako \ python-markdown python-markupsafe python-myghty python-nose \ python-paste python-paste-deploy python-paste-script \ python-peak-rules python-peak-util-addons python-peak-util-assembler\ python-peak-util-extremes python-peak-util-symbols \ python-prioritized-methods python-pygments python-pylons \ python-repoze-tm2 python-repoze-what python-repoze-what-pylons \ python-repoze-who python-repoze-who-friendlyform \ python-repoze-who-testutil python-routes python-setuptools \ python-simplejson python-sqlalchemy python-tempita \ python-toscawidgets python-transaction python-turbojson \ python-tw-forms python-weberror python-webflash python-webhelpers \ python-webob python-webtest python-zope-filesystem \ python-zope-interface python-zope-sqlalchemy | grep -v 'Red Hat' ("package python-tw-forms is not installed" in the output is OK, it's just a legacy thing) Sadly, having no direct access to IE8, cannot track this further on my own. -- Jan -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From neale at sinenomine.net Tue Apr 29 18:38:38 2014 From: neale at sinenomine.net (Neale Ferguson) Date: Tue, 29 Apr 2014 18:38:38 +0000 Subject: [Linux-cluster] luci question In-Reply-To: <20140429182748.GA9564@redhat.com> References: <8C5B1F22-0A70-4705-9DC2-749230BEF1D0@sinenomine.net> <20140429151821.GA4367@redhat.com> <20140429171740.GB4367@redhat.com> <20140429182748.GA9564@redhat.com> Message-ID: <832CABBC-F493-4032-AA3B-50D9A3B80BFB@sinenomine.net> Thanks for the suggestions Jan. Your help is appreciated. Neale On Apr 29, 2014, at 2:27 PM, Jan Pokorn? wrote: > Sadly, having no direct access to IE8, cannot track this further on my > own. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: Message signed with OpenPGP using GPGMail URL: