From laszlo.budai at gmail.com  Mon Aug  1 10:20:12 2011
From: laszlo.budai at gmail.com (Budai Laszlo)
Date: Mon, 01 Aug 2011 13:20:12 +0300
Subject: [Linux-cluster] service startup order
In-Reply-To: <A789DDB53ED7E94396E842EE2AC9B5FF014329C3@itdzex101.ITDZ.verwalt-berlin.de>
References: <4E30AB1A.1080102@gmail.com> <4E30B25E.5010302@alteeve.com>
	<A789DDB53ED7E94396E842EE2AC9B5FF014329C3@itdzex101.ITDZ.verwalt-berlin.de>
Message-ID: <4E367DDC.9050608@gmail.com>

Hi,

I got this answer today from Fabio M. Di Nitto:

On 7/29/2011 1:25 AM, Budai Laszlo wrote:

> > Hello,
> > 
> > Please excuse my direct mail, but I've received no conclusive answer on
> > the linux-cluster mailing list, and I've seen your email on the Cluster
> > Wiki home page.
> > Please tell me what the startup order is for services in Red Hat Cluster
> > 4.5 (rgmanager-1.9.68-1).
> > Are the services started in parallel, or are they started one by one in
> > the order in which they appear in cluster.conf?
> > I'm not talking about resources, or resource trees.

They are started in sequence as they appear in cluster.conf. I am not
extremely familiar with EL4 but the behavior should be pretty much the
same as other rgmanager releases.

Fabio
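
To see the order that answer implies for a given configuration, a minimal
sketch along these lines may help. It only lists the <service> elements in
document order; it does not query rgmanager. The function name and the
sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the service names are not from any real cluster.
SAMPLE = """
<cluster config_version="1" name="example">
  <rm>
    <service name="db"/>
    <service name="web"/>
    <service name="mail"/>
  </rm>
</cluster>
"""

def service_start_order(cluster_conf_xml):
    """Return service names in the order they appear under <rm>."""
    root = ET.fromstring(cluster_conf_xml)
    # findall() yields matches in document order, which per the reply
    # above is the order in which the services are started.
    return [svc.get("name") for svc in root.findall("rm/service")]

if __name__ == "__main__":
    for position, name in enumerate(service_start_order(SAMPLE), start=1):
        print(position, name)
```

For a real cluster, the contents of /etc/cluster/cluster.conf could be read
and passed in instead of the sample string.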



From david at adurotec.com  Tue Aug  2 01:50:15 2011
From: david at adurotec.com (David)
Date: Mon, 01 Aug 2011 20:50:15 -0500
Subject: [Linux-cluster] Corosync fails to start using cman
Message-ID: <4E3757D7.2080607@adurotec.com>

I have RHCS installed on CentOS 6 x86_64.

One of the nodes in a 3-node cluster won't start after I moved the nodes
to a new VLAN.

When I start cman this is what I get:

Starting cluster:
    Checking Network Manager...                             [  OK  ]
    Global setup...                                         [  OK  ]
    Loading kernel modules...                               [  OK  ]
    Mounting configfs...                                    [  OK  ]
    Starting cman... Aug 02 01:45:17 corosync [MAIN  ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
Aug 02 01:45:17 corosync [MAIN  ] Corosync built-in features: nss rdma
Aug 02 01:45:17 corosync [MAIN  ] Successfully read config from /etc/cluster/cluster.conf
Aug 02 01:45:17 corosync [MAIN  ] Successfully parsed cman config
Aug 02 01:45:17 corosync [TOTEM ] Token Timeout (10000 ms) retransmit timeout (2380 ms)
Aug 02 01:45:17 corosync [TOTEM ] token hold (1894 ms) retransmits before loss (4 retrans)
Aug 02 01:45:17 corosync [TOTEM ] join (60 ms) send_join (0 ms) consensus (12000 ms) merge (200 ms)
Aug 02 01:45:17 corosync [TOTEM ] downcheck (1000 ms) fail to recv const (2500 msgs)
Aug 02 01:45:17 corosync [TOTEM ] seqno unchanged const (30 rotations) Maximum network MTU 1402
Aug 02 01:45:17 corosync [TOTEM ] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Aug 02 01:45:17 corosync [TOTEM ] missed count const (5 messages)
Aug 02 01:45:17 corosync [TOTEM ] send threads (0 threads)
Aug 02 01:45:17 corosync [TOTEM ] RRP token expired timeout (2380 ms)
Aug 02 01:45:17 corosync [TOTEM ] RRP token problem counter (2000 ms)
Aug 02 01:45:17 corosync [TOTEM ] RRP threshold (10 problem count)
Aug 02 01:45:17 corosync [TOTEM ] RRP mode set to none.
Aug 02 01:45:17 corosync [TOTEM ] heartbeat_failures_allowed (0)
Aug 02 01:45:17 corosync [TOTEM ] max_network_delay (50 ms)
Aug 02 01:45:17 corosync [TOTEM ] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Aug 02 01:45:17 corosync [TOTEM ] Initializing transport (UDP/IP).
Aug 02 01:45:17 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Aug 02 01:45:17 corosync [IPC   ] you are using ipc api v2
Aug 02 01:45:18 corosync [TOTEM ] Receive multicast socket recv buffer size (262142 bytes).
Aug 02 01:45:18 corosync [TOTEM ] Transmit multicast socket send buffer size (262142 bytes).
corosync: totemsrp.c:3091: memb_ring_id_create_or_load: Assertion `res == sizeof (unsigned long long)' failed.
Aug 02 01:45:18 corosync [TOTEM ] The network interface [10.50.3.70] is now up.
corosync died with signal: 6 Check cluster logs for details


Any idea what the issue could be?

Thanks
David


From linux at alteeve.com  Tue Aug  2 01:56:47 2011
From: linux at alteeve.com (Digimer)
Date: Mon, 01 Aug 2011 21:56:47 -0400
Subject: [Linux-cluster] Corosync fails to start using cman
In-Reply-To: <4E3757D7.2080607@adurotec.com>
References: <4E3757D7.2080607@adurotec.com>
Message-ID: <4E37595F.5010003@alteeve.com>

Can you post your cluster.conf file (please obscure passwords only)? Also,
what does `uname -n` return, and what is your network configuration
(interface names and IPs)?

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"


From linux at alteeve.com  Tue Aug  2 05:34:20 2011
From: linux at alteeve.com (Digimer)
Date: Tue, 02 Aug 2011 01:34:20 -0400
Subject: [Linux-cluster] Problem with rhel6 + rgmanager + ip service
Message-ID: <4E378C5C.8010505@alteeve.com>

I'm trying to setup a trivially simple cluster using RHEL 6.1
(cman+rgmanager). I've got three interfaces, and I want a managed IP on
192.168.2.100/24 (which should match eth1). At this point though, I'd be
happy to get any IP working on any interface.

Here's my config:

<?xml version="1.0"?>
<cluster config_version="15" name="an-clusterA">
	<cman expected_votes="1" two_node="1"/>
	<totem rrp_mode="none" secauth="off"/>
	<clusternodes>
		<clusternode name="an-node01.alteeve.com" nodeid="1">
			<fence>
				<method name="apc_pdu">
					<device action="reboot" name="pdu2" port="1"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="an-node02.alteeve.com" nodeid="2">
			<fence>
				<method name="apc_pdu">
					<device action="reboot" name="pdu2" port="2"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<fencedevices>
		<fencedevice agent="fence_apc" ipaddr="192.168.1.6" login="apc"
name="pdu2" passwd="secret"/>
	</fencedevices>
	<fence_daemon post_join_delay="30"/>
	<rm>
		<resources>
			<ip address="192.168.4.100" monitor_link="on"/>
		</resources>
		<failoverdomains>
			<failoverdomain name="an1_primary" nofailback="1" ordered="1"
restricted="0">
				<failoverdomainnode name="an-node01.alteeve.com" priority="1"/>
				<failoverdomainnode name="an-node02.alteeve.com" priority="2"/>
			</failoverdomain>
		</failoverdomains>
		<service autostart="0" domain="an1_primary" name="san_ip"
recovery="relocate">
			<script ref="192.168.2.100"/>
		</service>
	</rm>
</cluster>

Running clustat says the IP should be up:

Cluster Status for an-clusterA @ Tue Aug  2 01:32:41 2011
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 an-node01.alteeve.com                       1 Online, Local, rgmanager
 an-node02.alteeve.com                       2 Online, rgmanager

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 service:san_ip                 an-node01.alteeve.com          started


I can't ping that IP from either node, nor does the IP appear in either
node's 'ifconfig -a' output.

Any idea what I might be doing wrong?

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"


From Ralph.Grothe at itdz-berlin.de  Tue Aug  2 06:53:29 2011
From: Ralph.Grothe at itdz-berlin.de (Ralph.Grothe at itdz-berlin.de)
Date: Tue, 2 Aug 2011 08:53:29 +0200
Subject: [Linux-cluster] Problem with rhel6 + rgmanager + ip service
In-Reply-To: <4E378C5C.8010505@alteeve.com>
References: <4E378C5C.8010505@alteeve.com>
Message-ID: <A789DDB53ED7E94396E842EE2AC9B5FF014329DB@itdzex101.ITDZ.verwalt-berlin.de>

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer
> Sent: Tuesday, August 02, 2011 7:34 AM
> To: linux clustering
> Subject: [Linux-cluster] Problem with rhel6 + rgmanager + ip service
> 
> I'm trying to setup a trivially simple cluster using RHEL 6.1
> (cman+rgmanager). I've got three interfaces, and I want a managed IP on
> 192.168.2.100/24 (which should match eth1). At this point though, I'd be
> happy to get any IP working on any interface.


Hi Kelly,

since I haven't even run a RHEL 6.1 host yet, let alone RHCS
under it, I should probably keep my mouth shut (or my
fingers at rest, more appropriately).

But anyway, have you checked whether the RA binds the IP correctly by
simply executing the script manually, like this? (I know rg_test
exists, but this way circumvents any clurgmgrd interference.)


# OCF_RESKEY_address="192.168.2.100" /usr/share/cluster/ip.sh start

and then

# OCF_RESKEY_address="192.168.2.100" /usr/share/cluster/ip.sh status


Rgds
Ralph


From david at adurotec.com  Tue Aug  2 15:35:29 2011
From: david at adurotec.com (David)
Date: Tue, 02 Aug 2011 10:35:29 -0500
Subject: [Linux-cluster] Corosync fails to start using cman
In-Reply-To: <4E37595F.5010003@alteeve.com>
References: <4E3757D7.2080607@adurotec.com> <4E37595F.5010003@alteeve.com>
Message-ID: <4E381941.9010007@adurotec.com>

Here is my cluster.conf:

<?xml version="1.0"?>
<cluster config_version="33" name="GFSpfsCluster">
<logging debug="on"/>
<clusternodes>
<clusternode name="pfs03.ns.gfs2.us" nodeid="1" votes="1">
<fence>
<method name="single">
<device name="pfs03.ns.us.ctidata.net_vmware"/>
</method>
</fence>
</clusternode>
<clusternode name="pfs04.ns.gfs2.us" nodeid="2" votes="1">
<fence>
<method name="single">
<device name="pfs04.ns.us.ctidata.net_vmware"/>
</method>
</fence>
</clusternode>
<clusternode name="pfs05.ns.gfs2.us" nodeid="3" votes="1">
<fence>
<method name="single">
<device name="pfs05.ns.us.ctidata.net_vmware"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice agent="fence_vmware" ipaddr="10.50.6.20" 
login="administrator" name="pfs03.ns.us.ctidata.net_vmware" 
passwd="secret" port="pfs03.ns.us.ctidata.net"/>
<fencedevice agent="fence_vmware" ipaddr="10.50.6.20" 
login="administrator" name="pfs04.ns.us.ctidata.net_vmware" 
passwd="secret" port="pfs04.ns.us.ctidata.net"/>
<fencedevice agent="fence_vmware" ipaddr="10.50.6.20" 
login="administrator" name="pfs05.ns.us.ctidata.net_vmware" 
passwd="secret" port="pfs05.ns.us.ctidata.net"/>
</fencedevices>
<rm>
<resources>
<script file="/etc/init.d/httpd" name="httpd"/>
</resources>
<failoverdomains>
<failoverdomain name="pfs03_only" nofailback="0" ordered="0" restricted="1">
<failoverdomainnode name="pfs03.ns.gfs2.us" priority="1"/>
</failoverdomain>
<failoverdomain name="pfs04_only" nofailback="0" ordered="0" restricted="1">
<failoverdomainnode name="pfs04.ns.gfs2.us" priority="1"/>
</failoverdomain>
<failoverdomain name="pfs05_only" nofailback="0" ordered="0" restricted="1">
<failoverdomainnode name="pfs05.ns.gfs2.us" priority="1"/>
</failoverdomain>
</failoverdomains>
<service autostart="1" domain="pfs03_only" exclusive="0" 
name="pfs03_apache" recovery="restart">
<script ref="httpd"/>
</service>
<service autostart="1" domain="pfs04_only" exclusive="0" 
name="pfs04_apache" recovery="restart">
<script ref="httpd"/>
</service>
<service autostart="1" domain="pfs05_only" exclusive="0" 
name="pfs05_apache" recovery="restart">
</service>
</rm>
<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
<cman/>
</cluster>

uname -n = pfs05.ns.us.ctidata.net

As I am sure you will notice, cluster.conf has the node set to
pfs05.ns.gfs2.us while the hostname is set to pfs05.ns.us.ctidata.net.
This was working previously, is working on the other 2 nodes, and is
configured this way so that the cluster uses a private VLAN set up
specifically for cluster communications.

The network is setup as follows:

eth0 = 10.50.10.32/24 this is the production traffic interface
eth1 = 10.50.20.32/24 this is the interface used for iSCSI connections to our SAN
eth2 = 10.50.6.32/24 this is the interface set up for FreeIPA authenticated ssh access in from our mgmt vlan
eth3 = 10.50.1.32/24 this is a legacy interface used during the transition from the old env to this new env
eth4 = 10.50.3.70/27 this is the interface pfs05.ns.gfs2.us resolves to, used for cluster communications

David



From linux at alteeve.com  Tue Aug  2 18:17:18 2011
From: linux at alteeve.com (Digimer)
Date: Tue, 02 Aug 2011 14:17:18 -0400
Subject: [Linux-cluster] Problem with rhel6 + rgmanager + ip service
In-Reply-To: <A789DDB53ED7E94396E842EE2AC9B5FF014329DB@itdzex101.ITDZ.verwalt-berlin.de>
References: <4E378C5C.8010505@alteeve.com>
	<A789DDB53ED7E94396E842EE2AC9B5FF014329DB@itdzex101.ITDZ.verwalt-berlin.de>
Message-ID: <4E383F2E.7060003@alteeve.com>

On 08/02/2011 02:53 AM, Ralph.Grothe at itdz-berlin.de wrote:
> Hi Kelly,
> 
> since I haven't even run a RHEL 6.1 host yet, let alone RHCS
> under it, I probably better should keep my mouth shut (or my
> fingers at rest, more appropriately).
> 
> But anyway, have you tried if the RA binds the IP correctly by
> simply executing the script manually like (I know there exists
> rg_test but this circumvents any clurgmgrd interference)?
> 
> 
> # OCF_RESKEY_address="192.168.2.100" /usr/share/cluster/ip.sh start
> 
> and then
> 
> # OCF_RESKEY_address="192.168.2.100" /usr/share/cluster/ip.sh status
> 
> 
> 
> Rgds
> Ralph
> 

Nah, I was being silly. I needed:

<ip ref="192.168.2.100"/>

but had:

<script ref="192.168.2.100"/>

Works fine now.
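
For anyone hitting the same thing, this kind of mismatch can be caught
before starting the service. The sketch below is only an illustration, not
part of rgmanager: it flags any ref="..." under a <service> that has no
definition of the same element type in <resources>. It assumes that "ip"
resources are keyed by their "address" attribute and other resource types
by "name". The function name and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: "ip" resources are identified by "address"; other resource
# types are identified by "name". Extend this map for other agents.
PRIMARY_ATTR = {"ip": "address"}

# Hypothetical fragment reproducing the mistake from this thread.
SAMPLE = """
<cluster config_version="1" name="example">
  <rm>
    <resources>
      <ip address="192.168.2.100" monitor_link="on"/>
    </resources>
    <service autostart="0" name="san_ip" recovery="relocate">
      <script ref="192.168.2.100"/>
    </service>
  </rm>
</cluster>
"""

def dangling_refs(cluster_conf_xml):
    """Return (tag, ref) pairs used in services but not defined in <resources>."""
    root = ET.fromstring(cluster_conf_xml)
    rm = root.find("rm")
    if rm is None:
        return []
    # Collect (element type, key) for every resource definition.
    defined = set()
    for res in rm.findall("resources/*"):
        key = res.get(PRIMARY_ATTR.get(res.tag, "name"))
        if key is not None:
            defined.add((res.tag, key))
    # Walk every element inside each service and check its ref, if any.
    missing = []
    for svc in rm.findall("service"):
        for node in svc.iter():
            ref = node.get("ref")
            if ref is not None and (node.tag, ref) not in defined:
                missing.append((node.tag, ref))
    return missing

if __name__ == "__main__":
    print(dangling_refs(SAMPLE))
```

With the sample above the <script ref="192.168.2.100"/> entry is reported,
because only an <ip> resource with that address is defined.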

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"


From jayesh.shinde at netcore.co.in  Wed Aug  3 11:28:43 2011
From: jayesh.shinde at netcore.co.in (jayesh.shinde)
Date: Wed, 03 Aug 2011 16:58:43 +0530
Subject: [Linux-cluster] clusvcadm Relocate <group> causing  reboot.
Message-ID: <4E3930EB.80209@netcore.co.in>

Hi All,

I have a query about "clusvcadm" and "post_join_delay".
I am running a high-traffic mail server on a 2-node active-active
DRBD + RHCS cluster (node1 and node2).

When I try to relocate a service group from node1 to the other server,
node1 gets fenced and rebooted, and all services shift to node2.
To relocate the service I use the following command:

clusvcadm -r imap2 -m node2

I am not able to figure out why node1 gets fenced and rebooted while
relocating. Sometimes the group relocation from one node to the other
happens smoothly. I have noticed that, outside the cluster, running
"/etc/init.d/cyrus-imapd stop" takes a long time (about 3 minutes),
because cyrus-imapd imports and exports its local db files.

In cluster.conf I have the line below:
<fence_daemon post_fail_delay="0" post_join_delay="3"/>

If I change it to post_join_delay="180", will that help solve my
problem?

Is there any way to avoid the reboot while relocating a group to another
node? Please suggest.

Regards
Jayesh Shinde


From kkovachev at varna.net  Wed Aug  3 11:47:37 2011
From: kkovachev at varna.net (Kaloyan Kovachev)
Date: Wed, 03 Aug 2011 14:47:37 +0300
Subject: [Linux-cluster] clusvcadm Relocate <group> causing  reboot.
In-Reply-To: <4E3930EB.80209@netcore.co.in>
References: <4E3930EB.80209@netcore.co.in>
Message-ID: <22b086b2ba88d93f23c47c3a5ddfe8d6@mx.varna.net>

Hi,
post_join_delay is the wrong parameter to change. You will need to change
the 'shutdown_wait' or 'stop' timeout for the resource.


From rhayden.public at gmail.com  Thu Aug  4 13:48:23 2011
From: rhayden.public at gmail.com (Robert Hayden)
Date: Thu, 4 Aug 2011 08:48:23 -0500
Subject: [Linux-cluster] CCS to add VM to RHCS?
Message-ID: <CANqTVAEe04dnGA3O-DY9=oGZcXiDQLKcNPZn_tpo-hj_FLfodA@mail.gmail.com>

I was attempting to add VM resources to a two node cluster with the
ccs tool (RHEL 6.1).  I believe that I am either not using the
proper ccs command or there is a bug in the ccs tool for VMs.  I wanted
to see if anyone has attempted this before I go to bugzilla.

Commands:
	ccs -f cluster.build --addresource vm name=vm_b migrate=live \
		domain=kvm_node2_fo_domain autostart=1 recovery=restart use_virsh=1
	ccs -f cluster.build --addresource vm name=vm_a migrate=live \
		domain=kvm_node1_fo_domain autostart=1 recovery=restart use_virsh=1

These modify the cluster.build file as follows.  Notice that the <vm>
stanzas are located within <resources>.  From what I have been able
to determine, the <vm> stanzas need to be directly under <rm>, not
inside <resources>.  Otherwise rgmanager does not pick the VMs up.

  <rm>
    <failoverdomains>
      <failoverdomain name="kvm_node1_fo_domain" nofailback="1"
ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
      </failoverdomain>
      <failoverdomain name="kvm_node2_fo_domain" nofailback="1"
ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="2"/>
        <failoverdomainnode name="node2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <vm autostart="1" domain="kvm_node1_fo_domain" migrate="live"
name="vm_a" recovery="restart" use_virsh="1"/>
      <vm autostart="1" domain="kvm_node2_fo_domain" migrate="live"
name="vm_b" recovery="restart" use_virsh="1"/>
    </resources>
  </rm>


This is the correct <rm> stanza as far as I know; at least it allows the
VMs to be seen with clustat and to be managed.

  <rm>
    <failoverdomains>
      <failoverdomain name="kvm_node1_fo_domain" nofailback="1"
ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
      </failoverdomain>
      <failoverdomain name="kvm_node2_fo_domain" nofailback="1"
ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="2"/>
        <failoverdomainnode name="node2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources/>
    <vm autostart="1" domain="kvm_node1_fo_domain" migrate="live"
name="vm_a" recovery="restart" use_virsh="1"/>
    <vm autostart="1" domain="kvm_node2_fo_domain" migrate="live"
name="vm_b" recovery="restart" use_virsh="1"/>
  </rm>


Thanks
Robert
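
As a stopgap, the manual edit described above could be scripted. The
following is a rough sketch only, not a ccs feature: it moves any <vm>
elements found inside <resources> so that they become direct children of
<rm>. The function name and the sample XML are hypothetical, and the
result should be checked by hand before it is used on a real cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the ccs output shown above.
SAMPLE = """
<cluster config_version="1" name="example">
  <rm>
    <failoverdomains/>
    <resources>
      <vm autostart="1" name="vm_a" migrate="live"/>
      <vm autostart="1" name="vm_b" migrate="live"/>
    </resources>
  </rm>
</cluster>
"""

def hoist_vms(cluster_conf_xml):
    """Move <vm> elements from <rm>/<resources> up to <rm> itself."""
    root = ET.fromstring(cluster_conf_xml)
    rm = root.find("rm")
    resources = rm.find("resources") if rm is not None else None
    if resources is None:
        return cluster_conf_xml  # nothing to move
    # Copy the list first; we modify the tree while looping over it.
    for vm in list(resources.findall("vm")):
        resources.remove(vm)
        rm.append(vm)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(hoist_vms(SAMPLE))
```

The config_version attribute is left untouched here; it would still need
to be incremented before the file is propagated.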


From fdinitto at redhat.com  Thu Aug  4 14:20:53 2011
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Thu, 04 Aug 2011 16:20:53 +0200
Subject: [Linux-cluster] CCS to add VM to RHCS?
In-Reply-To: <CANqTVAEe04dnGA3O-DY9=oGZcXiDQLKcNPZn_tpo-hj_FLfodA@mail.gmail.com>
References: <CANqTVAEe04dnGA3O-DY9=oGZcXiDQLKcNPZn_tpo-hj_FLfodA@mail.gmail.com>
Message-ID: <4E3AAAC5.1030507@redhat.com>

Hi Robert,

I was pointed to: https://bugzilla.redhat.com/show_bug.cgi?id=718230

I am not sure you have enough privileges to see the bz, but the issue is
known and the fix is on its way.

Fabio



From cfeist at redhat.com  Mon Aug  8 05:04:49 2011
From: cfeist at redhat.com (Chris Feist)
Date: Mon, 8 Aug 2011 01:04:49 -0400 (EDT)
Subject: [Linux-cluster] CCS to add VM to RHCS?
In-Reply-To: <CANqTVAEe04dnGA3O-DY9=oGZcXiDQLKcNPZn_tpo-hj_FLfodA@mail.gmail.com>
Message-ID: <2089399540.2037507.1312779889474.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>

Robert,

Unfortunately the 6.1 version of ccs didn't include the '--addvm' and '--rmvm' commands, which add and remove VMs directly under the <rm> level in the cluster.conf file.  They will be added starting with 6.2.  If you'd like to test a beta version of the ccs command, I can provide you the latest version.

Thanks!
Chris

----- Original Message -----
> From: "Robert Hayden" <rhayden.public at gmail.com>
> To: "linux clustering" <linux-cluster at redhat.com>
> Sent: Thursday, August 4, 2011 8:48:23 AM
> Subject: [Linux-cluster] CCS to add VM to RHCS?
> I was attempting to add VM resources to a two node cluster with the
> ccs tool (RHEL 6.1). I believe that it I am either not using the
> proper ccs command or there is a bug in the ccs tool for VMs. Wanted
> to see if anyone has attempted this before I go to bugzilla.
> 
> Command:
> ccs -f cluster.build --addresource vm name=vm_b migrate=live
> domain=kvm_node2_fo_domain autostart=1 recovery=restart use_virsh=1
> ccs -f cluster.build --addresource vm name=vm_a migrate=live
> domain=kvm_node1_fo_domain autostart=1 recovery=restart use_virsh=1
> 
> These modify the cluster.build file as follows. Notice that the <vm>
> stanzas are located within the <resource>. From what I have been able
> to determine, the <vm> stanzas need to be in the <rm>, but not as a
> "resource". Otherwise rgmanager does not pick the VMs up.
> 
> <rm>
> <failoverdomains>
> <failoverdomain name="kvm_node1_fo_domain" nofailback="1"
> ordered="1" restricted="1">
> <failoverdomainnode name="node1" priority="1"/>
> <failoverdomainnode name="node2" priority="2"/>
> </failoverdomain>
> <failoverdomain name="kvm_node2_fo_domain" nofailback="1"
> ordered="1" restricted="1">
> <failoverdomainnode name="node1" priority="2"/>
> <failoverdomainnode name="node2" priority="1"/>
> </failoverdomain>
> </failoverdomains>
> <resources>
> <vm autostart="1" domain="kvm_node1_fo_domain" migrate="live"
> name="vm_a" recovery="restart" use_virsh="1"/>
> <vm autostart="1" domain="kvm_node2_fo_domain" migrate="live"
> name="vm_b" recovery="restart" use_virsh="1"/>
> </resources>
> </rm>
> 
> 
> Correct <rm> stanza as far as I know, at least this allows for the VMs
> to be seen with clustat and for them to be managed.
> 
> <rm>
> <failoverdomains>
> <failoverdomain name="kvm_node1_fo_domain" nofailback="1"
> ordered="1" restricted="1">
> <failoverdomainnode name="node1" priority="1"/>
> <failoverdomainnode name="node2" priority="2"/>
> </failoverdomain>
> <failoverdomain name="kvm_node2_fo_domain" nofailback="1"
> ordered="1" restricted="1">
> <failoverdomainnode name="node1" priority="2"/>
> <failoverdomainnode name="node2" priority="1"/>
> </failoverdomain>
> </failoverdomains>
> <resource/>
> <vm autostart="1" domain="kvm_node1_fo_domain" migrate="live"
> name="vm_a" recovery="restart" use_virsh="1"/>
> <vm autostart="1" domain="kvm_node2_fo_domain" migrate="live"
> name="vm_b" recovery="restart" use_virsh="1"/>
> </rm>
> 
> 
> Thanks
> Robert
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From cos at aaaaa.org  Mon Aug  8 22:14:25 2011
From: cos at aaaaa.org (Ofer Inbar)
Date: Mon, 8 Aug 2011 18:14:25 -0400
Subject: [Linux-cluster] Expected behaviour when service fails to stop
In-Reply-To: <BANLkTikF8WF17c3VBqk9pRvHBG3POABAkA@mail.gmail.com>
References: <BANLkTikF8WF17c3VBqk9pRvHBG3POABAkA@mail.gmail.com>
Message-ID: <20110808221425.GZ341@mip.aaaaa.org>

Chris Alexander <chris.alexander at kusiri.com> wrote:
> I was wondering what the expected behaviour of the cluster would be when a
> service cannot be shutdown safely. For example, if you request a service
> group to be relocated to another node in the cluster, if one of the services
> in that group fails to stop (causing a timeout?), what would the result be?
> I should imagine that the service would be marked as Failed, is this the
> case? I have been unable to find this particular scenario documented anywhere.

This may be the documentation you're looking for:
  https://fedorahosted.org/cluster/wiki/ServiceOperationalBehaviors

Under "Service States", the "failed" state is documented as:
  failed - The service is presumed dead. This state occurs whenever a
  resource's stop operation fails. Administrator must verify that there
  are no allocated resources (mounted file systems, etc.) prior to
  issuing a disable request. The only action which can take place from
  this state is disable.

So your intuition that the service is marked as "failed" if the stop
fails, is correct.  However, I'm not sure what you mean by "causing a
timeout".  What defines a stop failure is up to the resource agent
script (located in /usr/share/cluster) corresponding to the resource
it's trying to stop.  If the "stop" operation from that script returns
a non-zero exit code, then the stop is considered to have failed.
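
As a minimal sketch of that convention (the function name, the pidfile
argument, and the wait time are hypothetical, not taken from any shipped
agent), a stop handler might be shaped like this:

```shell
#!/bin/sh
# Sketch only: stop_by_pidfile and its arguments are hypothetical names.
# Returns 0 if the process is gone (or was never running), 1 if it is
# still alive after the wait expires. A real agent would pass this value
# to "exit" from its stop) case, which is the status rgmanager sees.
stop_by_pidfile() {
    pidfile=$1
    wait_secs=${2:-5}

    # No pidfile: nothing to stop, so report success.
    [ -f "$pidfile" ] || return 0

    pid=$(cat "$pidfile")
    kill "$pid" 2>/dev/null

    while :; do
        # kill -0 only tests whether the PID still exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            rm -f "$pidfile"
            return 0
        fi
        [ "$wait_secs" -gt 0 ] || break
        sleep 1
        wait_secs=$((wait_secs - 1))
    done

    # Still running: a non-zero status marks the stop as failed.
    return 1
}
```

Note that the wait loop here is the agent's own choice; the only thing
the text above relies on is the exit status of the stop operation.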
  -- Cos


From cos at aaaaa.org  Mon Aug  8 23:24:34 2011
From: cos at aaaaa.org (Ofer Inbar)
Date: Mon, 8 Aug 2011 19:24:34 -0400
Subject: [Linux-cluster] meta-data problem: rg_test shows the wrong value
Message-ID: <20110808232433.GF7753@mip.aaaaa.org>

I'm having a perplexing issue with a resource with a custom resource
agent I've written.  Here's what the cluster.conf section for it looks
like (with some anonymization of names and IPs):

<rm log_level="6">
  <service autostart="1" name="customresource" recovery="relocate">
    <ip address="10.6.19.50" monitor_link="1">
      <customresource name="A" monitoringport="9100" status_timeout="10"/>
      <customresource name="B" monitoringport="9105" status_timeout="30"/>
    </ip>
  </service>
</rm>

In the resource agent script /usr/share/cluster/customresource.sh,
status interval is calculated to be status_timeout * 2 + 2.  So in
this case, customresource A should have an interval of 22, and B
should have an interval of 62.
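
The arithmetic can be checked in isolation. A minimal sketch, where the
function name status_interval is made up for illustration and only the
formula comes from the agent:

```shell
#!/bin/sh
# Hypothetical helper that reproduces the agent's formula:
#   interval = status_timeout * 2 + 2
status_interval() {
    echo $(( $1 * 2 + 2 ))
}

status_interval 10   # resource A: prints 22
status_interval 30   # resource B: prints 62
```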

When I run the resource agent by hand, I get the right values:

| # export OCF_RESKEY_name="A"
| # export OCF_RESKEY_monitoringport="9100"
| # export OCF_RESKEY_status_timeout="10"
| # /usr/share/cluster/customresource.sh meta-data
[...]
|     <actions>
|         <action name="meta-data" timeout="5s"/>   
|         <action name="methods" timeout="5s"/>
|         <action name="start" timeout="10s"/>
|         <action name="stop" timeout="30s"/>
|         <action name="status" interval="22s" timeout="10s"/>
|         <action name="monitor" interval="22s" timeout="10s"/>
|         <action name="verify" timeout="5s"/>
|     </actions>
| </resource-agent>

However, when I run rg_test on this same cluster.conf and agent script,
I get a different value:

| $ sudo rg_test test /tmp/cluster.conf
[...]
|     myresource {
|       name = "A";
|       monitoringport = "9100";
|       status_timeout = "10";
|       status_interval = "40";
|     }
|     myresource {
|       name = "B"
|       monitoringport = "9105";
|       status_timeout = "30";
|       status_interval = "40";
|     }

Where is it getting this "40" value from?

Well, the funny thing is that the correct value *used* to be 40.

That was the default the resource agent sets if you *don't* specify
status_timeout in cluster.conf.  To test my new change, I made a copy
of cluster.conf in /tmp, added the new status_timeout values, and ran
rg_test on it.  But somehow, rg_test seems to be giving me a value that
does not come from this run of the resource agent and this cluster.conf.

Anyone know what's going on here?
  -- Cos


From cos at aaaaa.org  Mon Aug  8 23:40:00 2011
From: cos at aaaaa.org (Ofer Inbar)
Date: Mon, 8 Aug 2011 19:40:00 -0400
Subject: [Linux-cluster] meta-data problem: rg_test shows the wrong value
In-Reply-To: <20110808232433.GF7753@mip.aaaaa.org>
References: <20110808232433.GF7753@mip.aaaaa.org>
Message-ID: <20110808234000.GB341@mip.aaaaa.org>

On Mon, Aug 08, 2011 at 07:24:34PM -0400,
Ofer Inbar <cos at aaaaa.org> wrote:
> |     myresource {
> |       name = "B"
> |       monitoringport = "9105";
> |       status_timeout = "30";
> |       status_interval = "40";
> |     }
> 
> Where is it getting this "40" value from?
> 
> Well, the funny thing is that the correct value *used* to be 40.
> 
> That was the default the resource agent sets if you *don't* specify
> status_timeout in cluster.conf.  To test my new change, I made a copy
> of cluster.conf in /tmp, added the new status_timeout values, and ran
> rg_test on it.  But somehow, rg_test seems to be giving me a value that
> does not come from this run of the resource agent and this cluster.conf.

I should add that this strange behavior persists even after I:

 - Change myresource.sh such that its default value is now 60, not 40
 - Restart rgmanager on all three nodes in the cluster

Watching the service's logs, I can also see that rgmanager is actually
calling "status" every 70 seconds, which is 30+40, so it is obeying the
incorrect value.

(And separately, this makes me realize that it waits timeout+interval
seconds between status checks, which I had not realized; I had assumed
that it would only wait interval seconds between checks.  This stuff is
not well documented :/)
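
One way to double-check the spacing is to print the gaps between
consecutive status calls. A rough sketch, assuming the log timestamps
have already been converted to epoch seconds, one per line (status_gaps
is a made-up name, and the sample values are invented):

```shell
#!/bin/sh
# Read one epoch timestamp per line and print the difference between
# each line and the one before it.
status_gaps() {
    awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}

# Invented sample: three status calls 70 seconds apart.
printf '1000\n1070\n1140\n' | status_gaps
```

With timeout=30 and interval=40, a steady gap of 70 would match the
timeout+interval behavior described above.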

  -- Cos


From enakai at redhat.com  Tue Aug  9 00:43:47 2011
From: enakai at redhat.com (Etsuji Nakai)
Date: Mon, 8 Aug 2011 20:43:47 -0400 (EDT)
Subject: [Linux-cluster] ccs/ricci cluster operation design
In-Reply-To: <1402015666.1901634.1312842853298.JavaMail.root@zmail01.collab.prod.int.phx2.redhat.com>
Message-ID: <1003810351.1902878.1312850627038.JavaMail.root@zmail01.collab.prod.int.phx2.redhat.com>

Let me know your thoughts on the ccs/ricci cluster operation design. 

The bottom line is that it's a bad design to have a failed node rejoin the cluster automatically, and I think ccs/ricci should have options (in addition to --start/--stop) that just start/stop the services without changing their chkconfig status. 

Here are the details of the problem:

You can start/stop the cluster with ccs --start/--stop, but my customer cannot adopt it for the following reason.

In the customer's cluster:

- They start/stop the cluster by starting/stopping the services directly. (They are not using the ccs/ricci interface at the moment.)
- They set chkconfig off for the cluster services (cman, rgmanager, etc.)
- They force-reboot the failed node with the fence device.

In this setting, when a node is force-rebooted because of a problem such as a kernel panic, the node doesn't automatically rejoin the cluster. The customer then logs in to the node and investigates the problem. Once they are sure that the problem is resolved, they start the cluster services on this node again.

Now, the problem is that this customer cannot adopt the ccs tool for cluster operation. Under ccs operation, when the failed node is force-rebooted, it automatically tries to join the cluster because chkconfig is on, even though the potential problem has not yet been investigated and resolved by the customer. 
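
For reference, the customer's manual procedure can be sketched roughly as below. This is an illustration only: the function name is invented, it prints the commands rather than running them, and it covers just the two services named above (a real cluster may also run clvmd, gfs2, and so on).

```shell
#!/bin/sh
# Sketch only: prints the commands instead of executing them.
manual_cluster_cmds() {
    case "$1" in
        disable-boot)
            # Keep the services from starting at boot.
            for svc in cman rgmanager; do
                echo "chkconfig $svc off"
            done
            ;;
        start)
            # cman has to be up before rgmanager.
            echo "service cman start"
            echo "service rgmanager start"
            ;;
        stop)
            # Reverse order on the way down.
            echo "service rgmanager stop"
            echo "service cman stop"
            ;;
        *)
            echo "usage: manual_cluster_cmds disable-boot|start|stop" >&2
            return 2
            ;;
    esac
}

manual_cluster_cmds start
```

The request is for ccs to offer the equivalent of the start and stop branches without touching what the disable-boot branch sets.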

Here's the related discussion on bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728041

-- Etsuji


From cos at aaaaa.org  Tue Aug  9 03:35:51 2011
From: cos at aaaaa.org (Ofer Inbar)
Date: Mon, 8 Aug 2011 23:35:51 -0400
Subject: [Linux-cluster] RHCS resource agent: status interval vs.
	monitor interval
In-Reply-To: <20110728213924.GD341@mip.aaaaa.org>
References: <20110728213924.GD341@mip.aaaaa.org>
Message-ID: <20110809033550.GG7753@mip.aaaaa.org>

On Thu, Jul 28, 2011 at 05:39:24PM -0400, I wrote:
> In the <actions> section of a RHCS resource agent's meta-data,
> there are nodes for both action name="status" and action name="monitor".
> Both of them have an interval and a timeout.  For example, in ip.sh:
> 
>         <!-- Checks to see if the IP is up and (optionally) the link is
>              working -->
>         <action name="status" interval="20" timeout="10"/>
>         <action name="monitor" interval="20" timeout="10"/>
> 
>         <!-- Checks to see if we can ping the IP address locally -->
>         <action name="status" depth="10" interval="60" timeout="20"/>
>         <action name="monitor" depth="10" interval="60" timeout="20"/>
> 
> I assume that one of them controls how often rgmanager runs the
> resource agent to check the resource status, but which one, and
> what's the point of the other one?
> 
> I tried to find the answer in:
>   https://fedorahosted.org/cluster/wiki/ResourceActions
>   http://www.opencf.org/cgi-bin/viewcvs.cgi/*checkout*/specs/ra/resource-agent-api.txt?rev=1.10
> 
> Neither of them explain why there are separate "status" and "monitor" actions.

Ralph.Grothe at itdz-berlin.de was the only person to respond.  He said
he thinks that under RHCS, "monitor" is ignored and only "status" is
used, but he's not sure.

Separately, since then I've come to understand that what I thought
"interval" and "timeout" controlled is not the case.  I believed
that rgmanager would attempt a status (or monitor?) check every
interval seconds.  That does not appear to be true.

I've been unable to find documentation of any of this on the wiki
or anywhere else I've searched.  Some references are made to things
like how to change the status interval, but what it does is implied,
not stated.

Does there exist any real documentation anywhere, of how rgmanager
reads and makes use of this metadata, and how it does status checks?
Or is diving into the source the only way of figuring this out?

(I'm resistant to that partly because I'm not really a programmer,
and partly because code often contains bugs or hidden assumptions
and doesn't really document how things are intended to work; some
answers from the source may turn out to be ephemeral, others partial.)
  -- Cos


From skjbalaji at gmail.com  Tue Aug  9 16:34:05 2011
From: skjbalaji at gmail.com (Balaji S)
Date: Tue, 9 Aug 2011 22:04:05 +0530
Subject: [Linux-cluster] Linux-cluster Digest, Vol 88, Issue 6
In-Reply-To: <mailman.45.1312905606.13658.linux-cluster@redhat.com>
References: <mailman.45.1312905606.13658.linux-cluster@redhat.com>
Message-ID: <CAD_Uw4kT-jogUVDofQ4sn1ziTw_LW_HqHR=2=Tr42RaLjtrgOg@mail.gmail.com>

Hi Ofer Inbar,
When a cluster service fails over to another node and is still in recovery
mode after some time, the cluster shows the service as failed again. What is
the default time the cluster waits for a service to recover completely? Can
we increase that wait time? If so, which configuration setting extends the
default? Any suggestions would be really helpful.

In my scenario I am facing the same kind of problem: the cluster waits for
around 15 minutes, and if the service has not recovered properly by then, the
cluster kills the service and shows it as failed. I then manually stop the
cluster services on all the nodes, start the service standalone to recover
everything, and put it back in the cluster once the service starts cleanly.

Thanks in Advance,

BSK.


From spam at meszi.de  Wed Aug 10 09:13:40 2011
From: spam at meszi.de (Daniel Meszaros)
Date: Wed, 10 Aug 2011 11:13:40 +0200
Subject: [Linux-cluster] Debian 6, GFS2, chkconfig
Message-ID: <4E424BC4.5020802@meszi.de>

Hi there,

I found that there seems to be an incompatibility between chkconfig 
and the cman init script under Debian 6 (6.0.2). I noticed it after 
installing chkconfig.

With chkconfig installed I get this...
# service cman restart
Stopping cluster:
    Leaving fence domain... [  OK  ]
    Stopping gfs_controld... [  OK  ]
    Stopping dlm_controld... [  OK  ]
    Stopping fenced... [  OK  ]
    Stopping cman... [  OK  ]
    Unloading kernel modules... [  OK  ]
    Unmounting configfs... [  OK  ]
Starting cluster:
    Checking Network Manager... NetworkManager: unknown service

Network Manager is configured to run. Please disable it in the cluster.
[FAILED]

I checked the init script for the error message:
# less /etc/init.d/cman
[...]
network_manager_enabled()
{
         if type chkconfig >/dev/null 2>&1 && chkconfig NetworkManager; then
                 errmsg="\nNetwork Manager is configured to run. Please disable it in the cluster."
                 return 1
         fi

         if status NetworkManager > /dev/null 2>&1 || \
            status network-manager > /dev/null; then
                 errmsg="\nNetwork Manager is running. Please disable it in the cluster."
                 return 1
         fi
         return 0
}
[...]

Without chkconfig it works fine:
# service cman restart
Stopping cluster:
    Leaving fence domain... [  OK  ]
    Stopping gfs_controld... [  OK  ]
    Stopping dlm_controld... [  OK  ]
    Stopping fenced... [  OK  ]
    Stopping cman... [  OK  ]
    Unloading kernel modules... [  OK  ]
    Unmounting configfs... [  OK  ]
Starting cluster:
    Checking Network Manager... [  OK  ]
    Global setup... [  OK  ]
    Loading kernel modules... [  OK  ]
    Mounting configfs... [  OK  ]
    Starting cman... [  OK  ]
    Waiting for quorum... [  OK  ]
    Starting fenced... [  OK  ]
    Starting dlm_controld... [  OK  ]
    Starting gfs_controld... [  OK  ]
    Unfencing self... [  OK  ]
    Joining fence domain... [  OK  ]

A possible workaround could be to temporarily rename the "chkconfig" 
command. I simply uninstalled it.
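
Another option would be a more defensive check. The following is only a 
sketch under stated assumptions, not the code from any released init 
script: the function name and the directory argument are invented, and it 
returns 0 when Network Manager is configured to run (the opposite of the 
function quoted above).

```shell
#!/bin/sh
# Hypothetical variant: only ask chkconfig about a service whose init
# script actually exists, so an "unknown service" answer is never
# treated as "configured to run".
# $1 overrides the init script directory (defaults to /etc/init.d).
network_manager_configured() {
    initdir=${1:-/etc/init.d}

    # Without chkconfig there is nothing to query.
    type chkconfig >/dev/null 2>&1 || return 1

    for name in NetworkManager network-manager; do
        [ -x "$initdir/$name" ] || continue
        chkconfig "$name" >/dev/null 2>&1 && return 0
    done
    return 1
}
```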

CU,
Mészi.


From fdinitto at redhat.com  Wed Aug 10 11:48:29 2011
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Wed, 10 Aug 2011 13:48:29 +0200
Subject: [Linux-cluster] Debian 6, GFS2, chkconfig
In-Reply-To: <4E424BC4.5020802@meszi.de>
References: <4E424BC4.5020802@meszi.de>
Message-ID: <4E42700D.4080104@redhat.com>

Hi Daniel,

this bug has already been addressed in cman init script. I suspect
Debian 6.x has an older version.

You can ask Debian maintainers to update or at least grab the latest
init script from STABLE31 branch.

Fabio

On 08/10/2011 11:13 AM, Daniel Meszaros wrote:
> Hi there,
> 
> I had to figure out that there seems to be an incompatibility of
> chkconfig and the cman-Initscript under Debian 6(.0.2). I recognized it
> after having installed chkconfig.
> 
> With chkconfig installed I get this...
> # service cman restart
> Stopping cluster:
>    Leaving fence domain... [  OK  ]
>    Stopping gfs_controld... [  OK  ]
>    Stopping dlm_controld... [  OK  ]
>    Stopping fenced... [  OK  ]
>    Stopping cman... [  OK  ]
>    Unloading kernel modules... [  OK  ]
>    Unmounting configfs... [  OK  ]
> Starting cluster:
>    Checking Network Manager... NetworkManager: unknown service
> 
> Network Manager is configured to run. Please disable it in the cluster.
> [FAILED]
> 
> I checked the Initscipt for the error message:
> # less /etc/init.d/cman
> [...]
> network_manager_enabled()
> {
>         if type chkconfig >/dev/null 2>&1 && chkconfig NetworkManager; then
>                 errmsg="\nNetwork Manager is configured to run. Please
> disable it in the cluster."
>                 return 1
>         fi
> 
>         if status NetworkManager > /dev/null 2>&1 || \
>            status network-manager > /dev/null; then
>                 errmsg="\nNetwork Manager is running. Please disable it
> in the cluster."
>                 return 1
>         fi
>         return 0
> }
> [...]
> 
> Without chkconfig it works fine:
> # service cman restart
> Stopping cluster:
>    Leaving fence domain... [  OK  ]
>    Stopping gfs_controld... [  OK  ]
>    Stopping dlm_controld... [  OK  ]
>    Stopping fenced... [  OK  ]
>    Stopping cman... [  OK  ]
>    Unloading kernel modules... [  OK  ]
>    Unmounting configfs... [  OK  ]
> Starting cluster:
>    Checking Network Manager... [  OK  ]
>    Global setup... [  OK  ]
>    Loading kernel modules... [  OK  ]
>    Mounting configfs... [  OK  ]
>    Starting cman... [  OK  ]
>    Waiting for quorum... [  OK  ]
>    Starting fenced... [  OK  ]
>    Starting dlm_controld... [  OK  ]
>    Starting gfs_controld... [  OK  ]
>    Unfencing self... [  OK  ]
>    Joining fence domain... [  OK  ]
> 
> A possible workaround could be to temporarily rename the "chkconfig"
> command. I simply uninstalled it.
> 
> CU,
> M?szi.
> 
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From pradhanparas at gmail.com  Thu Aug 11 22:20:15 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Thu, 11 Aug 2011 17:20:15 -0500
Subject: [Linux-cluster] EFI in CLVM
Message-ID: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>

Hi,

I have a 2199GB LUN assigned to my 3 node cluster. Since it's >2TB, I used
parted to create the EFI GPT partition. After that, pvcreate and vgcreate
were successful, but I get the following error when doing lvcreate.

lvcreate -n prd_vg10_lv -L2197GB prd_vg10
  /dev/sdc: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 0: Input/output error
  Error locking on node prd2: device-mapper: create ioctl failed: Device or
resource busy
  Error locking on node prd3: device-mapper: create ioctl failed: Device or
resource busy
  Error locking on node prd1: device-mapper: create ioctl failed: Device or
resource busy
  Failed to activate new LV.

Thanks!
Paras.

From ashley at host365.com  Thu Aug 11 22:40:12 2011
From: ashley at host365.com (ashley at host365.com)
Date: 11 Aug 2011 23:40:12 +0100
Subject: [Linux-cluster] =?utf-8?q?EFI_in_CLVM?=
Message-ID: <20110811224012.28302.qmail@psa101.host365.com>

Hi

I am away until 30/08/11. If you require support, please email support at host365.com or call +44 (0)207 610 9911 in my absence.

Regards
Ashley


From ajb2 at mssl.ucl.ac.uk  Fri Aug 12 12:39:39 2011
From: ajb2 at mssl.ucl.ac.uk (Alan Brown)
Date: Fri, 12 Aug 2011 13:39:39 +0100
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
Message-ID: <4E451F0B.4040304@mssl.ucl.ac.uk>

Paras pradhan wrote:
> Hi,
> 
> I have a 2199GB LUN assigned to my 3 node cluster. Since its >2TB, I used
> parted to create the EFI GPT parittion. After that pvcreate and vgcreate
> were successfull but I get the following error when doing lvcreate.
> 

If the entire LUN is a PV then you don't need to partition it.
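
A rough sketch of that sequence, printed rather than executed. The
function name and the device path are placeholders; prd_vg10 and
prd_vg10_lv are the names from the original post.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
whole_lun_pv_cmds() {
    dev=$1
    vg=$2
    lv=$3
    size=$4
    # No parted step: the whole device becomes the PV.
    echo "pvcreate $dev"
    # -c y marks the volume group as clustered.
    echo "vgcreate -c y $vg $dev"
    echo "lvcreate -n $lv -L $size $vg"
}

whole_lun_pv_cmds /dev/mapper/mpath0 prd_vg10 prd_vg10_lv 2197G
```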


From ashley at host365.com  Fri Aug 12 12:46:19 2011
From: ashley at host365.com (ashley at host365.com)
Date: 12 Aug 2011 13:46:19 +0100
Subject: [Linux-cluster] =?utf-8?q?EFI_in_CLVM?=
Message-ID: <20110812124619.15889.qmail@psa101.host365.com>

Hi

I am away until 30/08/11. If you require support, please email support at host365.com or call +44 (0)207 610 9911 in my absence.

Regards
Ashley


From pradhanparas at gmail.com  Fri Aug 12 15:14:51 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Fri, 12 Aug 2011 10:14:51 -0500
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <4E451F0B.4040304@mssl.ucl.ac.uk>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
	<4E451F0B.4040304@mssl.ucl.ac.uk>
Message-ID: <CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>

On Fri, Aug 12, 2011 at 7:39 AM, Alan Brown <ajb2 at mssl.ucl.ac.uk> wrote:

> Paras pradhan wrote:
>
>> Hi,
>>
>> I have a 2199GB LUN assigned to my 3 node cluster. Since its >2TB, I used
>> parted to create the EFI GPT parittion. After that pvcreate and vgcreate
>> were successfull but I get the following error when doing lvcreate.
>>
>>
> If the entire LUN is a PV then you don't need to partition it.


You mean skip parted (or any other partitioning tool) and proceed directly to pvcreate?


>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

From keith.schincke at gmail.com  Fri Aug 12 16:09:18 2011
From: keith.schincke at gmail.com (Keith Schincke)
Date: Fri, 12 Aug 2011 11:09:18 -0500
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
	<4E451F0B.4040304@mssl.ucl.ac.uk>
	<CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
Message-ID: <C028F637-6669-4052-8227-F4710498B7FA@gmail.com>

Your physical volume can be a whole disk (mpath0) or a partition (mpath0p1). I use the partition scheme because a blank-looking disk could accidentally be reused by a later admin who does not fully understand the system configuration. 

Keith

Sent from my iPhone

On Aug 12, 2011, at 10:14, Paras pradhan <pradhanparas at gmail.com> wrote:

> 
> 
> On Fri, Aug 12, 2011 at 7:39 AM, Alan Brown <ajb2 at mssl.ucl.ac.uk> wrote:
> Paras pradhan wrote:
> Hi,
> 
> I have a 2199GB LUN assigned to my 3 node cluster. Since its >2TB, I used
> parted to create the EFI GPT parittion. After that pvcreate and vgcreate
> were successfull but I get the following error when doing lvcreate.
> 
> 
> If the entire LUN is a PV then you don't need to partition it.
> 
> You mean don't use parted or any and directly proceed to pvcreate?
> 
> 
>  
> 
> 
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

From pradhanparas at gmail.com  Fri Aug 12 16:24:40 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Fri, 12 Aug 2011 11:24:40 -0500
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <C028F637-6669-4052-8227-F4710498B7FA@gmail.com>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
	<4E451F0B.4040304@mssl.ucl.ac.uk>
	<CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
	<C028F637-6669-4052-8227-F4710498B7FA@gmail.com>
Message-ID: <CADyt5g=+eRvHQf4YqiW8XLzm4EUA+Anc8bTqtxLmnXPr9BK9OQ@mail.gmail.com>

Does that mean I don't need mpath0p1? If that's the case, I don't need to
run kpartx on mpath0?

And will not having mpath0p1 make this device-mapper ioctl failure go away
when running lvcreate?

I am really confused about why this lock has failed, and I'm also not sure
whether it is related to the LUN being >2TB.

Paras.


On Fri, Aug 12, 2011 at 11:09 AM, Keith Schincke
<keith.schincke at gmail.com> wrote:

> Your physical volume can be a whole disk (mpath0) or a partition
> (mpath0p1). I use the partition schema as blank disks could be accidentally
> used by a follow on admin who does not fully understand the system
> configuration.
>
> Keith
>
> Sent from my iPhone
>
> On Aug 12, 2011, at 10:14, Paras pradhan <pradhanparas at gmail.com> wrote:
>
>
>
> On Fri, Aug 12, 2011 at 7:39 AM, Alan Brown <ajb2 at mssl.ucl.ac.uk> wrote:
>
>> Paras pradhan wrote:
>>
>>> Hi,
>>>
>>> I have a 2199GB LUN assigned to my 3 node cluster. Since its >2TB, I used
>>> parted to create the EFI GPT parittion. After that pvcreate and vgcreate
>>> were successfull but I get the following error when doing lvcreate.
>>>
>>>
>> If the entire LUN is a PV then you don't need to partition it.
>
>
> You mean don't use parted or any and directly proceed to pvcreate?
>
>
>
>
>>
>>
>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

From keith.schincke at gmail.com  Fri Aug 12 16:49:09 2011
From: keith.schincke at gmail.com (Keith Schincke)
Date: Fri, 12 Aug 2011 11:49:09 -0500
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <CADyt5g=+eRvHQf4YqiW8XLzm4EUA+Anc8bTqtxLmnXPr9BK9OQ@mail.gmail.com>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
	<4E451F0B.4040304@mssl.ucl.ac.uk>
	<CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
	<C028F637-6669-4052-8227-F4710498B7FA@gmail.com>
	<CADyt5g=+eRvHQf4YqiW8XLzm4EUA+Anc8bTqtxLmnXPr9BK9OQ@mail.gmail.com>
Message-ID: <CA+y8wqxcrQQqFK7AOcf_2e0wjOVoBMQrQvLVFu0_7vPUVCjAPg@mail.gmail.com>

I have used multiple >2T physical volumes in my large cluster.
Do you have clvmd running and does your LVM have the cluster flag turned on?
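
A quick way to check the flag, as a sketch: in "vgs" output the sixth
character of the vg_attr field is "c" for a clustered volume group. The
helper name below is made up; prd_vg10 is the VG from this thread.

```shell
#!/bin/sh
# Return 0 if a vg_attr string (as printed by vgs) has the clustered
# bit set, 1 otherwise.
is_clustered_attr() {
    case "$1" in
        ?????c*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Typical use on a cluster node:
#   attr=$(vgs --noheadings -o vg_attr prd_vg10 | tr -d ' ')
#   is_clustered_attr "$attr" && echo "prd_vg10 is clustered"
#   service clvmd status
```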

On Fri, Aug 12, 2011 at 11:24 AM, Paras pradhan <pradhanparas at gmail.com> wrote:
> Does it mean that I don't need mpath0p1 ? If its the case i don't need to
> run kpartx on mpath0?
> And not having mpath0p1 will take away this device mapper ioctl failed issue
> when creating lvcreate?
> I am really confused why this lock has failed , also not sure if this is
> related to this >2TB LUN.
> Paras.
>
> On Fri, Aug 12, 2011 at 11:09 AM, Keith Schincke <keith.schincke at gmail.com>
> wrote:
>>
>> Your physical volume can be a whole disk (mpath0) or a partition
>> (mpath0p1). I use the partition schema as blank disks could be accidentally
>> used by a follow on admin who does not fully understand the system
>> configuration.
>> Keith
>>
>> Sent from my iPhone
>> On Aug 12, 2011, at 10:14, Paras pradhan <pradhanparas at gmail.com> wrote:
>>
>>
>>
>> On Fri, Aug 12, 2011 at 7:39 AM, Alan Brown <ajb2 at mssl.ucl.ac.uk> wrote:
>>>
>>> Paras pradhan wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have a 2199GB LUN assigned to my 3 node cluster. Since it's >2TB, I
>>>> used
>>>> parted to create the EFI GPT partition. After that pvcreate and vgcreate
>>>> were successful but I get the following error when doing lvcreate.
>>>>
>>>
>>> If the entire LUN is a PV then you don't need to partition it.
>>
>> You mean don't use parted or any and directly proceed to pvcreate?
>>
>>
>>>
>>>
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>


From pradhanparas at gmail.com  Fri Aug 12 17:05:08 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Fri, 12 Aug 2011 12:05:08 -0500
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <CA+y8wqxcrQQqFK7AOcf_2e0wjOVoBMQrQvLVFu0_7vPUVCjAPg@mail.gmail.com>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
	<4E451F0B.4040304@mssl.ucl.ac.uk>
	<CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
	<C028F637-6669-4052-8227-F4710498B7FA@gmail.com>
	<CADyt5g=+eRvHQf4YqiW8XLzm4EUA+Anc8bTqtxLmnXPr9BK9OQ@mail.gmail.com>
	<CA+y8wqxcrQQqFK7AOcf_2e0wjOVoBMQrQvLVFu0_7vPUVCjAPg@mail.gmail.com>
Message-ID: <CADyt5gmAQwFyWfLnPJ_Aby7nV3o8nU_xnfcuJOQT=w50wAZWyA@mail.gmail.com>

Keith,

Yes clvm is running.

Here is what I have done.

1) LUN assigned to the cluster nodes
2) multipathd detected the LUNs in all nodes
3) used GNU parted to create EFI/GPT
4) DID: kpartx -a /dev/mapper/mpath13 , then I see mpath13p1
5) pvcreate /dev/mapper/mpath13p1
6) vgcreate -c y prd_vg10 /dev/mapper/mpath13p1
7) Ran kpartx -a  and partprobe in all nodes
8) lvcreate -n prd_vg10_lv -L2197GB prd_vg10

  Error locking on node prd2: device-mapper: create ioctl failed: Device or resource busy
  Error locking on node prd3: device-mapper: create ioctl failed: Device or resource busy
  Error locking on node prd1: device-mapper: create ioctl failed: Device or resource busy
  Failed to activate new LV.


Thanks
Paras.


On Fri, Aug 12, 2011 at 11:49 AM, Keith Schincke
<keith.schincke at gmail.com>wrote:

> I have used multiple >2T physical volumes in my large cluster.
> Do you have clvmd running and does your LVM have the cluster flag turned
> on?
>
> On Fri, Aug 12, 2011 at 11:24 AM, Paras pradhan <pradhanparas at gmail.com>
> wrote:
> > Does it mean that I don't need mpath0p1 ? If its the case i don't need to
> > run kpartx on mpath0?
> > And not having mpath0p1 will take away this device mapper ioctl failed
> issue
> > when creating lvcreate?
> > I am really confused why this lock has failed , also not sure if this is
> > related to this >2TB LUN.
> > Paras.
> >
> > On Fri, Aug 12, 2011 at 11:09 AM, Keith Schincke <
> keith.schincke at gmail.com>
> > wrote:
> >>
> >> Your physical volume can be a whole disk (mpath0) or a partition
> >> (mpath0p1). I use the partition schema as blank disks could be
> accidentally
> >> used by a follow on admin who does not fully understand the system
> >> configuration.
> >> Keith
> >>
> >> Sent from my iPhone
> >> On Aug 12, 2011, at 10:14, Paras pradhan <pradhanparas at gmail.com>
> wrote:
> >>
> >>
> >>
> >> On Fri, Aug 12, 2011 at 7:39 AM, Alan Brown <ajb2 at mssl.ucl.ac.uk>
> wrote:
> >>>
> >>> Paras pradhan wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I have a 2199GB LUN assigned to my 3 node cluster. Since it's >2TB, I
> >>>> used
> >>>> parted to create the EFI GPT partition. After that pvcreate and
> vgcreate
> >>>> were successful but I get the following error when doing lvcreate.
> >>>>
> >>>
> >>> If the entire LUN is a PV then you don't need to partition it.
> >>
> >> You mean don't use parted or any and directly proceed to pvcreate?
> >>
> >>
> >>>
> >>>
> >>>
> >>> --
> >>> Linux-cluster mailing list
> >>> Linux-cluster at redhat.com
> >>> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20110812/b8e8f0c8/attachment.htm>

From zagar at arlut.utexas.edu  Fri Aug 12 17:17:43 2011
From: zagar at arlut.utexas.edu (Randy Zagar)
Date: Fri, 12 Aug 2011 12:17:43 -0500
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <mailman.31.1313164818.18101.linux-cluster@redhat.com>
References: <mailman.31.1313164818.18101.linux-cluster@redhat.com>
Message-ID: <4E456037.3040304@arlut.utexas.edu>

On Fri, Aug 12, 2011 at 10:14 AM, Paras Pradhan wrote:
> On Fri, Aug 12, 2011 at 7:39 AM, Alan Brown <ajb2 at mssl.ucl.ac.uk> wrote:
>
>> Paras pradhan wrote:
>>
>>> Hi,
>>>
>>> I have a 2199GB LUN assigned to my 3 node cluster. Since it's >2TB, I
>>> used
>>> parted to create the EFI GPT partition. After that pvcreate and vgcreate
>>> were successful but I get the following error when doing lvcreate.
>>>
>>>
>> If the entire LUN is a PV then you don't need to partition it.
>
>
> You mean don't use parted or any and directly proceed to pvcreate?
That's correct.  pvcreate can be used on raw, unpartitioned devices (e.g.
pvcreate /dev/sdc).

If you try to do this on your disks, I'm pretty sure pvcreate will 
complain/abort because it detects an existing partition table...

Personally, I find GPT partition tables to be annoying... (a) because I 
have to use "parted", and (b) because they're so difficult to erase from 
a disk.

If you want to get rid of that GPT partition table, you'll have to zero 
out (dd if=/dev/zero ...) the first three blocks AND the entire last 
cylinder of your disk to obliterate all traces of it (there's a backup 
GPT partition table hiding in the last cylinder).
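
For reference, here is a rough sketch of where the GPT structures sit. It assumes 512-byte logical sectors and the default table of 128 entries of 128 bytes each (both are assumptions, so check your device). Under those assumptions the primary copy spans LBA 0-33 and the backup copy the last 33 sectors, so zeroing the entire last cylinder covers the backup with room to spare. The sector count is a hypothetical value for a 2199 GB LUN and /dev/mapper/mpathX is a placeholder. The script only prints dd commands; it does not run them.

```python
SECTOR = 512  # assumed logical sector size in bytes


def gpt_regions(total_sectors, entries=128, entry_size=128, sector=SECTOR):
    """Return (start_lba, sector_count) for the primary and backup GPT areas."""
    # Sectors needed for the partition entry array, rounded up.
    entry_sectors = (entries * entry_size + sector - 1) // sector
    # Primary: protective MBR (LBA 0) + GPT header (LBA 1) + entry array.
    primary = (0, 2 + entry_sectors)
    # Backup: entry array followed by the backup header in the last sector.
    backup_len = entry_sectors + 1
    backup = (total_sectors - backup_len, backup_len)
    return primary, backup


# Hypothetical size: 2199 GB (decimal) expressed in 512-byte sectors.
TOTAL_SECTORS = 2199 * 10**9 // SECTOR
DEVICE = "/dev/mapper/mpathX"  # placeholder, not a real device name

for start, count in gpt_regions(TOTAL_SECTORS):
    # Printed for review only; running these is destructive.
    print(f"dd if=/dev/zero of={DEVICE} bs={SECTOR} seek={start} count={count}")
```

Double-check the device name and the sector count (for example with blockdev --getsz) before running anything like this by hand.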

-RZ

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5434 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20110812/7cab655f/attachment.p7s>

From pradhanparas at gmail.com  Fri Aug 12 17:53:15 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Fri, 12 Aug 2011 12:53:15 -0500
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <4E456037.3040304@arlut.utexas.edu>
References: <mailman.31.1313164818.18101.linux-cluster@redhat.com>
	<4E456037.3040304@arlut.utexas.edu>
Message-ID: <CADyt5gk1jC1S+y_0i6Db8_BA+ue63NTz+svesnK7z+pvKV5AVg@mail.gmail.com>

Removed the first few blocks and the last cylinder successfully, so there are
no GPT signatures now.

pvcreate and vgcreate were successful, but lvcreate failed again.

*pvs o/p:*

  /dev/mpath/mpath13                             prd_vg10 lvm2 a-      2.00T 2.00T

*vgs o/p:*

 prd_vg10   1   0   0 wz--nc    2.00T 2.00T


*lvcreate o/p:*

lvcreate -n prd_vg10_lv -L2047G prd_vg10

  Error locking on node prd3: device-mapper: create ioctl failed: Device or resource busy
  Error locking on node prd1: device-mapper: create ioctl failed: Device or resource busy
  Error locking on node prd2: device-mapper: create ioctl failed: Device or resource busy


Now that I don't have device partitions, I don't need to care about kpartx.
Am I correct?

Paras.


On Fri, Aug 12, 2011 at 12:17 PM, Randy Zagar <zagar at arlut.utexas.edu>wrote:

> On Fri, Aug 12, 2011 at 10:14 AM, Paras Pradhan wrote:
>
>> On Fri, Aug 12, 2011 at 7:39 AM, Alan Brown <ajb2 at mssl.ucl.ac.uk> wrote:
>>
>>  Paras pradhan wrote:
>>>
>>>  Hi,
>>>>
>>>> I have a 2199GB LUN assigned to my 3 node cluster. Since it's >2TB, I
>>>> used
>>>> parted to create the EFI GPT partition. After that pvcreate and vgcreate
>>>> were successful but I get the following error when doing lvcreate.
>>>>
>>>>
>>>>  If the entire LUN is a PV then you don't need to partition it.
>>>
>>
>>
>> You mean don't use parted or any and directly proceed to pvcreate?
>>
> That's correct.  Pvcreate can be used on raw unpartitioned devices (e.g.
> pvcreate /dev/sdc).
>
> If you try to do this on your disks, I'm pretty sure pvcreate will
> complain/abort because it detects an existing partition table...
>
> Personally, I find GPT partition tables to be annoying... (a) because I
> have to use "parted", and (b) because they're so difficult to erase from a
> disk.
>
> If you want to get rid of that GPT partition table, you'll have to zero out
> (dd if=/dev/zero ...) the first three blocks AND the entire last cylinder of
> your disk to obliterate all traces of it (there's a backup GPT partition
> table hiding in the last cylinder).
>
> -RZ
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20110812/f7f5c800/attachment.htm>

From ajb2 at mssl.ucl.ac.uk  Sat Aug 13 01:32:45 2011
From: ajb2 at mssl.ucl.ac.uk (Alan Brown)
Date: Sat, 13 Aug 2011 02:32:45 +0100
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
	<4E451F0B.4040304@mssl.ucl.ac.uk>
	<CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
Message-ID: <4E45D43D.2010401@mssl.ucl.ac.uk>

On 12/08/2011 16:14, Paras pradhan wrote:
>
>
>     If the entire LUN is a PV then you don't need to partition it.
>
>
> You mean don't use parted or any and directly proceed to pvcreate?
>

Correct.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20110813/9f9465e7/attachment.htm>

From ashley at host365.com  Sat Aug 13 01:39:35 2011
From: ashley at host365.com (ashley at host365.com)
Date: 13 Aug 2011 02:39:35 +0100
Subject: [Linux-cluster] =?utf-8?q?EFI_in_CLVM?=
Message-ID: <20110813013935.8858.qmail@psa101.host365.com>

Hi

I am away until 30/08/11. If you require support, please email support at host365.com or call +44 (0)207 610 9911 in my absence.

Regards
Ashley


From ajb2 at mssl.ucl.ac.uk  Sat Aug 13 01:39:36 2011
From: ajb2 at mssl.ucl.ac.uk (Alan Brown)
Date: Sat, 13 Aug 2011 02:39:36 +0100
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <CADyt5g=+eRvHQf4YqiW8XLzm4EUA+Anc8bTqtxLmnXPr9BK9OQ@mail.gmail.com>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
	<4E451F0B.4040304@mssl.ucl.ac.uk>
	<CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
	<C028F637-6669-4052-8227-F4710498B7FA@gmail.com>
	<CADyt5g=+eRvHQf4YqiW8XLzm4EUA+Anc8bTqtxLmnXPr9BK9OQ@mail.gmail.com>
Message-ID: <4E45D5D8.5090507@mssl.ucl.ac.uk>

On 12/08/2011 17:24, Paras pradhan wrote:
> Does it mean that I don't need mpath0p1 ? If its the case i don't need 
> to run kpartx on mpath0?

You still need kpartx, but that's a bit clunky anyway. Let dm-multipath 
take care of all that for you.

(The last time I used kpartx and friends was 2003. Dm-multipath and 
multipathd are much more user-friendly. All you need then is multipath 
-v2 -ll to verify things are where they should be...)

> And not having mpath0p1 will take away this device mapper ioctl failed 
> issue when creating lvcreate?
>

I think that's a separate issue. What's the underlying structure? SAN?
FC? iSCSI? DRBD?

> I am really confused why this lock has failed , also not sure if this 
> is related to this >2TB LUN.
>

It's not. Some of my LUNs are 25+ TB.

FWIW having PVs on LUN partitions introduces a small but measurable 
speed penalty over making the entire LUN a PV - this is mostly down to 
the small offset a partition table adds to the front of the LUN.
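
To illustrate the offset point, here is a minimal sketch that checks whether a partition's starting sector falls on a stripe boundary. The 64 KiB stripe size and the 512-byte sector size are assumptions for illustration only; the real values depend on the array. The start sectors shown are common cases (34 is the first usable LBA after a default GPT, 63 is the legacy MBR default), not values taken from this thread.

```python
def is_aligned(start_sector, stripe_bytes, sector=512):
    """True if the byte offset of start_sector is a multiple of stripe_bytes."""
    return (start_sector * sector) % stripe_bytes == 0


STRIPE = 64 * 1024  # hypothetical stripe size in bytes

# 0 stands for a PV placed directly on the whole LUN (no partition table).
for start in (0, 34, 63, 2048):
    state = "aligned" if is_aligned(start, STRIPE) else "misaligned"
    print(f"start sector {start}: {state}")
```

A misaligned start means some I/Os straddle two stripes on the array, which is one plausible source of the small penalty described above.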


From pradhanparas at gmail.com  Sat Aug 13 03:24:22 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Fri, 12 Aug 2011 22:24:22 -0500
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <4E45D5D8.5090507@mssl.ucl.ac.uk>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
	<4E451F0B.4040304@mssl.ucl.ac.uk>
	<CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
	<C028F637-6669-4052-8227-F4710498B7FA@gmail.com>
	<CADyt5g=+eRvHQf4YqiW8XLzm4EUA+Anc8bTqtxLmnXPr9BK9OQ@mail.gmail.com>
	<4E45D5D8.5090507@mssl.ucl.ac.uk>
Message-ID: <CADyt5gkO7v9+ZYuQ7a5-eL2xfZuHF--GyeoK--Zs+9GF+=4_BQ@mail.gmail.com>

Alan,

It's an FC SAN.

Here is the multipath -v2 -ll output; it looks good.

--
mpath13 (360060e8004770d000000770d000003e9) dm-28 HITACHI,OPEN-V*4
[size=2.0T][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=2][active]
 \_ 5:0:1:7 sdt 65:48 [active][ready]
 \_ 6:0:1:7 sdu 65:64 [active][ready]
---


If I don't make the entire LUN a PV, I think I would then need partitions. Am
I right? And do you think this will reduce the speed penalty?


Thanks
Paras.


On Fri, Aug 12, 2011 at 8:39 PM, Alan Brown <ajb2 at mssl.ucl.ac.uk> wrote:

> On 12/08/2011 17:24, Paras pradhan wrote:
>
>> Does it mean that I don't need mpath0p1 ? If its the case i don't need to
>> run kpartx on mpath0?
>>
>
> You still need kpartx, but that's a bit clunky anyway. Let dm-multipath
> take care of all that for you.
>
> (The last time I used kpartx and friends was 2003. Dm-multipath and
> multipathd are much more user-friendly. All you need then is multipath -v2
> -ll to verify things are where they should be...)
>
>
>  And not having mpath0p1 will take away this device mapper ioctl failed
>> issue when creating lvcreate?
>>
>>
> I think that's a separate issue. What's the underlying structure? SAN? FC?
> iSCSI? DRBD?
>
>
>  I am really confused why this lock has failed , also not sure if this is
>> related to this >2TB LUN.
>>
>>
> It's not. Some of my LUNs are 25+ TB.
>
>


> FWIW having PVs on LUN partitions introduces a small but measurable speed
> penalty over making the entire LUN a PV - this is mostly down to the small
> offset a partition table adds to the front of the LUN.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20110812/93e5f47e/attachment.htm>

From klusterfsck at outofoptions.net  Sat Aug 13 18:53:44 2011
From: klusterfsck at outofoptions.net (klusterfsck at outofoptions.net)
Date: Sat, 13 Aug 2011 14:53:44 -0400
Subject: [Linux-cluster] Specifiying interface for cluster traffic
Message-ID: <20110813145344.14952ynvb5624gu0@outofoptions.net>

I saw an example config for this but seem to have misplaced the link.   
Anyone help me on this?  I need to have the cluster traffic over the  
hardwired link so that it doesn't go through the switch.

Thank You
Ken


From linux at alteeve.com  Sat Aug 13 23:51:40 2011
From: linux at alteeve.com (Digimer)
Date: Sat, 13 Aug 2011 19:51:40 -0400
Subject: [Linux-cluster] Specifiying interface for cluster traffic
In-Reply-To: <20110813145344.14952ynvb5624gu0@outofoptions.net>
References: <20110813145344.14952ynvb5624gu0@outofoptions.net>
Message-ID: <4E470E0C.4010408@alteeve.com>

On 08/13/2011 02:53 PM, klusterfsck at outofoptions.net wrote:
> I saw an example config for this but seem to have misplaced the link. 
> Anyone help me on this?  I need to have the cluster traffic over the
> hardwired link so that it doesn't go through the switch.
> 
> Thank You
> Ken

That depends largely on what kind of cluster you're talking about.

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"


From ashley at host365.com  Sun Aug 14 00:05:00 2011
From: ashley at host365.com (ashley at host365.com)
Date: 14 Aug 2011 01:05:00 +0100
Subject: [Linux-cluster]
	=?utf-8?q?Specifiying_interface_for_cluster_traff?= =?utf-8?q?ic?=
Message-ID: <20110814000500.6018.qmail@psa101.host365.com>

Hi

I am away until 30/08/11. If you require support, please email support at host365.com or call +44 (0)207 610 9911 in my absence.

Regards
Ashley


From klusterfsck at outofoptions.net  Sun Aug 14 04:31:46 2011
From: klusterfsck at outofoptions.net (klusterfsck at outofoptions.net)
Date: Sun, 14 Aug 2011 00:31:46 -0400
Subject: [Linux-cluster] Specifiying interface for cluster traffic
In-Reply-To: <4E470E0C.4010408@alteeve.com>
References: <20110813145344.14952ynvb5624gu0@outofoptions.net>
	<4E470E0C.4010408@alteeve.com>
Message-ID: <20110814003146.19826otgn94oj3ci@outofoptions.net>

Quoting Digimer <linux at alteeve.com>:

> On 08/13/2011 02:53 PM, klusterfsck at outofoptions.net wrote:
>> I saw an example config for this but seem to have misplaced the link.
>> Anyone help me on this?  I need to have the cluster traffic over the
>> hardwired link so that it doesn't go through the switch.
>>
>> Thank You
>> Ken
>
> That depends largely on what kind of cluster you're talking about.
>
> --

It is a two machine cluster running drbd on a dedicated link.  We had  
a UPS fail during a power bump and the two machines decided
they were no longer joined.  I saw a configuration for routing cluster  
communications over a dedicated link.  I want to do this with the drbd  
link since the write capacity of the drbd sync is less than the link  
bundle.  Basically take the switch out of the communications loop  
except for the outside traffic.

Thanks.
Ken


From linux at alteeve.com  Sun Aug 14 04:57:47 2011
From: linux at alteeve.com (Digimer)
Date: Sun, 14 Aug 2011 00:57:47 -0400
Subject: [Linux-cluster] Specifiying interface for cluster traffic
In-Reply-To: <20110814003146.19826otgn94oj3ci@outofoptions.net>
References: <20110813145344.14952ynvb5624gu0@outofoptions.net>
	<4E470E0C.4010408@alteeve.com>
	<20110814003146.19826otgn94oj3ci@outofoptions.net>
Message-ID: <4E4755CB.1060201@alteeve.com>

On 08/14/2011 12:31 AM, klusterfsck at outofoptions.net wrote:
> Quoting Digimer <linux at alteeve.com>:
> 
>> On 08/13/2011 02:53 PM, klusterfsck at outofoptions.net wrote:
>>> I saw an example config for this but seem to have misplaced the link.
>>> Anyone help me on this?  I need to have the cluster traffic over the
>>> hardwired link so that it doesn't go through the switch.
>>>
>>> Thank You
>>> Ken
>>
>> That depends largely on what kind of cluster you're talking about.
>>
>> -- 
> 
> It is a two machine cluster running drbd on a dedicated link.  We had a
> UPS fail during a power bump and the two machines decided they
> were no longer joined.  I saw a configuration for routing cluster
> communications over a dedicated link.  I want to do this with the drbd
> link since the write capacity of the drbd sync is less than the link
> bundle.  Basically take the switch out of the communications loop except
> for the outside traffic.
> 
> Thanks.
> Ken

Sorry, I meant which cluster software are you running? DRBD alone? Red
Hat Cluster Services? Corosync + Pacemaker? Heartbeat + Pacemaker?

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"


From sunhux at gmail.com  Mon Aug 15 02:55:35 2011
From: sunhux at gmail.com (sunhux G)
Date: Mon, 15 Aug 2011 10:55:35 +0800
Subject: [Linux-cluster] Options other than reboot to stop DP processes that
	can't be killed -9
Message-ID: <CABTxP=6iW+pJVT-kYEa9iq_2DHfcoiHog6JEjBcYAq3hcgj=Bg@mail.gmail.com>

Apologies if this is not the right list to post but getting desperate:

I have 2 processes (shown by ps -ef below) which have 'jammed' the tape
drive, and I can't "kill -9" them.

Is there any way short of reboot to stop them, say "service xxx restart" or
anything else other than rebooting this Linux 4.x server?  Since reboot
involves doing "service stop xxx" of various services, surely one of the
xxx must be able to stop the processes (just an educated guess).  We
faced this issue with our Dataprotector quite often so frequent reboot
is not an option.

# ps -ef |grep -i bma |grep -v grep
root     10197     1  0 Aug13 ?        00:00:08 /opt/omni/lbin/vbda
-bmaname HP:Ultrium 4-SCSI_4 -type 2 -start 1313175661 -level 0
-access 1 0 -protection 2 1209600 -name / -ma xxxdgjt1.ss.de 22000 -id
1313175612 -volume / -profile -no_lock -hlink -no_touch -no_encode
-no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
-report 0 -on_busy  2 -no_nthlink -archattr -share_info -objname 02
xxxdgjt1.ss.de:/ // / -no_aligned
root     23303     1  0 Aug13 ?        00:00:03 /opt/omni/lbin/vbda
-bmaname HP:Ultrium 4-SCSI_1 -type 2 -start 1313192083 -level 0
-access 1 0 -protection 2 1209600 -name / -ma xxxdgjt1.ss.de 22000 -id
1313192026 -volume / -profile -no_lock -hlink -no_touch -no_encode
-no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
-report 0 -on_busy  2 -no_nthlink -archattr -share_info -objname 02
xxxdgjt1.ss.de:/ // / -no_aligned
root     25618     1  0 Aug13 ?        00:00:03 /opt/omni/lbin/vbda
-bmaname HP:Ultrium 4-SCSI_1 -type 2 -start 1313195066 -level 0
-access 1 0 -protection 2 1209600 -name / -ma xxxdgjt1.ss.de 22000 -id
1313195016 -volume / -profile -no_lock -hlink -no_touch -no_encode
-no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
-report 0 -on_busy  2 -no_nthlink -archattr -share_info -objname 02
xxxdgjt1.ss.de:/ // / -no_aligned


They hold these TCP connections (in CLOSE_WAIT, not listening):

[root at xxxdgjt1 ~]# netstat -antp | grep 25618
tcp       21      0 172.17.1.47:5555            172.17.12.12:2128           CLOSE_WAIT  25618/vbda
[root at xxxdgjt1 ~]# netstat -antp | grep 23303
tcp       21      0 172.17.1.47:5555            172.17.12.12:2073           CLOSE_WAIT  23303/vbda


fuser on all other partitions shows no processes locking/opening files; only
the root (i.e. /) partition does:

# fuser / |grep 25618    ==> will show 25618 & 25618r as amongst the processes
# fuser / |grep 23303    ==> will show 23303 & 23303r as amongst the processes


# cd /etc
# ls */*omni*
xinetd.d/omni

opt/omni:
client  server


From ashley at host365.com  Mon Aug 15 03:01:19 2011
From: ashley at host365.com (ashley at host365.com)
Date: 15 Aug 2011 04:01:19 +0100
Subject: [Linux-cluster]
	=?utf-8?q?Options_other_than_reboot_to_stop_DP_pr?=
	=?utf-8?q?ocesses_that=09can=27t_be_killed_-9?=
Message-ID: <20110815030119.31241.qmail@psa101.host365.com>

Hi

I am away until 30/08/11. If you require support, please email support at host365.com or call +44 (0)207 610 9911 in my absence.

Regards
Ashley


From Colin.Simpson at iongeo.com  Mon Aug 15 09:16:35 2011
From: Colin.Simpson at iongeo.com (Colin Simpson)
Date: Mon, 15 Aug 2011 10:16:35 +0100
Subject: [Linux-cluster] Options other than reboot to stop DP processes
	thatcan't be killed -9
In-Reply-To: <CABTxP=6iW+pJVT-kYEa9iq_2DHfcoiHog6JEjBcYAq3hcgj=Bg@mail.gmail.com>
References: <CABTxP=6iW+pJVT-kYEa9iq_2DHfcoiHog6JEjBcYAq3hcgj=Bg@mail.gmail.com>
Message-ID: <1313399795.27379.17.camel@bhac.iouk.ioroot.tld>

Probably not a cluster issue, just a pure kernel question.  It sounds like the
driver or device is locked up or confused, so the processes attached to
it will hang.
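
One way to check this guess is to look at the process state. A process stuck inside the kernel in uninterruptible sleep (state D) does not act on signals, SIGKILL included, until the kernel call returns. The sketch below parses the State line of /proc/<pid>/status. The sample text is made up for illustration, and whether the vbda processes are actually in state D is an assumption, not something shown in this thread.

```python
def proc_state(status_text):
    """Return the one-letter state code from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)".
            return line.split()[1]
    return None


def read_state(pid):
    """Read the state of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return proc_state(fh.read())


# Made-up sample, not captured from the affected server.
SAMPLE = "Name:\tvbda\nState:\tD (disk sleep)\nPid:\t25618\n"

state = proc_state(SAMPLE)
if state == "D":
    print("uninterruptible sleep: kill -9 stays pending until the I/O returns")
else:
    print(f"state {state}: signals should be delivered normally")
```

On the affected host, read_state(25618) would report the real state; ps -o pid,stat,wchan -p 25618 gives the same information from the shell.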

To be honest I've had similar problems on pretty much all Unixes for
many years. And I've never found a good way out of it. Maybe not an
option with your case and application, but I guess that's why most people
run their backup systems on separate dedicated boxes, so they can be
rebooted without affecting production systems.

I wish there was a way of saying to the kernel something like: I want
to forcibly unload the driver for this device, and you can kill any
processes attached to it. Then you could reinitialise the driver and
processes.

Resetting the physical device might work (it has for me in the past), but
I'd guess it could equally panic the kernel.

If someone else has a better way out of a hung device driver on Linux
I'd love to know too (seems particularly bad for tape devices in my
experience when it happens).

Colin

On Mon, 2011-08-15 at 03:55 +0100, sunhux G wrote:
> Apologies if this is not the right list to post but getting desperate:
> 
> I have 2 processes (shown by ps -ef  below) which has 'jammed' the
> tape
> drive below & I can't "kill -9" them.
> 
> Is there any way short of reboot to stop them, say "service xxx
> restart" or
> anything else other than rebooting this Linux 4.x server?  Since
> reboot
> involves doing "service stop xxx" of various services, surely one of
> the
> xxx must be able to stop the processes (just an educated guess).  We
> faced this issue with our Dataprotector quite often so frequent reboot
> is not an option.
> 
> # ps -ef |grep -i bma |grep -v grep
> root     10197     1  0 Aug13 ?        00:00:08 /opt/omni/lbin/vbda
> -bmaname HP:Ultrium 4-SCSI_4 -type 2 -start 1313175661 -level 0
> -access 1 0 -protection 2 1209600 -name / -ma xxxdgjt1.ss.de 22000 -id
> 1313175612 -volume / -profile -no_lock -hlink -no_touch -no_encode
> -no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
> -report 0 -on_busy  2 -no_nthlink -archattr -share_info -objname 02
> xxxdgjt1.ss.de:/ // / -no_aligned
> root     23303     1  0 Aug13 ?        00:00:03 /opt/omni/lbin/vbda
> -bmaname HP:Ultrium 4-SCSI_1 -type 2 -start 1313192083 -level 0
> -access 1 0 -protection 2 1209600 -name / -ma xxxdgjt1.ss.de 22000 -id
> 1313192026 -volume / -profile -no_lock -hlink -no_touch -no_encode
> -no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
> -report 0 -on_busy  2 -no_nthlink -archattr -share_info -objname 02
> xxxdgjt1.ss.de:/ // / -no_aligned
> root     25618     1  0 Aug13 ?        00:00:03 /opt/omni/lbin/vbda
> -bmaname HP:Ultrium 4-SCSI_1 -type 2 -start 1313195066 -level 0
> -access 1 0 -protection 2 1209600 -name / -ma xxxdgjt1.ss.de 22000 -id
> 1313195016 -volume / -profile -no_lock -hlink -no_touch -no_encode
> -no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
> -report 0 -on_busy  2 -no_nthlink -archattr -share_info -objname 02
> xxxdgjt1.ss.de:/ // / -no_aligned
> 
> 
> they're listening on the Tcp ports :
> 
> [root at xxxdgjt1 ~]# netstat -antp | grep 25618
> tcp       21      0 172.17.1.47:5555            172.17.12.12:2128
>      CLOSE_WAIT  25618/vbda
> [root at xxxdgjt1 ~]# netstat -antp | grep 23303
> tcp       21      0 172.17.1.47:5555            172.17.12.12:2073
>      CLOSE_WAIT  23303/vbda
> 
> 
> fuser all other partitions do not show processes locking/opening
> files, only the
> root (ie / ) partition :
> 
> # fuser / |grep 25618    ==> will show 25618 & 25618r as amongst the
> processes
> # fuser / |grep 23303    ==> will show 23303 & 23303r as amongst the
> processes
> 
> 
> # cd /etc
> # ls */*omni*
> xinetd.d/omni
> 
> opt/omni:
> client  server
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 

This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed.  If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.


From Colin.Simpson at iongeo.com  Tue Aug 16 19:29:58 2011
From: Colin.Simpson at iongeo.com (Colin Simpson)
Date: Tue, 16 Aug 2011 20:29:58 +0100
Subject: [Linux-cluster] NFS Serving Issues
Message-ID: <1313522998.32588.70.camel@bhac.iouk.ioroot.tld>

Hi

I have two issues with clustered NFS Services on RHEL6.1. One is an
oddity and the other is a problem umounting NFS mounted file-systems.

First issue if I define my NFS services as (cluster.conf fragment):

<resources>
  <ip address="10.10.50.41" monitor_link="1"/>
  <fs device="/dev/cluvg00/lv00home" force_fsck="1" force_unmount="1"
mountpoint="/mnt/home" name="homefs" options="acl" quick_status="0"
self_fence="0"/>
  <nfsexport name="exportclunfshome"/>
  <nfsclient name="nfsdhome" options="rw" target="10.0.0.0/8"/>
</resources>

<service autostart="0" domain="cluBnfb" exclusive="0" name="nfsdhome"
nfslock="1" recovery="relocate">
  <ip ref="10.10.50.41">
   <fs ref="homefs">
   <nfsexport ref="exportclunfshome">
     <nfsclient ref="nfsdhome"/>
   </nfsexport>
   </fs>
  </ip>
</service>

then when the service is stopped I get a "Stale NFS file handle" on
clients accessing the NFS mount point at the time, i.e.
if I have a copy going, on the service being disabled I get:

cp: cannot stat
`/home/wsmith/ww/cstst/./rhel-client-5.6-x86_64-dvd2.iso': Stale NFS
file handle
cp: cannot stat
`/home/wsmith/ww/cstst/./rhel-server-5.6-x86_64-dvd.iso': Stale NFS file
handle
cp: cannot stat `/home/wsmith/ww/cstst/./rhel-server-6.0-i386-dvd.iso':
Stale NFS file handle
cp: cannot stat
`/home/wsmith/ww/cstst/./rhel-server-6.0-x86_64-dvd.iso': Stale NFS file
handle
cp: cannot stat
`/home/wsmith/ww/cstst/./rhel-workstation-6.0-i386-dvd.iso': Stale NFS
file handle

The above format of cluster.conf having the "ip ref" contain the rest of
the things is as per the "Deploying Highly Available NFS on Red Hat
Enterprise Linux 6" document.

But if I don't enclose the nfs and fs things in the ip, the clients hang
until the services restart, i.e.:

<service autostart="0" domain="cluBnfb" exclusive="0" name="nfsdhome"
nfslock="1" recovery="relocate">
  <ip ref="10.10.50.41"/>
   <fs ref="homefs">
   <nfsexport ref="exportclunfshome">
     <nfsclient ref="nfsdhome"/>
   </nfsexport>
   </fs>
</service>

This seems the more sensible behaviour, as it appears to be more
predictable for the clients (i.e. their processes hang until the NFS
service reappears). So in the case of my copy above, it just resumes when
the NFS service reappears. This is as per the NFS cookbook.
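
When comparing nested and flat layouts like the two above, it can help to first rule out dangling references. The sketch below checks that every ref= inside a service points at a resource defined under resources. It assumes ip resources are referenced by their address attribute and the other types by name. The sample is a trimmed fragment wrapped in a hypothetical rm element, and it deliberately misspells the nfsexport name to show what a dangling ref looks like.

```python
import xml.etree.ElementTree as ET

# Attribute a ref= is matched against, per resource type.
# Assumption: ip is keyed by address, everything else by name.
KEY_ATTR = {"ip": "address", "fs": "name", "nfsexport": "name", "nfsclient": "name"}


def unresolved_refs(xml_text):
    """Return (service, tag, ref) for each ref with no matching resource."""
    root = ET.fromstring(xml_text)
    defined = set()
    for resources in root.iter("resources"):
        for child in resources:
            key = child.get(KEY_ATTR.get(child.tag, "name"))
            if key is not None:
                defined.add((child.tag, key))
    missing = []
    for svc in root.iter("service"):
        for node in svc.iter():
            ref = node.get("ref")
            if ref is not None and (node.tag, ref) not in defined:
                missing.append((svc.get("name"), node.tag, ref))
    return missing


# Illustrative fragment; the nfsexport name is misspelled on purpose.
SAMPLE = """
<rm>
  <resources>
    <ip address="10.10.50.41" monitor_link="1"/>
    <fs device="/dev/cluvg00/lv00home" mountpoint="/mnt/home" name="homefs"/>
    <nfsexport name="exporteclunfshome"/>
    <nfsclient name="nfsdhome" options="rw" target="10.0.0.0/8"/>
  </resources>
  <service name="nfsdhome">
    <ip ref="10.10.50.41">
      <fs ref="homefs">
        <nfsexport ref="exportclunfshome">
          <nfsclient ref="nfsdhome"/>
        </nfsexport>
      </fs>
    </ip>
  </service>
</rm>
"""

for service, tag, ref in unresolved_refs(SAMPLE):
    print(f"service {service}: <{tag} ref={ref!r}> has no matching resource")
```

This only checks that names match; it says nothing about start/stop ordering, which is what the nesting actually controls.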

BTW Is it best practice to use one nfsexport per nfsclient or is one
nfsexport resource enough cluster wide?

Why is there a behaviour disparity? Which is correct?

Question 2: I have the old case on either of the above where I can't
unmount the exported file system when I stop the service (so I can't
migrate it). Not unless I halt the file server hosting the file share or
force fence it. I just get the old:

# umount /mnt/home
umount: /mnt/home: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

Of course nothing is shown in lsof or fuser. This is annoying for a
number of reasons. One is that I can't readily perform basic load
balancing by migrating NFS services to their correct nodes (as I can't
migrate a service without halting a node). 

But more seriously I can't easily shut the cluster down cleanly when
told to by a UPS on power outage. Shutting down the node will be unable
to be performed cleanly as a resource is open (so will be liable to
fencing). If I halt the node (the least bad option left I can see) it
will get fenced and it will start booting back up again (which I'd like
pretty much in all other circumstances except a power outage). Forcing
the node to leave the cluster before my halt, will result in fencing and
restart. "umount -fl" doesn't free the resource locking the services.

Any tips for how to make this work more cleanly or how to free the
things stopping the NFS exported filesystem umounting cleanly?

Thanks for any advice

Colin

This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed.  If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.


From ajb2 at mssl.ucl.ac.uk  Wed Aug 17 11:00:42 2011
From: ajb2 at mssl.ucl.ac.uk (Alan Brown)
Date: Wed, 17 Aug 2011 12:00:42 +0100
Subject: [Linux-cluster] NFS Serving Issues
In-Reply-To: <1313522998.32588.70.camel@bhac.iouk.ioroot.tld>
References: <1313522998.32588.70.camel@bhac.iouk.ioroot.tld>
Message-ID: <4E4B9F5A.7060001@mssl.ucl.ac.uk>

Colin Simpson wrote:

> ,when the service is stopped I get a "Stale NFS file handle" from
> mounted filesystems accessing the NFS mount point at those times. i.e.
> if I have a copy going I get on the service being disabled:

That's normal if a NFS server mount is unexported or nfsd shuts down.

It _should_ (but doesn't always) clear when NFS resumes.

The only way around this is to define an IP for the NFS service/export 
pair and make the IP the final dependency in the service (ie, the IP is 
the last thing to come up and first thing to go down):

<service autostart="1" domain="msslap-pref" name="MSSLAU-X41" recovery="restart">
  <clusterfs ref="/stage/peace12">
    <nfsexport ref="msslau-x41-exports">
      <nfsclient ref="/stage/peace12-- at alphac">
        <nfsclient ref="/stage/peace12--127/8">
          <nfsclient ref="/stage/peace12-- at linuxt">
            <nfsclient ref="/stage/peace12-- at plasmawriter">
              <nfsclient ref="/stage/peace12-- at webserver">
                <ip ref="192.168.128.41"/>
              </nfsclient>
            </nfsclient>
          </nfsclient>
        </nfsclient>
      </nfsclient>
    </nfsexport>
  </clusterfs>
</service>
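
As a rough illustration of why the nesting above makes the IP come up
last and go down first, here is a minimal Python sketch. It assumes a
simplified model in which each child resource starts after its parent
and stops before it, and it ignores any per-type ordering rgmanager
applies among siblings. The XML is a trimmed copy of the service above
(one nfsclient level instead of five), and start_order is a hypothetical
helper, not part of rgmanager.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the service above (fewer nfsclient levels, same shape).
SERVICE_XML = """
<service name="MSSLAU-X41">
  <clusterfs ref="/stage/peace12">
    <nfsexport ref="msslau-x41-exports">
      <nfsclient ref="/stage/peace12--127/8">
        <ip ref="192.168.128.41"/>
      </nfsclient>
    </nfsexport>
  </clusterfs>
</service>
"""


def start_order(element):
    """Return (tag, ref) pairs parent-first: a simplified start order."""
    order = []
    for child in element:
        order.append((child.tag, child.get("ref")))
        order.extend(start_order(child))
    return order


service = ET.fromstring(SERVICE_XML)
starts = start_order(service)
# In this model, stopping is simply starting in reverse.
stops = list(reversed(starts))

print("start:", " -> ".join(tag for tag, _ in starts))
print("stop: ", " -> ".join(tag for tag, _ in stops))
```

Under that assumption the innermost element (the ip) is the last entry
in the start list and the first entry in the stop list, which is the
behaviour described above.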


> The above format of cluster.conf having the "ip ref" contain the rest of
> the things is as per the "Deploying Highly Available NFS on Red Hat
> Enterprise Linux 6" document.


> But if I don't enclose the nfs and fs things in the ip, the clients hang
> until the services restart i.e

Which is normal NFS client behaviour. (Cluster sends out a bunch of 
gratuitous ARPs when the services change host in order to have the 
IP/arp pair updated more quickly.)

> BTW Is it best practice to use one nfsexport per nfsclient or is one
> nfsexport resource enough cluster wide?

"It depends"

If all NFS will come off one host then one resource is enough.

If NFS might run off several hosts then you'll need one resource per export.

> Why is there a behaviour disparity? Which is correct?

They're both correct - and both wrong. :)

> Question 2: I have the old case on either of the above where I can't
> unmount the exported file system when I stop the service (so I can't
> migrate it). Not unless I halt the file server hosting the file share or
> force fence it. I just get the old:
> 
> # umount /mnt/home
> umount: /mnt/home: device is busy.
>         (In some cases useful info about processes that use
>          the device is found by lsof(8) or fuser(1))

On client side, umount -l will help in these cases.

on server side, restarting the nfslock service is usually sufficient to 
get umount to work (It's safe, clients are told to reacquire their locks)

> Of course nothing is shown in lsof or fuser. This is annoying for a
> number of reasons. One is that I can't readily perform basic load
> balancing by migrating NFS services to their correct nodes (as I can't
> migrate a service without halting a node). 

What's your backend? GFS?

> But more seriously I can't easily shut the cluster down cleanly when
> told to by a UPS on power outage. Shutting down the node will be unable
> to be performed cleanly as a resource is open (so will be liable to
> fencing).

If the filesystem is GFS, fencing is about the only reliable way of 
leaving the cluster.

However: if you shut down nfsd _and_ nfslock, you should be able to 
unmount the FSes cleanly.
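
The sequence suggested above could be scripted roughly as follows. This
is only a sketch: it assumes the RHEL init script names "nfs" and
"nfslock", uses the /mnt/home mount point from the original question,
and assumes the cluster service has already been disabled. The function
names are hypothetical, and the default is a dry run that only prints
the commands.

```python
import subprocess


def nfs_release_commands(mount_point):
    """Commands to stop NFS serving and then unmount, in order."""
    return [
        ["service", "nfs", "stop"],      # stop the NFS server daemons
        ["service", "nfslock", "stop"],  # stop the NFS locking daemons
        ["umount", mount_point],         # then try the unmount
    ]


def release(mount_point, dry_run=True):
    """Print the commands, or run them and stop at the first failure."""
    for cmd in nfs_release_commands(mount_point):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


release("/mnt/home")
```

Calling release("/mnt/home", dry_run=False) as root would actually run
the commands; check=True makes it stop at the first one that fails, so a
busy filesystem shows up as an error from umount rather than being
silently skipped.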


From bubble at hoster-ok.com  Wed Aug 17 15:13:11 2011
From: bubble at hoster-ok.com (Vladislav Bogdanov)
Date: Wed, 17 Aug 2011 18:13:11 +0300
Subject: [Linux-cluster] Handling of CPG_REASON_NODEDOWN in daemons
Message-ID: <4E4BDA87.7010507@hoster-ok.com>

Hi all,

I hope I found a correct list.

I am trying to find out why a node was not fenced on a
CPG_REASON_NODEDOWN event.

Here is what I see in dlm_tool dump:
1313579105 Processing membership 80592
1313579105 Skipped active node 939787530: born-on=80580, last-seen=80592, this-event=80592, last-event=80580
1313579105 Skipped active node 956564746: born-on=80564, last-seen=80592, this-event=80592, last-event=80580
1313579105 del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/1543767306"
1313579105 Removed inactive node 1543767306: born-on=80572, last-seen=80580, this-event=80592, last-event=80580
1313579105 dlm:controld conf 2 0 1 memb 939787530 956564746 join left 1543767306
1313579105 dlm:ls:clvmd conf 2 0 1 memb 939787530 956564746 join left 1543767306
1313579105 clvmd add_change cg 4 remove nodeid 1543767306 reason 3
1313579105 clvmd add_change cg 4 counts member 2 joined 0 remove 1 failed 1
1313579105 clvmd stop_kernel cg 4
1313579105 write "0" to "/sys/kernel/dlm/clvmd/control"
1313579105 Node 1543767306/mgmt01 has not been shot yet
1313579105 clvmd check_fencing 1543767306 wait add 1313562825 fail 1313579105 last 0
1313579107 Node 1543767306/mgmt01 was last shot 'now'
1313579107 clvmd check_fencing 1543767306 done add 1313562825 fail 1313579105 last 1313579107
1313579107 clvmd check_fencing done

That means that dlm_controld received CPG_REASON_NODEDOWN event for
clvmd CPG and did not call kick_node_from_cluster(), so pacemaker didn't
do fencing on behalf of clvmd cpg.
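
To make the reading of the dump above easier to repeat, here is a small
Python sketch that pulls the "remove nodeid ... reason N" entries out of
dlm_tool dump text and names the reason. The numeric values are the
cpg_reason_t values from corosync's cpg.h as I understand them; treat
the table as an assumption and check it against the installed header.
The removals helper and the regular expression are hypothetical, written
only for this log format.

```python
import re

# cpg_reason_t values from corosync's cpg.h (assumed; verify locally).
CPG_REASONS = {1: "JOIN", 2: "LEAVE", 3: "NODEDOWN", 4: "NODEUP", 5: "PROCDOWN"}

# A few lines from the dump above, one log entry per line.
DUMP = """\
1313579105 clvmd add_change cg 4 remove nodeid 1543767306 reason 3
1313579105 clvmd add_change cg 4 counts member 2 joined 0 remove 1 failed 1
1313579105 Node 1543767306/mgmt01 has not been shot yet
1313579107 Node 1543767306/mgmt01 was last shot 'now'
"""

REMOVE_RE = re.compile(
    r"^(\d+) (\S+) add_change cg \d+ remove nodeid (\d+) reason (\d+)$"
)


def removals(dump):
    """Yield (timestamp, lockspace, nodeid, reason_name) per removed node."""
    for line in dump.splitlines():
        match = REMOVE_RE.match(line)
        if match:
            ts, lockspace, nodeid, reason = match.groups()
            name = CPG_REASONS.get(int(reason), "UNKNOWN")
            yield int(ts), lockspace, int(nodeid), name


events = list(removals(DUMP))
for ts, lockspace, nodeid, reason in events:
    print(f"{ts} {lockspace}: node {nodeid} removed, reason CPG_REASON_{reason}")
```

With the sample lines this reports node 1543767306 leaving the clvmd
lockspace with reason 3, which matches the NODEDOWN interpretation
given above.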

Please correct me if I'm wrong:
* Request for fencing of node on CPG_REASON_NODEDOWN event was
historically left to groupd to do.
* That's why all daemons (fenced, dlm_controld, gfs2_controld) call
kick_node_from_cluster() only on CPG_REASON_PROCDOWN event, not on
CPG_REASON_NODEDOWN.
* groupd is obsoleted in 3.x.

Shouldn't daemons request fencing on CPG_REASON_NODEDOWN too?
Now they only mark node as failed and increase cg failcount.

I use a pacemaker-based setup, and actually use only the (obsolete)
dlm_controld.pcmk, but the problem seems to be a bit wider than that
one daemon.

Setup is:
corosync-1.4.1
openais-1.1.4
pacemaker-tip
clusterlib-3.1.1
dlm_controld.pcmk from 3.0.17
lvm2-cluster-2.0.85

Best,
Vladislav


From Colin.Simpson at iongeo.com  Wed Aug 17 19:01:28 2011
From: Colin.Simpson at iongeo.com (Colin Simpson)
Date: Wed, 17 Aug 2011 20:01:28 +0100
Subject: [Linux-cluster] NFS Serving Issues
In-Reply-To: <4E4B9F5A.7060001@mssl.ucl.ac.uk>
References: <1313522998.32588.70.camel@bhac.iouk.ioroot.tld>
	<4E4B9F5A.7060001@mssl.ucl.ac.uk>
Message-ID: <1313607688.8477.27.camel@bhac.iouk.ioroot.tld>

Hi Alan, 

Thanks for getting back.

On Wed, 2011-08-17 at 12:00 +0100, Alan Brown wrote:
> Colin Simpson wrote:
> 
> > ,when the service is stopped I get a "Stale NFS file handle" from
> > mounted filesystems accessing the NFS mount point at those times.
> i.e.
> > if I have a copy going I get on the service being disabled:
> 
> That's normal if a NFS server mount is unexported or nfsd shuts down.
> 
> It _should_ (but doesn't always) clear when NFS resumes.

It does clear the "stale NFS file handle" when the service fails over.
But that's not really the issue for me. My beef is that the "stale NFS
file handle" is liable to upset apps on the clients (they will possibly
just quit), whereas the hang suspends client apps looking at this mount
point until the service fails over. That seems better.

> > Why is there a behaviour disparity? Which is correct?
> 
> They're both correct - and both wrong. :)

But it seems a subtle change to an NFS service setup in the config i.e
the IP containing the NFS export and client vs the IP sitting at the top
level (i.e at the same level as NFS export), results in the NFS behaving
like a hard mount vs a soft mount (even though I'm mounting as hard in
both cases from the clients). 

Maybe I'm confused, just seems pretty unclear as to why the behaviour
should be different. The config fragment you gave behaves for me
properly (IMHO) and the clients hang until service failover (so exactly
like my first case IP ref contains the NFS export etc)

> on server side, restarting the nfslock service is usually sufficient to
> get umount to work (It's safe, clients are told to reacquire their
> locks)
> What's your backend? GFS?
> 
> > But more seriously I can't easily shut the cluster down cleanly when
> > told to by a UPS on power outage. Shutting down the node will be
> unable
> > to be performed cleanly as a resource is open (so will be liable to
> > fencing).
> 
> If the filesystem is GFS, fencing is about the only reliable way of
> leaving the cluster.
> 

The backend I'm trying is ext4 (failing over the mount). I had tried to
manually stop nfsd and nfslock (even though the cluster seems to drop
locks anyway, judging by the output it's writing in the messages file)
and that made no difference sadly; it still fails to umount, even after
leaving it for ages.

The failure to umount only seems to occur if you are actively performing
a large amount of continuous activity to this NFS export (copying a
large file over when it fails or is stopped). I wonder if this hanging
isn't unexpected with NFS given the "self_fence" option provided in the
fs resources?

Thanks again

Colin



From emilio at ugr.es  Thu Aug 18 12:01:45 2011
From: emilio at ugr.es (Emilio Arjona)
Date: Thu, 18 Aug 2011 14:01:45 +0200
Subject: [Linux-cluster] Problems after cluster update
Message-ID: <CAEQGue9c-XwpmTy6-xQdBr_mOydj9WHS_fWAi02_V2WD3K9BRQ@mail.gmail.com>

Hello all,

after updating one node of a cluster running Red Hat 5.4 (PAE) I've lost
access to the shared LUN (gfs2). I can see the device with the
'multipath -v2' command, but pvs and vgs don't see it anymore. The old
'friendly name' entry of the volume in /dev/mapper is missing too, but
the mpathXY entry is still there.

I'm getting this error when using commands related with lvm (pvs, pvscan,
lvs, etc.):

  connect() failed on local socket:
  Internal cluster locking initialisation failed.
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.

Do I have to recreate the physical volume, the volume group and the
logical volume?

Extra info:
uname in the updated node returns: 2.6.18-274.el5PAE
uname in the not-updated nodes returns: 2.6.18-164.9.1.el5PAE

Thanks in advance.

-- 
*******************************************
Emilio Arjona Heredia
Centro de Enseñanzas Virtuales de la Universidad de Granada
C/ Real de Cartuja 36-38
http://cevug.ugr.es
Tlfno.: 958-241000 ext. 20206
*******************************************

From cos at aaaaa.org  Thu Aug 18 13:54:28 2011
From: cos at aaaaa.org (Ofer Inbar)
Date: Thu, 18 Aug 2011 09:54:28 -0400
Subject: [Linux-cluster] rgmanager running, but cluster acts as if it's not
Message-ID: <20110818135427.GO7753@mip.aaaaa.org>

[ cman-2.0.115-34, rgmanager-2.0.52-6, running on 5.5 ]

3-node cluster.  rgmanager is running on all three nodes, but service
won't relocate over to node 3.  clustat doesn't see rgmanager on it.
Run from nodes 1 and 2, clustat shows all three nodes Online but only
nodes 1 and 2 have rgmanager.  Run from node 3, clustat shows all
three Online and no rgmanager.  This is what I'd see if rgmanager were
not running on node3 at all.  And yet:

$ sudo /etc/init.d/rgmanager status
clurgmgrd (pid  2592) is running...
$ echo $?
0
$ ps aux | grep clu
root      2412  0.0  0.0  51920  1788 ?        S<sl 09:10   0:00 modclusterd
root      2592  0.0  0.0  23536  5132 ?        S<Ls 09:11   0:00 clurgmgrd
root      2593  0.0  0.0  23536   500 ?        S<   09:11   0:00 clurgmgrd

Restarting rgmanager succeeds but things are in the same broken state.

Nothing seems wrong in /var/log/messages:
Aug 18 08:54:07 node3 kernel: dlm: Using TCP for communications
Aug 18 08:54:08 node3 kernel: dlm: connecting to 2
Aug 18 08:54:08 node3 kernel: dlm: connecting to 1
Aug 18 08:54:08 node3 kernel: dlm: got connection from 1
Aug 18 08:54:08 node3 kernel: dlm: got connection from 2

However, strace shows process 2593 (the second clurgmgrd) in a nonstop
loop of SIGCHLD, rt_sigaction, rt_sigprocmask, clone, wait4.  That is
not what clurgmgrd processes on the other nodes look like.
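
The repeating pattern in the attached trace (SIGINT and SIGQUIT
ignored, SIGCHLD blocked, clone, wait4, handlers restored) looks like
what glibc's system() does, so the process may be running some external
command once a second; that is a guess from the trace, not something
confirmed. A minimal Python sketch for summarising how those children
exit, using lines copied from the attachment; exit_statuses is a
hypothetical helper:

```python
import re
from collections import Counter

# Lines copied from the attached strace output.
TRACE = """\
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff59f75318) = 24090
wait4(24090, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 24090
nanosleep({1, 0}, {1, 0})               = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff59f75318) = 24093
wait4(24093, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 24093
"""

WAIT_RE = re.compile(
    r"^wait4\((\d+), \[\{WIFEXITED\(s\) && WEXITSTATUS\(s\) == (\d+)\}\]"
)


def exit_statuses(trace):
    """Count child exit statuses reported by wait4() lines."""
    counts = Counter()
    for line in trace.splitlines():
        match = WAIT_RE.match(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


statuses = exit_statuses(TRACE)
print(statuses)
```

Every child in the sample exits with status 1, i.e. whatever is being
run fails each time. Running strace with -f would follow the children
and show which command that is.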


Next, I tried fencing the node.  It shut down, rebooted, came back up,
rejoined the cluster, started rgmanager... and is *still* in this same
bad state!

I'm attaching lsof output for both clurgmgrd processes, and a sample
of the strace from the second process.  It looks the same after
fencing as it did before.

Any ideas?
  -- Cos
-------------- next part --------------
Aug 18 08:54:07 node3 kernel: dlm: Using TCP for communications
Aug 18 08:54:08 node3 kernel: dlm: connecting to 2
Aug 18 08:54:08 node3 kernel: dlm: connecting to 1
Aug 18 08:54:08 node3 kernel: dlm: got connection from 1
Aug 18 08:54:08 node3 kernel: dlm: got connection from 2

$ sudo lsof -p 19330
COMMAND     PID USER   FD   TYPE DEVICE    SIZE    NODE NAME
clurgmgrd 19330 root  cwd    DIR  253,0    4096       2 /
clurgmgrd 19330 root  rtd    DIR  253,0    4096       2 /
clurgmgrd 19330 root  txt    REG  253,0  258408 1706604 /usr/sbin/clurgmgrd
clurgmgrd 19330 root  mem    REG  253,0  139416 1882071 /lib64/ld-2.5.so
clurgmgrd 19330 root  mem    REG  253,0 1717800 1882072 /lib64/libc-2.5.so
clurgmgrd 19330 root  mem    REG  253,0  615136 1882073 /lib64/libm-2.5.so
clurgmgrd 19330 root  mem    REG  253,0   23360 1882075 /lib64/libdl-2.5.so
clurgmgrd 19330 root  mem    REG  253,0  145824 1882074 /lib64/libpthread-2.5.so
clurgmgrd 19330 root  mem    REG  253,0   85608 1695743 /usr/lib64/libz.so.1.2.3
clurgmgrd 19330 root  mem    REG  253,0   22136 1706606 /usr/lib64/libcman.so.2.0.115
clurgmgrd 19330 root  mem    REG  253,0 1297104 1706630 /usr/lib64/libxml2.so.2.6.26
clurgmgrd 19330 root  mem    REG  253,0   23576 1706612 /usr/lib64/libdlm.so.2.0.115
clurgmgrd 19330 root  mem    REG  253,0  902744 1706540 /usr/lib64/libslang.so.2.0.6
clurgmgrd 19330 root    0u   CHR    1,3            1275 /dev/null
clurgmgrd 19330 root    1u   CHR    1,3            1275 /dev/null
clurgmgrd 19330 root    2u   CHR    1,3            1275 /dev/null

$ sudo lsof -p 19331
COMMAND     PID USER   FD   TYPE             DEVICE    SIZE     NODE NAME
clurgmgrd 19331 root  cwd    DIR              253,0    4096        2 /
clurgmgrd 19331 root  rtd    DIR              253,0    4096        2 /
clurgmgrd 19331 root  txt    REG              253,0  258408  1706604 /usr/sbin/clurgmgrd
clurgmgrd 19331 root  mem    REG              253,0  139416  1882071 /lib64/ld-2.5.so
clurgmgrd 19331 root  mem    REG              253,0 1717800  1882072 /lib64/libc-2.5.so
clurgmgrd 19331 root  mem    REG              253,0  615136  1882073 /lib64/libm-2.5.so
clurgmgrd 19331 root  mem    REG              253,0   23360  1882075 /lib64/libdl-2.5.so
clurgmgrd 19331 root  mem    REG              253,0  145824  1882074 /lib64/libpthread-2.5.so
clurgmgrd 19331 root  mem    REG              253,0   85608  1695743 /usr/lib64/libz.so.1.2.3
clurgmgrd 19331 root  mem    REG              253,0   22136  1706606 /usr/lib64/libcman.so.2.0.115
clurgmgrd 19331 root  mem    REG              253,0 1297104  1706630 /usr/lib64/libxml2.so.2.6.26
clurgmgrd 19331 root  mem    REG              253,0   23576  1706612 /usr/lib64/libdlm.so.2.0.115
clurgmgrd 19331 root  mem    REG              253,0  902744  1706540 /usr/lib64/libslang.so.2.0.6
clurgmgrd 19331 root    0u   CHR                1,3             1275 /dev/null
clurgmgrd 19331 root    1u   CHR                1,3             1275 /dev/null
clurgmgrd 19331 root    2u   CHR                1,3             1275 /dev/null
clurgmgrd 19331 root    3u  unix 0xffff810174227140         31164635 socket
clurgmgrd 19331 root    4r   CHR                1,5             1277 /dev/zero
clurgmgrd 19331 root    5r  FIFO                0,6         31164637 pipe
clurgmgrd 19331 root    6w  FIFO                0,6         31164637 pipe
clurgmgrd 19331 root    7u   CHR              10,62             6413 /dev/misc/dlm-control
clurgmgrd 19331 root    8u   CHR              10,58         31164700 /dev/misc/dlm_rgmanager

$ sudo strace -p 19331
Process 19331 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff59f75318) = 24090
wait4(24090, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 24090
rt_sigaction(SIGINT, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff59f75318) = 24093
wait4(24093, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 24093
rt_sigaction(SIGINT, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff59f75318) = 24095
wait4(24095, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 24095
rt_sigaction(SIGINT, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff59f75318) = 24097
wait4(24097, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 24097
rt_sigaction(SIGINT, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff59f75318) = 24099
wait4(24099, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 24099
rt_sigaction(SIGINT, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff59f75318) = 24101
wait4(24101, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 24101
rt_sigaction(SIGINT, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff59f75318) = 24104
wait4(24104, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 24104
rt_sigaction(SIGINT, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff59f75318) = 24106
wait4(24106, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 24106
rt_sigaction(SIGINT, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff59f75318) = 24108
wait4(24108, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 24108
rt_sigaction(SIGINT, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x301ce302d0}, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff59f75318) = 24116
wait4(24116, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 24116
rt_sigaction(SIGINT, {0x40c46c, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x301ce302d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV PIPE TERM CHLD STOP RTMIN RT_1], NULL, 8) = 0
nanosleep({1, 0},

From jonathan.barber at gmail.com  Thu Aug 18 15:13:28 2011
From: jonathan.barber at gmail.com (Jonathan Barber)
Date: Thu, 18 Aug 2011 16:13:28 +0100
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <CADyt5gkO7v9+ZYuQ7a5-eL2xfZuHF--GyeoK--Zs+9GF+=4_BQ@mail.gmail.com>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
	<4E451F0B.4040304@mssl.ucl.ac.uk>
	<CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
	<C028F637-6669-4052-8227-F4710498B7FA@gmail.com>
	<CADyt5g=+eRvHQf4YqiW8XLzm4EUA+Anc8bTqtxLmnXPr9BK9OQ@mail.gmail.com>
	<4E45D5D8.5090507@mssl.ucl.ac.uk>
	<CADyt5gkO7v9+ZYuQ7a5-eL2xfZuHF--GyeoK--Zs+9GF+=4_BQ@mail.gmail.com>
Message-ID: <CAPEiEj5w4=0GQo_WR63S2SDk6yYvMtJG+QvWC7bFVn5Chn-66w@mail.gmail.com>

On 13 August 2011 04:24, Paras pradhan <pradhanparas at gmail.com> wrote:
> Alan,
> Its a FC SAN.
> Here is multipath -v2 -ll output and looks good .
> --
> mpath13 (360060e8004770d000000770d000003e9) dm-28 HITACHI,OPEN-V*4
> [size=2.0T][features=1 queue_if_no_path][hwhandler=0][rw]
> \_ round-robin 0 [prio=2][active]
>  \_ 5:0:1:7 sdt 65:48 [active][ready]
>  \_ 6:0:1:7 sdu 65:64 [active][ready]
> ---
>
> If I don't make an entire LUN a PV, I think I would then need partitions. Am
> i right? and you think this will reduce the speed penalty?

The (possible) speed penalty with a partition + LVM is because the
blocks in the LVM/filesystem aren't aligned with the blocks in the
storage system. So when you write a block in the OS, the storage
system has to write to two blocks. You can overcome this by manually
aligning the partitions with the underlying storage.
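
As a rough illustration of the alignment point above, here is a minimal
sketch that checks whether a partition's starting offset falls on a
storage block boundary. The 512-byte sector size and the 64 KiB storage
block size are assumptions for the example only; the real values depend
on the array and should be taken from its documentation. The function
names are hypothetical.

```python
def is_aligned(start_sector, sector_bytes=512, storage_block_bytes=64 * 1024):
    """Return True if the partition's byte offset is a multiple of the
    storage block size (both sizes are assumed example values)."""
    return (start_sector * sector_bytes) % storage_block_bytes == 0


def next_aligned_sector(start_sector, sector_bytes=512, storage_block_bytes=64 * 1024):
    """Round start_sector up to the next sector that sits on a storage
    block boundary."""
    if storage_block_bytes % sector_bytes != 0:
        raise ValueError("storage block size must be a multiple of the sector size")
    sectors_per_block = storage_block_bytes // sector_bytes
    # Ceiling division, then scale back up to a sector number.
    return -(-start_sector // sectors_per_block) * sectors_per_block


# The traditional DOS partition layout starts the first partition at
# sector 63, which is not on a 64 KiB boundary under these assumptions.
print(is_aligned(63))
print(next_aligned_sector(63))
```

With a misaligned start, a single OS block can straddle two storage
blocks, which is where the extra write comes from.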

You can also just not use any partitions/LVM and write the filesystem
directly to the block device... But I'd just stick with using LVM.

If you want to create a LV that uses all of the space on a VG, you can use:
# lvcreate -l 100%FREE -n $NAME $VGNAME

Do you see the same problem if you create the LV without CLVMD
running? This thread suggests it's possible to stop clvmd whilst the
cluster is running:
https://www.redhat.com/archives/linux-cluster/2008-November/msg00151.html

If you run "lvcreate -ddddddd -vvv ..." do you see any useful messages?

Cheers

> Thanks
> Paras.
>
>
> On Fri, Aug 12, 2011 at 8:39 PM, Alan Brown <ajb2 at mssl.ucl.ac.uk> wrote:
>>
>> On 12/08/2011 17:24, Paras pradhan wrote:
>>>
>>> Does it mean that I don't need mpath0p1 ? If its the case i don't need to
>>> run kpartx on mpath0?
>>
>> You still need kpartx, but that's a bit clunky anyway. Let dm-multipath
>> take care of all that for you.
>>
>> (The last time I used kpartx and friends was 2003. Dm-multipath and
>> multipathd are much more user-friendly. All you need then is multipath -v2
>> -ll to verify things are where they should be...)
>>
>>> And not having mpath0p1 will take away this device mapper ioctl failed
>>> issue when creating lvcreate?
>>>
>>
>> I think that's a separate issue. What's the underlying structure? SAN?
>> FC? iSCSI? DRBD?
>>
>>> I am really confused why this lock has failed , also not sure if this is
>>> related to this >2TB LUN.
>>>
>>
>> It's not. Some of my LUNs are 25+Tb
>>
>
>
>
>>
>> FWIW having PVs on LUN partitions introduces a small but measurable speed
>> penalty over making the entire LUN a PV - this is mostly down to the small
>> offset a partition table adds to the front of the LUN.
>>
>
>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
Jonathan Barber <jonathan.barber at gmail.com>


From cos at aaaaa.org  Thu Aug 18 15:38:14 2011
From: cos at aaaaa.org (Ofer Inbar)
Date: Thu, 18 Aug 2011 11:38:14 -0400
Subject: [Linux-cluster] rgmanager running,
	but cluster acts as if it's not
In-Reply-To: <20110818135427.GO7753@mip.aaaaa.org>
References: <20110818135427.GO7753@mip.aaaaa.org>
Message-ID: <20110818153814.GJ343@mip.aaaaa.org>

> 3-node cluster.  rgmanager is running on all three nodes, but service
> won't relocate over to node 3.  clustat doesn't see rgmanager on it.
> Run from nodes 1 and 2, clustat shows all three nodes Online but only
> nodes 1 and 2 have rgmanager.  Run from node 3, clustat shows all
> three Online and no rgmanager.  This is what I'd see if rgmanager were
> not running on node3 at all.  And yet:
[...]

After I sent that email - and about an hour after the problem first
began - node2 spontaneously switched to showing rgmanager="0" in its
clustat -x output, even though node2 was where the service was running.

After rebooting node3 another time, its clurgmgrd was no longer in the
SIGCHLD loop I showed before.  Instead, it was blocked on write(7, ...
According to lsof, filehandle 7 was /dev/misc/dlm-control


On #linux-cluster IRC, lon asked what group_tool ls showed...

node1 $sudo group_tool ls
type             level name       id       state
fence            0     default    00010001 JOIN_STOP_WAIT
[1 2 3 3]
dlm              1     rgmanager  00030001 JOIN_ALL_STOPPED
[1 2 3]

node2 $sudo group_tool ls
[sudo] password for oinbar:
type             level name       id       state
fence            0     default    00010001 JOIN_STOP_WAIT
[1 2 3 3]
dlm              1     rgmanager  00030001 JOIN_ALL_STOPPED
[1 2 3]

node3 $sudo group_tool ls
[sudo] password for oinbar:
type             level name       id       state
fence            0     default    00000000 JOIN_STOP_WAIT
[1 2 3]
dlm              1     rgmanager  00000000 JOIN_STOP_WAIT
[1 2 3]

He also asked me to send SIGUSR1 to clurgmgrd and get the contents of
/tmp/rgmanager-dump*, but clurgmgrd did not respond to SIGUSR1 and I
got no dump files.

Also, I updated the cluster.conf to change <rm log_level="6"> to 7.

I started seeing this in /var/log/messages on node3:

Aug 18 10:00:07 node3 rgmanager: [8121]: <notice> Shutting down Cluster Service Manager... 
Aug 18 10:13:31 node3 kernel: dlm: Using TCP for communications
Aug 18 10:13:31 node3 dlm_controld[1857]: process_uevent online@ error -17 errno 2
Aug 18 10:14:05 node3 kernel: dlm: rgmanager: group join failed -512 0
Aug 18 10:14:05 node3 kernel: dlm: Using TCP for communications
Aug 18 10:14:05 node3 dlm_controld[1857]: process_uevent online@ error -17 errno 2
Aug 18 10:14:33 node3 kernel: dlm: rgmanager: group join failed -512 0
Aug 18 10:14:36 node3 dlm_controld[1857]: process_uevent online@ error -17 errno 2
Aug 18 10:14:36 node3 kernel: dlm: Using TCP for communications
Aug 18 10:26:15 node3 rgmanager: [22290]: <notice> Shutting down Cluster Service Manager... 
Aug 18 10:34:48 node3 kernel: dlm: rgmanager: group join failed -512 0

... and this in /var/log/messages on node1:

Aug 18 10:37:48 node1 kernel: INFO: task clurgmgrd:32606 blocked for more than 120 seconds.
Aug 18 10:37:48 node1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 18 10:37:48 node1 kernel: clurgmgrd     D ffff81016ae9abc0     0 32606  32605           633       (NOTLB)
Aug 18 10:37:48 node1 kernel:  ffff810169641de8 0000000000000086 ffff810169641d28 ffff810169641d28
Aug 18 10:37:48 node1 kernel:  0000000000000246 0000000000000008 ffff81006efad820 ffff810168493080
Aug 18 10:37:48 node1 kernel:  0003f21de24fde7f 000000000000f650 ffff81006efada08 000000007eea8300
Aug 18 10:37:48 node1 kernel: Call Trace:
Aug 18 10:37:48 node1 kernel:  [<ffffffff8002cd2c>] mntput_no_expire+0x19/0x89
Aug 18 10:37:48 node1 kernel:  [<ffffffff8000ea75>] link_path_walk+0xa6/0xb2
Aug 18 10:37:48 node1 kernel:  [<ffffffff800656ac>] __down_read+0x7a/0x92
Aug 18 10:37:48 node1 kernel:  [<ffffffff88473380>] :dlm:dlm_clear_proc_locks+0x20/0x1d2
Aug 18 10:37:48 node1 kernel:  [<ffffffff8001adcf>] cp_new_stat+0xe5/0xfd
Aug 18 10:37:48 node1 kernel:  [<ffffffff8847b0a9>] :dlm:device_close+0x55/0x99
Aug 18 10:37:48 node1 kernel:  [<ffffffff80012ac5>] __fput+0xd3/0x1bd
Aug 18 10:37:48 node1 kernel:  [<ffffffff80023bd1>] filp_close+0x5c/0x64
Aug 18 10:37:48 node1 kernel:  [<ffffffff8001dff3>] sys_close+0x88/0xbd
Aug 18 10:37:48 node1 kernel:  [<ffffffff8005e116>] system_call+0x7e/0x83
Aug 18 10:37:48 node1 kernel:


Finally, I rebooted all three cluster nodes at the same time.
After I did that, everything came back up in a good state.
I'm sending this followup in the hopes that someone can use this data
to determine what the bug was.  If you do, please reply.  Thanks!
  -- Cos


From pradhanparas at gmail.com  Thu Aug 18 17:41:38 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Thu, 18 Aug 2011 12:41:38 -0500
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <CAPEiEj5w4=0GQo_WR63S2SDk6yYvMtJG+QvWC7bFVn5Chn-66w@mail.gmail.com>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
	<4E451F0B.4040304@mssl.ucl.ac.uk>
	<CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
	<C028F637-6669-4052-8227-F4710498B7FA@gmail.com>
	<CADyt5g=+eRvHQf4YqiW8XLzm4EUA+Anc8bTqtxLmnXPr9BK9OQ@mail.gmail.com>
	<4E45D5D8.5090507@mssl.ucl.ac.uk>
	<CADyt5gkO7v9+ZYuQ7a5-eL2xfZuHF--GyeoK--Zs+9GF+=4_BQ@mail.gmail.com>
	<CAPEiEj5w4=0GQo_WR63S2SDk6yYvMtJG+QvWC7bFVn5Chn-66w@mail.gmail.com>
Message-ID: <CADyt5gkGU9NReKAx1C5Fyd1UqhWUfO_-z39YzRM1AVk1TWpyBg@mail.gmail.com>

On Thu, Aug 18, 2011 at 10:13 AM, Jonathan Barber
<jonathan.barber at gmail.com> wrote:
>
> On 13 August 2011 04:24, Paras pradhan <pradhanparas at gmail.com> wrote:
> > Alan,
> > Its a FC SAN.
> > Here is multipath -v2 -ll output and looks good .
> > --
> > mpath13 (360060e8004770d000000770d000003e9) dm-28 HITACHI,OPEN-V*4
> > [size=2.0T][features=1 queue_if_no_path][hwhandler=0][rw]
> > \_ round-robin 0 [prio=2][active]
> >  \_ 5:0:1:7 sdt 65:48 [active][ready]
> >  \_ 6:0:1:7 sdu 65:64 [active][ready]
> > ---
> >
> > If I don't make an entire LUN a PV, I think I would then need partitions. Am
> > i right? and you think this will reduce the speed penalty?
>
> The (possible) speed penalty with a partition + LVM is because the
> blocks in the LVM/filesystem aren't aligned with the blocks in the
> storage system. So when you write a block in the OS, the storage
> system has to write to two blocks. You can overcome this by manually
> aligning the partitions with the underlying storage.
>
> You can also just not use any partitions/LVM and write the filesystem
> directly to the block device... But I'd just stick with using LVM.
>


Here is what I have noticed, though I should have done a few more tests.
iozone output with partitions (test size is 100MB):
-
"Output is in Kbytes/sec"
"  Initial write "  265074.94
"        Rewrite "  909962.61
"           Read " 1872247.78
"        Re-read " 1905471.81
"   Reverse Read " 1316265.03
"    Stride read " 1448626.44
"    Random read " 1119532.25
" Mixed workload "  922532.31
"   Random write "  749795.80
--

without partitions:
"Output is in Kbytes/sec"
"  Initial write "  376417.97
"        Rewrite "  870409.73
"           Read " 1953878.50
"        Re-read " 1984553.84
"   Reverse Read " 1353943.00
"    Stride read " 1469878.76
"    Random read " 1432870.66
" Mixed workload " 1328300.78
"   Random write "  790309.01
---


>
> If you want to create a LV that uses all of the space on a VG, you can use:
> # lvcreate -l 100%FREE -n $NAME $VGNAME
>
> Do you see the same problem if you create the LV without CLVMD
> running? This thread suggests it's possible to stop clvmd whilst the
> cluster is running:
> https://www.redhat.com/archives/linux-cluster/2008-November/msg00151.html
>
> If you run "lvcreate -ddddddd -vvv ..." do you see any useful messages?


I got this locking problem resolved after rebooting all the nodes.
What I have noticed is that after adding a LUN, under /dev/mpath, instead of
the WWID I was seeing:

lrwxrwxrwx 1 root root 8 Aug 9 17:30 mpath13 -> ../dm-28

After reboot

lrwxrwxrwx 1 root root 7 Aug 15 17:53 360060e8004770d000000770d000003e9 -> ../dm-9

So I am not sure what is going on. It looks like an issue with
automatic dmsetup?

Thanks
Paras.


>
> Cheers
>
> > Thanks
> > Paras.
> >
> >
> > On Fri, Aug 12, 2011 at 8:39 PM, Alan Brown <ajb2 at mssl.ucl.ac.uk> wrote:
> >>
> >> On 12/08/2011 17:24, Paras pradhan wrote:
> >>>
> >>> Does it mean that I don't need mpath0p1 ? If its the case i don't need to
> >>> run kpartx on mpath0?
> >>
> >> You still need kpartx, but that's a bit clunky anyway. Let dm-multipath
> >> take care of all that for you.
> >>
> >> (The last time I used kpartx and friends was 2003. Dm-multipath and
> >> multipathd are much more user-friendly. All you need then is multipath -v2
> >> -ll to verify things are where they should be...)
> >>
> >>> And not having mpath0p1 will take away this device mapper ioctl failed
> >>> issue when creating lvcreate?
> >>>
> >>
> >> I think that's a separate issue. What's the underlying structure? SAN?
> >> FC? iSCSI? DRBD?
> >>
> >>> I am really confused why this lock has failed , also not sure if this is
> >>> related to this >2TB LUN.
> >>>
> >>
> >> It's not. Some of my LUNs are 25+Tb
> >>
> >
> >
> >
> >>
> >> FWIW having PVs on LUN partitions introduces a small but measurable speed
> >> penalty over making the entire LUN a PV - this is mostly down to the small
> >> offset a partition table adds to the front of the LUN.
> >>
> >
> >
> >
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
>
>
> --
> Jonathan Barber <jonathan.barber at gmail.com>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


From emilio at ugr.es  Thu Aug 18 18:01:39 2011
From: emilio at ugr.es (Emilio Arjona)
Date: Thu, 18 Aug 2011 20:01:39 +0200
Subject: [Linux-cluster] Problems after cluster update
In-Reply-To: <20110818124846.30361.qmail@psa101.host365.com>
References: <20110818124846.30361.qmail@psa101.host365.com>
Message-ID: <CAEQGue9ndd4H3rUaJJ2e_PnhL2yUgj=cDtL9sN0jNPkxSEU9MA@mail.gmail.com>

Resolved:

clvmd was not running after the update for some reason. I re-enabled it as a
service (chkconfig) and everything works now.

Thanks anyway.

2011/8/18 <ashley at host365.com>

> Hi
>
> I am away until 30/08/11. If you require support, please email
> support at host365.com or call +44 (0)207 610 9911 in my absence.
>
> Regards
> Ashley
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
*******************************************
Emilio Arjona Heredia
Centro de Enseñanzas Virtuales de la Universidad de Granada
C/ Real de Cartuja 36-38
http://cevug.ugr.es
Tlfno.: 958-241000 ext. 20206
*******************************************

From cos at aaaaa.org  Thu Aug 18 18:43:17 2011
From: cos at aaaaa.org (Ofer Inbar)
Date: Thu, 18 Aug 2011 14:43:17 -0400
Subject: [Linux-cluster] RHCS resource agent: status interval vs.
	monitor interval
In-Reply-To: <20110809033550.GG7753@mip.aaaaa.org>
References: <20110728213924.GD341@mip.aaaaa.org>
	<20110809033550.GG7753@mip.aaaaa.org>
Message-ID: <20110818184317.GK343@mip.aaaaa.org>

My questions last month were:

1. Why do we have both "monitor" and "status" actions in the meta-data?

2. How are the "timeout" and "interval" attributes actually used by rgmanager?

> > I tried to find the answer in:
> >   https://fedorahosted.org/cluster/wiki/ResourceActions
> >   http://www.opencf.org/cgi-bin/viewcvs.cgi/*checkout*/specs/ra/resource-agent-api.txt?rev=1.10

... at the time, those docs didn't help much in answering these questions.

Today, I found lon on #linux-cluster on freenode IRC and asked about
it.  He updated the ResourceActions page on the wiki.  It now contains
a lot more information about this.

BTW, the answer to #1 is that the monitor action is part of the OCF
standard, but rgmanager ignores it and uses the status action instead.
That's now explained on that wiki page.

One thing that isn't yet explained there: the status interval clock
begins after the most recent status action has *completed*, so add
the time taken by a status check to the interval.  Also, rgmanager
checks about every 10 seconds whether any status intervals have expired
and need re-checking, so every interval is effectively rounded up to
the next multiple of 10.

For example, if:
 - status interval = 40
 - status check takes 15 seconds to complete

You'll get a new status check every 60 seconds.
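
The arithmetic above can be sketched as a small function. This is a
simplified model based only on the behaviour described in this message,
not on rgmanager's source: it assumes the interval clock starts when the
previous check completes and that the result is rounded up to the next
multiple of the (approximately 10 second) polling period. The function
and parameter names are hypothetical.

```python
import math


def effective_status_period(interval, check_duration, poll_period=10):
    """Approximate seconds between the starts of two consecutive status
    checks, under the simplified model described above."""
    # The interval clock only starts once the previous check has completed.
    elapsed = interval + check_duration
    # Expired intervals are only noticed on the next polling pass.
    return math.ceil(elapsed / poll_period) * poll_period


# The example from this message: interval = 40, check takes 15 seconds.
print(effective_status_period(40, 15))  # 60
```

Since the polling period is only "about" 10 seconds, treat the result as
an estimate rather than an exact schedule.
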
  -- Cos


From forums at clustermagnet.com  Thu Aug 18 21:07:43 2011
From: forums at clustermagnet.com (V B)
Date: Thu, 18 Aug 2011 17:07:43 -0400
Subject: [Linux-cluster] cluster.conf question
Message-ID: <CAAB=VsnJ1w0T7zo-aVmrX5U5jx3-V-9S+ooG2uZfQ-NVwf5CGQ@mail.gmail.com>

Gents,

   I'm having a problem with an initial cluster deployment.  The idea is to
have 4 nodes, connected to an FC LUN, running Xen, clvm + ocfs2.
   The problem I am experiencing is that fencing is not starting up.  Also,
when /etc/init.d/cman start is issued, most hosts just sit there without
giving back any results.
   I am seeing errors such as "Cluster is not quorate.  Refusing connection."

    (firewalls are open for port 6809/udp)

Can you please let me know if you think this cluster.conf is set up
correctly?

THANKS!

My cluster.conf:

    <?xml version="1.0"?>
    <cluster name="WSExen" config_version="2">
        <clusternodes>
            <clusternode name="host1" nodeid="1">
                <fence>
                    <method name="single">
                        <device name="manual" ipaddr="host1's ip"/>
                    </method>
                </fence>
            </clusternode>
            <clusternode name="host2" nodeid="2">
                <fence>
                    <method name="single">
                        <device name="manual" ipaddr="host2's ip"/>
                    </method>
                </fence>
            </clusternode>
            <clusternode name="host3" nodeid="3">
                <fence>
                    <method name="single">
                        <device name="manual" ipaddr="host3's ip"/>
                    </method>
                </fence>
            </clusternode>
            <clusternode name="host4" nodeid="4">
                <fence>
                    <method name="single">
                        <device name="manual" ipaddr="host4's ip"/>
                    </method>
                </fence>
            </clusternode>
        </clusternodes>

        <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>

        <fencedevices>
            <fencedevice name="manual" agent="fence_manual"/>
        </fencedevices>

        <rm>
        </rm>

        <cman port="6809">
        </cman>
    </cluster>

Thanks for all your help!

From forums at clustermagnet.com  Thu Aug 18 21:19:01 2011
From: forums at clustermagnet.com (V B)
Date: Thu, 18 Aug 2011 17:19:01 -0400
Subject: [Linux-cluster] cluster.conf question
In-Reply-To: <CAAB=VsnJ1w0T7zo-aVmrX5U5jx3-V-9S+ooG2uZfQ-NVwf5CGQ@mail.gmail.com>
References: <CAAB=VsnJ1w0T7zo-aVmrX5U5jx3-V-9S+ooG2uZfQ-NVwf5CGQ@mail.gmail.com>
Message-ID: <CAAB=Vsn87egjoOK0q72CV4=dvhWMExQQVSq-Ecm+t3w3HbrRYw@mail.gmail.com>

Guys, I'm also seeing this:

Why is it seeing 6 nodes?  Thanks

 cman_tool status
Version: 6.2.0
Config Version: 2
Cluster Name: WSExen
Cluster Id: 5456
Cluster Member: Yes
Cluster Generation: 616
Membership state: Cluster-Member
Nodes: 6
Expected votes: 4
Total votes: 4
Quorum: 3
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: nysv0194
Node ID: 3
Multicast addresses: 239.192.21.101
Node addresses: ip

On Thu, Aug 18, 2011 at 5:07 PM, V B <forums at clustermagnet.com> wrote:

> Gents,
>
>    Having a problem with an initial cluster deployment.  The idea is to
> have 4 nodes, connected to a FC lun, running xen... , clvm + ocfs2
>    The problem I am experiencing, fencing is not starting up... Also when
> /etc/init.d/cman start is issued, most hosts just sit there without giving
> back any results.
>    I am seeing errors such as  Cluster is not quorate.  Refusing
> connection.
>     (firewalls are open for port 6809udp)
>
> Can you please let me know if you think this cluster.conf is setup
> correctly?
>
> THANKS!
>
> My cluster.conf:
>
>     <?xml version="1.0"?>
>     <cluster name="WSExen" config_version="2">
>         <clusternodes>
>                 <clusternode name="host1" nodeid="1">
>                         <fence>
>                                 <method name="single">
>                                         <device name="manual" ipaddr="host
> 1`'s ip"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="host2" nodeid="2">
>                         <fence>
>                                 <method name="single">
>                                         <device name="manual"
> ipaddr="host2's ip"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="host3" nodeid="3">
>                         <fence>
>                                 <method name="single">
>                                         <device name="manual"
> ipaddr="host3's ip"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="host4" nodeid="4">
>                         <fence>
>                                 <method name="single">
>                                         <device name="manual" ipaddr="host
> 4's ip"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>
>         <fence_daemon clean_start="1" post_fail_delay="0"
> post_join_delay="3"/>
>
>         <fencedevices>
>                 <fencedevice name="manual" agent="fence_manual"/>
>         </fencedevices>
>
>    <rm>
>    </rm>
>
>    <cman port="6809">
>    </cman>
> </cluster>
>
> Thanks for all your help!
>
>

From forums at clustermagnet.com  Thu Aug 18 21:57:33 2011
From: forums at clustermagnet.com (V B)
Date: Thu, 18 Aug 2011 17:57:33 -0400
Subject: [Linux-cluster] cluster.conf question
In-Reply-To: <CAAB=Vsn87egjoOK0q72CV4=dvhWMExQQVSq-Ecm+t3w3HbrRYw@mail.gmail.com>
References: <CAAB=VsnJ1w0T7zo-aVmrX5U5jx3-V-9S+ooG2uZfQ-NVwf5CGQ@mail.gmail.com>
	<CAAB=Vsn87egjoOK0q72CV4=dvhWMExQQVSq-Ecm+t3w3HbrRYw@mail.gmail.com>
Message-ID: <CAAB=VsnY6mTdYGyOH2Q_nbi0UythQ-kM64XnF4S3qKkP8+xf_A@mail.gmail.com>

Please disregard,  the issue was definitely firewall related:

[image: Screen shot 2011-08-18 at 5.56.54 PM.png]

On Thu, Aug 18, 2011 at 5:19 PM, V B <forums at clustermagnet.com> wrote:

> Guys, im also seeing this:
>
> Why is it seeing 6 nodes?  Thanks
>
>  cman_tool status
> Version: 6.2.0
> Config Version: 2
> Cluster Name: WSExen
> Cluster Id: 5456
> Cluster Member: Yes
> Cluster Generation: 616
> Membership state: Cluster-Member
> Nodes: 6
> Expected votes: 4
> Total votes: 4
> Quorum: 3
> Active subsystems: 5
> Flags:
> Ports Bound: 0
> Node name: nysv0194
> Node ID: 3
> Multicast addresses: 239.192.21.101
> Node addresses: ip
>
> On Thu, Aug 18, 2011 at 5:07 PM, V B <forums at clustermagnet.com> wrote:
>
>> Gents,
>>
>>    Having a problem with an initial cluster deployment.  The idea is to
>> have 4 nodes, connected to a FC lun, running xen... , clvm + ocfs2
>>    The problem I am experiencing, fencing is not starting up... Also when
>> /etc/init.d/cman start is issued, most hosts just sit there without giving
>> back any results.
>>    I am seeing errors such as  Cluster is not quorate.  Refusing
>> connection.
>>     (firewalls are open for port 6809udp)
>>
>> Can you please let me know if you think this cluster.conf is setup
>> correctly?
>>
>> THANKS!
>>
>> My cluster.conf:
>>
>>     <?xml version="1.0"?>
>>     <cluster name="WSExen" config_version="2">
>>         <clusternodes>
>>                 <clusternode name="host1" nodeid="1">
>>                         <fence>
>>                                 <method name="single">
>>                                         <device name="manual" ipaddr="host
>> 1`'s ip"/>
>>                                 </method>
>>                         </fence>
>>                 </clusternode>
>>                 <clusternode name="host2" nodeid="2">
>>                         <fence>
>>                                 <method name="single">
>>                                         <device name="manual"
>> ipaddr="host2's ip"/>
>>                                 </method>
>>                         </fence>
>>                 </clusternode>
>>                 <clusternode name="host3" nodeid="3">
>>                         <fence>
>>                                 <method name="single">
>>                                         <device name="manual"
>> ipaddr="host3's ip"/>
>>                                 </method>
>>                         </fence>
>>                 </clusternode>
>>                 <clusternode name="host4" nodeid="4">
>>                         <fence>
>>                                 <method name="single">
>>                                         <device name="manual" ipaddr="host
>> 4's ip"/>
>>                                 </method>
>>                         </fence>
>>                 </clusternode>
>>         </clusternodes>
>>
>>         <fence_daemon clean_start="1" post_fail_delay="0"
>> post_join_delay="3"/>
>>
>>         <fencedevices>
>>                 <fencedevice name="manual" agent="fence_manual"/>
>>         </fencedevices>
>>
>>    <rm>
>>    </rm>
>>
>>    <cman port="6809">
>>    </cman>
>> </cluster>
>>
>> Thanks for all your help!
>>
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen shot 2011-08-18 at 5.56.54 PM.png
Type: image/png
Size: 72878 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20110818/7981e619/attachment.png>

From linux at alteeve.com  Fri Aug 19 00:03:11 2011
From: linux at alteeve.com (Digimer)
Date: Thu, 18 Aug 2011 20:03:11 -0400
Subject: [Linux-cluster] cluster.conf question
In-Reply-To: <CAAB=VsnJ1w0T7zo-aVmrX5U5jx3-V-9S+ooG2uZfQ-NVwf5CGQ@mail.gmail.com>
References: <CAAB=VsnJ1w0T7zo-aVmrX5U5jx3-V-9S+ooG2uZfQ-NVwf5CGQ@mail.gmail.com>
Message-ID: <4E4DA83F.4020003@alteeve.com>

On 08/18/2011 05:07 PM, V B wrote:
> Gents,  
> 
>    Having a problem with an initial cluster deployment.  The idea is to
> have 4 nodes, connected to a FC lun, running xen... , clvm + ocfs2
>    The problem I am experiencing, fencing is not starting up... Also
> when /etc/init.d/cman start is issued, most hosts just sit there without
> giving back any results.

>                                 <method name="single">
>                                         <device name="manual"
> ipaddr="host 1`'s ip"/>
>                                 </method>

Manual fencing is in no way supported, recommended, useful or even sane.
This is all the more true when you're using a clustered filesystem.

Please see:

https://fedorahosted.org/cluster/wiki/FAQ/Fencing#fence_manual2

Do yourself a favour and set up *real* fencing. Ideally IPMI (or iLO,
RSA, DRAC, etc) plus a switched PDU.

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"


From ashley at host365.com  Fri Aug 19 00:09:51 2011
From: ashley at host365.com (ashley at host365.com)
Date: 19 Aug 2011 01:09:51 +0100
Subject: [Linux-cluster] cluster.conf question
Message-ID: <20110819000951.32609.qmail@psa101.host365.com>

Hi

I am away until 30/08/11. If you require support, please email support at host365.com or call +44 (0)207 610 9911 in my absence.

Regards
Ashley


From jonathan.barber at gmail.com  Fri Aug 19 06:04:40 2011
From: jonathan.barber at gmail.com (Jonathan Barber)
Date: Fri, 19 Aug 2011 07:04:40 +0100
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <CADyt5gkGU9NReKAx1C5Fyd1UqhWUfO_-z39YzRM1AVk1TWpyBg@mail.gmail.com>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
	<4E451F0B.4040304@mssl.ucl.ac.uk>
	<CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
	<C028F637-6669-4052-8227-F4710498B7FA@gmail.com>
	<CADyt5g=+eRvHQf4YqiW8XLzm4EUA+Anc8bTqtxLmnXPr9BK9OQ@mail.gmail.com>
	<4E45D5D8.5090507@mssl.ucl.ac.uk>
	<CADyt5gkO7v9+ZYuQ7a5-eL2xfZuHF--GyeoK--Zs+9GF+=4_BQ@mail.gmail.com>
	<CAPEiEj5w4=0GQo_WR63S2SDk6yYvMtJG+QvWC7bFVn5Chn-66w@mail.gmail.com>
	<CADyt5gkGU9NReKAx1C5Fyd1UqhWUfO_-z39YzRM1AVk1TWpyBg@mail.gmail.com>
Message-ID: <CAPEiEj67_Hrqnt4m5gasDiu3a535mmBMumpkzROEFgOMHUFSaw@mail.gmail.com>

On 18 August 2011 18:41, Paras pradhan <pradhanparas at gmail.com> wrote:
> On Thu, Aug 18, 2011 at 10:13 AM, Jonathan Barber
> <jonathan.barber at gmail.com> wrote:
>>
>> On 13 August 2011 04:24, Paras pradhan <pradhanparas at gmail.com> wrote:
>> > Alan,
>> > Its a FC SAN.

[snip]

>> > If I don't make an entire LUN a PV, I think I would then need partitions. Am
>> > i right? and you think this will reduce the speed penalty?

[snip]

>> You can also just not use any partitions/LVM and write the filesystem
>> directly to the block device... But I'd just stick with using LVM.
>>
>
>
> Here is what I have noticed though I should have done few more tests.
> iozone o/p with partitions (test size is 100MB)
> -
> "Output is in Kbytes/sec"
> "  Initial write "  265074.94
> "        Rewrite "  909962.61
> "           Read " 1872247.78
> "        Re-read " 1905471.81
> "   Reverse Read " 1316265.03
> "    Stride read " 1448626.44
> "    Random read " 1119532.25
> " Mixed workload "  922532.31
> "   Random write "  749795.80
> --
>
> without partitions:
> "Output is in Kbytes/sec"
> "  Initial write "  376417.97
> "        Rewrite "  870409.73
> "           Read " 1953878.50
> "        Re-read " 1984553.84
> "   Reverse Read " 1353943.00
> "    Stride read " 1469878.76
> "    Random read " 1432870.66
> " Mixed workload " 1328300.78
> "   Random write "  790309.01
> ---

I'm not very familiar with iozone, but if you're only reading /
writing 100M, then probably all you're measuring is the speed of the
linux buffer cache. You should increase the amount of data to greater
than the RAM available to the system. Also, you should repeat these
runs multiple times and at a minimum take an average (and calculate
the standard deviation) of each metric to make sure you aren't getting
unusually good/bad performance. You can then compare the results using
a paired T-test to see if the difference is statistically significant.
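
A minimal sketch of that procedure, using only the Python standard
library: summarize each set of repeated runs with its mean and sample
standard deviation, then compute the paired t statistic for two equally
long series of runs (e.g. run i with partitions vs. run i without). The
function names are hypothetical, and the sketch stops at the t statistic
and degrees of freedom; the p-value would have to come from a t table or
from something like scipy.stats.ttest_rel if SciPy is available.

```python
import math
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) for one metric's runs."""
    return statistics.mean(samples), statistics.stdev(samples)


def paired_t_statistic(runs_a, runs_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two series
    of measurements taken under matching conditions."""
    if len(runs_a) != len(runs_b) or len(runs_a) < 2:
        raise ValueError("need two equally long series with at least 2 runs each")
    diffs = [a - b for a, b in zip(runs_a, runs_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    # t = mean difference divided by its standard error.
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

Each list passed in would hold one iozone metric (say, "Initial write")
from repeated runs of the same configuration.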

[snip]

> I got this locking problem resolved after rebooting all the nodes .

That sounds like the problem encountered in the link I sent before.

> What I have noticed is after adding a LUN, under /dev/mpath instead of
> wwid i was seeing as:
>
> lrwxrwxrwx 1 root root 8 Aug 9 17:30 mpath13 -> ../dm-28
>
> After reboot
>
> lrwxrwxrwx 1 root root 7 Aug 15 17:53 360060e8004770d000000770d000003e9 -> ../dm-9

That's odd. Did you change your multipath configuration? It looks like
you've set "user_friendly_names" to "no".

> Thanks
> Paras.
-- 
Jonathan Barber <jonathan.barber at gmail.com>


From pradhanparas at gmail.com  Fri Aug 19 14:56:07 2011
From: pradhanparas at gmail.com (Paras pradhan)
Date: Fri, 19 Aug 2011 09:56:07 -0500
Subject: [Linux-cluster] EFI in CLVM
In-Reply-To: <CAPEiEj67_Hrqnt4m5gasDiu3a535mmBMumpkzROEFgOMHUFSaw@mail.gmail.com>
References: <CADyt5gmYPPBYa7G-omi4bnftVWtPn6sXY7NzGfbrrNsSRvbxxQ@mail.gmail.com>
	<4E451F0B.4040304@mssl.ucl.ac.uk>
	<CADyt5g=+x0NZ66tngaLWJjf_FkuKd6t3MJS6uSQDWeTtE0udsA@mail.gmail.com>
	<C028F637-6669-4052-8227-F4710498B7FA@gmail.com>
	<CADyt5g=+eRvHQf4YqiW8XLzm4EUA+Anc8bTqtxLmnXPr9BK9OQ@mail.gmail.com>
	<4E45D5D8.5090507@mssl.ucl.ac.uk>
	<CADyt5gkO7v9+ZYuQ7a5-eL2xfZuHF--GyeoK--Zs+9GF+=4_BQ@mail.gmail.com>
	<CAPEiEj5w4=0GQo_WR63S2SDk6yYvMtJG+QvWC7bFVn5Chn-66w@mail.gmail.com>
	<CADyt5gkGU9NReKAx1C5Fyd1UqhWUfO_-z39YzRM1AVk1TWpyBg@mail.gmail.com>
	<CAPEiEj67_Hrqnt4m5gasDiu3a535mmBMumpkzROEFgOMHUFSaw@mail.gmail.com>
Message-ID: <CADyt5gm_zPF1g8LJpf9Krh-i81Mq3SOmTk4CW6bek91r1rb_6A@mail.gmail.com>

On Fri, Aug 19, 2011 at 1:04 AM, Jonathan Barber
<jonathan.barber at gmail.com> wrote:
> On 18 August 2011 18:41, Paras pradhan <pradhanparas at gmail.com> wrote:
>> On Thu, Aug 18, 2011 at 10:13 AM, Jonathan Barber
>> <jonathan.barber at gmail.com> wrote:
>>>
>>> On 13 August 2011 04:24, Paras pradhan <pradhanparas at gmail.com> wrote:
>>> > Alan,
>>> > Its a FC SAN.
>
> [snip]
>
>>> > If I don't make an entire LUN a PV, I think I would then need partitions. Am
>>> > i right? and you think this will reduce the speed penalty?
>
> [snip]
>
>>> You can also just not use any partitions/LVM and write the filesystem
>>> directly to the block device... But I'd just stick with using LVM.
>>>
>>
>>
>> Here is what I have noticed though I should have done few more tests.
>> iozone o/p with partitions (test size is 100MB)
>> -
>> "Output is in Kbytes/sec"
>> "  Initial write "  265074.94
>> "        Rewrite "  909962.61
>> "           Read " 1872247.78
>> "        Re-read " 1905471.81
>> "   Reverse Read " 1316265.03
>> "    Stride read " 1448626.44
>> "    Random read " 1119532.25
>> " Mixed workload "  922532.31
>> "   Random write "  749795.80
>> --
>>
>> without partitions:
>> "Output is in Kbytes/sec"
>> "  Initial write "  376417.97
>> "        Rewrite "  870409.73
>> "           Read " 1953878.50
>> "        Re-read " 1984553.84
>> "   Reverse Read " 1353943.00
>> "    Stride read " 1469878.76
>> "    Random read " 1432870.66
>> " Mixed workload " 1328300.78
>> "   Random write "  790309.01
>> ---
>
> I'm not very familiar with iozone, but if you're only reading /
> writing 100M, then probably all you're measuring is the speed of the
> linux buffer cache. You should increase the amount of data to greater
> than the RAM available to the system. Also, you should repeat these
> runs multiple times and at a minimum take an average (and calculate
> the standard deviation) of each metric to make sure you aren't getting
> unusually good/bad performance. You can then compare the results using
> a paired T-test to see if the difference is statistically significant.
>
> [snip]
>
>> I got this locking problem resolved after rebooting all the nodes .
>
> That sounds like the problem encountered in the link I sent before.
>
>> What I have noticed is after adding a LUN, under /dev/mpath instead of
>> wwid i was seeing as:
>>
>> lrwxrwxrwx 1 root root 8 Aug 9 17:30 mpath13 -> ../dm-28
>>
>> After reboot
>>
>> lrwxrwxrwx 1 root root 7 Aug 15 17:53
>> 360060e8004770d000000770d000003e9 -> ../dm-9
>
> That's odd. Did you change your multipath configuration? It looks like
> you've set "user_friendly_names" to "no".


No. I have user_friendly_names set to "yes". I haven't changed anything in
multipath.conf; however, I can see user-friendly names in multipath -ll:

-
mpath13 (360060e8004770d000000770d000003e9) dm-9 HITACHI,OPEN-V*4
[size=2.0T][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=2][active]
 \_ 5:0:1:7 sdl 8:176 [active][ready]
 \_ 6:0:1:7 sdu 65:64 [active][ready]
-

Paras.


>
>> Thanks
>> Paras.
> --
> Jonathan Barber <jonathan.barber at gmail.com>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


From rossnick-lists at cybercat.ca  Fri Aug 19 19:48:30 2011
From: rossnick-lists at cybercat.ca (Nicolas Ross)
Date: Fri, 19 Aug 2011 15:48:30 -0400
Subject: [Linux-cluster] Network switch problem
Message-ID: <2AF90FC418974F20BA16D16E8B85831B@versa>

Hi !

We have a cluster of 8 nodes that are split between two 24-port gigabit
network switches. Port 1 on each server is used for services, and port 2 for
the "totem-ring", i.e. cluster communications.

The servers are split 4 per switch, with each port configured for the
proper VLAN. We have a VLAN trunk between the switches.

I need to reboot one or both switches without interrupting the cluster
services. In the past (i.e. before there were critical services), I
rebooted a switch and the cluster lost quorum; all services stopped and
restarted once quorum came back. I can live with a minute or so without
services while the switch reboots, but not 5 or 10 minutes while the services
stop and start.
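
The quorum loss described above follows from simple vote arithmetic. A
minimal sketch, assuming each of the 8 nodes has one vote and no quorum disk
is configured (neither is stated here), and assuming the usual majority rule
of more than half of the expected votes; the function names are made up for
illustration:

```python
def quorum_threshold(expected_votes):
    """Smallest number of votes that is a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(live_votes, expected_votes):
    """True if the surviving partition holds a majority of the votes."""
    return live_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    expected = 8  # assumption: 8 nodes, one vote each
    print("votes needed:", quorum_threshold(expected))
    # One switch rebooting cuts off its 4 nodes; the other 4 remain.
    print("4 nodes left, quorate?", is_quorate(4, expected))
    # Moving a single node's cluster interface at a time leaves 7.
    print("7 nodes left, quorate?", is_quorate(7, expected))
```

Under those assumptions a 4/4 split leaves neither half with a majority,
whereas taking one node off the cluster network at a time keeps the rest
quorate. Whether the disconnected node gets fenced is a separate question
that this arithmetic does not answer.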

Now, to reboot the switch, I plan on adding a 3rd temporary switch just for 
the cluster vlan, and connect, one by one, the network interfaces to that 
switch.

So, if I disconnect the cluster network interface on a node, will that
node be fenced immediately, or do I have some time, let's say 10 seconds, to
complete the reconnect?

I also see that each node has a TCP connection to the other nodes. So, will
the disconnect / reconnect sever that connection completely, or will it be
retried?

Thanks for any insights. 


From swap_project at yahoo.com  Sat Aug 20 23:35:58 2011
From: swap_project at yahoo.com (Srija)
Date: Sat, 20 Aug 2011 16:35:58 -0700 (PDT)
Subject: [Linux-cluster] Guest is not relocating under cluster
In-Reply-To: <2AF90FC418974F20BA16D16E8B85831B@versa>
References: <2AF90FC418974F20BA16D16E8B85831B@versa>
Message-ID: <1313883358.19407.YahooMailNeo@web112815.mail.gq1.yahoo.com>

Hi 

I have a six-node test cluster running RHEL 5.7 (x86_64).

The nodes run Xen. I am trying to have the guest relocate when the node it is running on fails,
but instead of relocating, the guest is stopped.

The versions of cman and rgmanager are:

cman-2.0.115-85.el5
rgmanager-2.0.52-21.el5


Here is the cluster.conf
--------------------------------------

<?xml version="1.0"?>
<cluster alias="newtest" config_version="26" name="newtest">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="node1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device action="reboot" name="ilo-node1"/>
                                </method>
                        </fence>
                </clusternode>
........
<snip>
        </clusternodes>
<cman>
    <multicast addr="xxx.1.5.1"/>
</cman>
<totem token="20000"/>
        <fencedevices>
                <fencedevice agent="fence_ilo" hostname="node1r" login="Admin" name="ilo-node1" passwd="xxxxx"/>
........
<snip>
        </fencedevices>
        <rm log_level="7" log_facility="local4">
                <failoverdomains>
                        <failoverdomain name="nd1-nd2-nd3-nd4-nd5-nd6" nofailback="1" ordered="1" restricted="1">
                                <failoverdomainnode name="node1" priority="1"/>
                                <failoverdomainnode name="node2" priority="2"/>
                                <failoverdomainnode name="node3" priority="3"/>
                                <failoverdomainnode name="node4" priority="4"/>
                                <failoverdomainnode name="node5" priority="5"/>
                                <failoverdomainnode name="node6" priority="6"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
                <vm autostart="1" name="guest1" migrate="live" recovery="relocate"/>
        </rm>
        <cman/>
</cluster>

Here are a few lines from the log file:
--------------------------------------------------------------

Aug 20 18:51:09 node clurgmgrd[7431]: <debug> Event: Port Opened 
Aug 20 18:51:09 node clurgmgrd[7431]: <info> State change: node3 UP 
Aug 20 18:51:14 node clurgmgrd[7431]: <debug> Evaluating RG vm:guest1, state stopped, owner none 
Aug 20 18:51:14 node clurgmgrd[7431]: <debug> Event (0:3:1) Processed 
Aug 20 18:51:19 node clurgmgrd[7431]: <debug> 1 events processed 
Aug 20 18:51:35 node clurgmgrd[7431]: <debug> No other nodes have seen vm:guest1 
Aug 20 18:51:35 node clurgmgrd[7431]: <notice> Starting stopped service vm:guest1 
Aug 20 18:51:36 node clurgmgrd: [7431]: <debug> virsh -c xen:/// start guest1 
Aug 20 18:51:37 node clurgmgrd[7431]: <notice> start on vm "guest1" returned 1 (generic error) 
Aug 20 18:51:37 node clurgmgrd[7431]: <warning> #68: Failed to start vm:guest1; return value: 1 
Aug 20 18:51:37 node clurgmgrd[7431]: <debug> Stopping failed service vm:guest1 
Aug 20 18:51:37 node clurgmgrd[7431]: <notice> Stopping service vm:guest1 
Aug 20 18:51:37 node clurgmgrd: [7431]: <debug> Virtual machine guest1 is? 
Aug 20 18:51:38 node clurgmgrd[7431]: <notice> Service vm:guest1 is recovering 
Aug 20 18:51:38 node clurgmgrd[7431]: <warning> #71: Relocating failed service vm:guest1 
Aug 20 18:51:38 node clurgmgrd[7431]: <debug> Sent remote-start request to 6 
Aug 20 18:51:49 node clurgmgrd[7431]: <debug> 4 events processed 

Any advice is really appreciated.

Thanks in advance.


From fdinitto at redhat.com  Sun Aug 21 05:15:38 2011
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Sun, 21 Aug 2011 07:15:38 +0200
Subject: [Linux-cluster] Network switch problem
In-Reply-To: <2AF90FC418974F20BA16D16E8B85831B@versa>
References: <2AF90FC418974F20BA16D16E8B85831B@versa>
Message-ID: <4E50947A.3040909@redhat.com>

Hi Nicolas,

On 08/19/2011 09:48 PM, Nicolas Ross wrote:
> Hi !
> 
> We have a cluster of 8 nodes that are split between two 24-port gigabit
> network switches. Port 1 on each server is used for services, and port 2
> for the "totem-ring", i.e. cluster communications.
>
> The servers are split 4 per switch, with each port configured for
> the proper VLAN. We have a VLAN trunk between the switches.
>
> I need to reboot one or both switches without interrupting the cluster
> services. In the past (i.e. before there were critical services), I
> rebooted a switch and the cluster lost quorum; all services stopped
> and restarted once quorum came back. I can live with a minute or so
> without services while the switch reboots, but not 5 or 10 minutes while
> the services stop and start.
>
> Now, to reboot the switch, I plan on adding a 3rd temporary switch just
> for the cluster vlan, and connect, one by one, the network interfaces to
> that switch.
>
> So, if I disconnect the cluster network interface on a node, will that
> node be fenced immediately, or do I have some time, let's say 10 seconds,
> to complete the reconnect?
>
> I also see that each node has a TCP connection to the other nodes. So,
> will the disconnect / reconnect sever that connection completely, or will
> it be retried?
> 
> Thanks for any insights.

Assuming you have the option to add a 3rd switch (or even a 4th one) and
one or two extra network cards available on each server, you can
use a slightly different setup that would allow you to reboot all the
switches without any service interruption.

What most people do is:

serverX -> eth0 -> switch0
        -> eth1 -> switch1
        -> eth2 -> switch2
        -> eth3 -> switch3

eth0 and eth1 are configured in bonding (IIRC mode 1 is the only
supported bonding mode for cluster heartbeat, but check the KB on the Red Hat
website), and that's where you allow cluster heartbeat traffic.

eth2 and eth3 are also configured in bonding, but you have greater
freedom of mode (load-balancing, for example, to increase bandwidth to 2x)
for services.

switch0 and switch1 / switch2 and switch3 would be configured in
trunking like you have now.

With such a setup, you can have up to two switches offline at the same
time, as long as they are not on the same bond/trunk.
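
To make that rule concrete, here is a small sketch that checks which switch
outages the layout above tolerates. The bond names (bond0, bond1) and the
function name are assumptions for illustration; only the ethX -> switchX
pairing comes from the diagram.

```python
# Assumed naming: bond0 = eth0+eth1 (heartbeat), bond1 = eth2+eth3 (services).
BONDS = {
    "bond0": ["switch0", "switch1"],
    "bond1": ["switch2", "switch3"],
}


def all_bonds_up(bonds, offline):
    """True if every bond still has at least one switch that is online."""
    return all(
        any(switch not in offline for switch in switches)
        for switches in bonds.values()
    )


if __name__ == "__main__":
    # One switch per bond offline: both bonds keep a working leg.
    print(all_bonds_up(BONDS, {"switch0", "switch2"}))
    # Both switches of the heartbeat bond offline: heartbeat is lost.
    print(all_bonds_up(BONDS, {"switch0", "switch1"}))
```

The check only models link availability; it says nothing about how long a
bond takes to fail over to its surviving leg.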

A soon-to-be-supported technology in RHEL6 is Redundant Ring, which
allows you to use two separate LANs to perform cluster heartbeats (one
primary, one backup).

Fabio


From ashley at host365.com  Sun Aug 21 05:22:26 2011
From: ashley at host365.com (ashley at host365.com)
Date: 21 Aug 2011 06:22:26 +0100
Subject: [Linux-cluster] =?utf-8?q?Network_switch_problem?=
Message-ID: <20110821052226.26154.qmail@psa101.host365.com>

Hi

I am away until 30/08/11. If you require support, please email support at host365.com or call +44 (0)207 610 9911 in my absence.

Regards
Ashley


From ajb2 at mssl.ucl.ac.uk  Mon Aug 22 12:52:11 2011
From: ajb2 at mssl.ucl.ac.uk (Alan Brown)
Date: Mon, 22 Aug 2011 13:52:11 +0100
Subject: [Linux-cluster] Options other than reboot to stop DP processes
 thatcan't be killed -9
In-Reply-To: <1313399795.27379.17.camel@bhac.iouk.ioroot.tld>
References: <CABTxP=6iW+pJVT-kYEa9iq_2DHfcoiHog6JEjBcYAq3hcgj=Bg@mail.gmail.com>
	<1313399795.27379.17.camel@bhac.iouk.ioroot.tld>
Message-ID: <4E5250FB.40300@mssl.ucl.ac.uk>

Colin Simpson wrote:
> Probably not a cluster issue just pure kernel question.  Sounds like the
> driver or device is locked up and the driver or device is confused, so
> the processes attached to it will be hung. 

A common problem in a fabric environment is that there are 2+ paths to 
the tapes (ie, 2 HBAs on the server) and commands may take either path 
(drives get confused by this). Sending an unlock/reset command via the 
other path is usually sufficient to recover but it's an extremely poorly 
documented area.

The most common case of this is tapes which refuse to eject - lock 
commands are per source and ORed, so unlock commands have to come from 
the same HBA(s) which issued the lock. I've added scripts to my bacula 
tape handling routines to ensure this happens on our setup.

> To be honest I've had similar problems on pretty much all Unixes for
> many years. And I've never found a good way out of it. Maybe not an
> option with your case and application, but I guess why most people have
> their backup systems running on separate dedicated boxes so it can be
> rebooted without affecting production systems.

Strongly agree. There are a number of other good reasons for running
dedicated backup systems, not least of which is the double-barrel
difficulty of bootstrapping a restore of the backup system itself AND
the dead cluster box in a worst case scenario. (It's a lot easier with
separate boxes, as in most cases only one gets trashed, and you can reduce
risk further by physically separating backups from operational servers.)

A second good reason is the amount of IO a good tape backup solution can 
generate - LTO tapes easily outrun spinning media, so a spooling setup 
is needed to avoid shoeshine issues.

All this stuff is best discussed on a list dedicated to backups. 
Discussions of this kind show up regularly and there are a number of 
canned answers at hand.

AB


From ashley at host365.com  Mon Aug 22 13:01:24 2011
From: ashley at host365.com (ashley at host365.com)
Date: 22 Aug 2011 14:01:24 +0100
Subject: [Linux-cluster]
	=?utf-8?q?Options_other_than_reboot_to_stop_DP_pr?=
	=?utf-8?q?ocesses_thatcan=27t_be_killed_-9?=
Message-ID: <20110822130124.14192.qmail@psa101.host365.com>

Hi

I am away until 30/08/11. If you require support, please email support at host365.com or call +44 (0)207 610 9911 in my absence.

Regards
Ashley


From fdinitto at redhat.com  Wed Aug 24 14:41:43 2011
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Wed, 24 Aug 2011 16:41:43 +0200
Subject: [Linux-cluster] cluster 3.1.6 release
Message-ID: <4E550DA7.60204@redhat.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Welcome to the cluster 3.1.6 release.

The new source tarball can be downloaded here:

https://fedorahosted.org/releases/c/l/cluster/cluster-3.1.6.tar.xz

ChangeLog:

https://fedorahosted.org/releases/c/l/cluster/Changelog-3.1.6

To report bugs or issues:

   https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community?

   Join us on IRC (irc.freenode.net #linux-cluster) and share your
   experience  with other sysadministrators or power users.

Thanks/congratulations to all people that contributed to achieve this
great milestone.

Happy clustering,
Fabio
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBCAAGBQJOVQ2lAAoJEFA6oBJjVJ+OP2wQALbBTaZiZy6ScOwmCqReHycZ
z9vkYvwKA0hgzA8urDb8cj6RhdvkRojmFRjKGTA3pm9PLwutDcn8+ujxwLxHuoAs
VVKdgWiIjoIiav+6XfihZh73xAMnmsAfK8mxMg6G2fO0ts/1qsyuio+itE8e95SU
gSeg9OllWircvgeaDN20DH6cdEegzvWguyzUqG4nOI2HkUeFiT7n9uYnEQPPyI7L
IORR8jInUSM2k7TZiA3NfrHn5GstMAKrzFmfY4D3gu2D5TiqIXKJE68SUeTBNQFl
0UlJq9PG/go6Ws7thftuCA4l41G/JXmvseMt559TQHs7zbbPTqUOtBhJamVQ0lWy
/+XwnRlI1yqAlUxmS/NDjgO0PUfQl53N99Ss0v2N3OYjRE5d+/CexIhTPycZ8uqE
EOTtcb9cMikjzZME6bf/XRmoOYkYgP5YWds/NCSvKgWjAXTjp2/C45uQS6CBEI16
hHvCb7BrgssvFcWogrjT+OPyqtP1KCbWLzKy1ozQ+xwnX9AbLU4z2S1VDPwHbdVT
vwXZEx+JdzsRs0rG+G3wxUDNsH2bH+VUkm2EAqtNg6sDvXsqn3YiMcGLAApkYcAl
1pQJd6/6/kLHA3e0gOkCrHeVwcmhC06GC0bQkITIJ6E2Qa011eKhbrEgh9nKnncW
0EY2b9u7bO96+GKHsIlt
=U7gT
-----END PGP SIGNATURE-----


From mmorgan at dca.net  Thu Aug 25 21:45:29 2011
From: mmorgan at dca.net (Michael Morgan)
Date: Thu, 25 Aug 2011 17:45:29 -0400
Subject: [Linux-cluster] "Invalid resource" starting KVM guest with clusvcadm
Message-ID: <20110825214529.GF7305@staff.dca.net>

Hello,

 I have a 2 node KVM cluster under Scientific Linux 6.1. Starting guests
works fine through virsh, virt-manager, and even rg_test. When I try to
use clusvcadm however:

[root at node1 ~]# clusvcadm -e vm:test
Local machine trying to enable vm:test...Invalid operation for resource

 The resource XML follows:

<vm autostart="0" domain="node1_primary" max_restarts="2" migration_mapping="node1:192.168.20.1,node2:192.168.20.2" name="test" path="/mnt/shared/xml" recovery="restart" restart_expire_time="600"/>

 SELinux isn't logging any denies but even with it fully disabled I get
the same behavior. With cluster debug logging enabled the only thing out
of the ordinary is from rgmanager:

Aug 25 17:09:12 rgmanager No other nodes have seen vm:test

 The odd thing is I have a very similar cluster which does not have the
same problem. I can duplicate this exact guest there and start it up
with clusvcadm. Cluster and libvirtd configs are essentially the same
between these clusters with the exception of hostnames. Even the rpm -qa
output matches. 

 Any suggestions would be appreciated as I am quickly losing my sanity.
I'm happy to provide any config or specific versions if necessary. Thank
you!

-Mike

--
Michael Morgan
mmorgan at dca.net


From sunhux at gmail.com  Sat Aug 27 06:27:14 2011
From: sunhux at gmail.com (sunhux G)
Date: Sat, 27 Aug 2011 14:27:14 +0800
Subject: [Linux-cluster] Options other than reboot to stop DP processes
 thatcan't be killed -9
In-Reply-To: <4E5250FB.40300@mssl.ucl.ac.uk>
References: <CABTxP=6iW+pJVT-kYEa9iq_2DHfcoiHog6JEjBcYAq3hcgj=Bg@mail.gmail.com>
	<1313399795.27379.17.camel@bhac.iouk.ioroot.tld>
	<4E5250FB.40300@mssl.ucl.ac.uk>
Message-ID: <CABTxP=6r+50Me62G-pz9YDcos_QMVN20LnWNZULgpj+Q_CxruA@mail.gmail.com>

Hi Alan / anyone,


>  best discussed on a list dedicated to backups
> so a spooling setup is needed to avoid shoeshine

Appreciate if you can point me to good active lists for DataProtector &
NetBackup.  Need to explore more.

I've seen a German corporate using numerous low-cost NAS (as they
cost a lot less than one large tape library with four drives) to do image
(ie system/boot disks) & data backups to the NAS.

They rotate the NAS' SATA disks offsite : critics have it disks may
crash & there's lack of security encryption but a SATA disk can cost
less than LTO tapes for each GB of storage


Sun


From yamato at redhat.com  Mon Aug 29 08:11:17 2011
From: yamato at redhat.com (Masatake YAMATO)
Date: Mon, 29 Aug 2011 17:11:17 +0900 (JST)
Subject: [Linux-cluster] [patch] maybe a typo in comment in vf.h of rgmanager
In-Reply-To: <20110822130124.14192.qmail@psa101.host365.com>
References: <20110822130124.14192.qmail@psa101.host365.com>
Message-ID: <20110829.171117.624448666359611655.yamato@redhat.com>

I cannot find vf_handle_msg in the source tree.
I guess it should be vf_process_msg.
Could you apply this to the official source tree if my guess is correct?

Signed-off-by: Masatake YAMATO <yamato at redhat.com>

diff -ruN vf.h.orig vf.h
--- vf.h.orig	2011-08-29 17:05:40.448808854 +0900
+++ vf.h	2011-08-29 17:05:58.717240739 +0900
@@ -128,7 +128,7 @@
 #define VF_COORD_TIMEOUT	60	/* 60 seconds MAX timeout */
 #define VF_COMMIT_TIMEOUT_MIN	(2 * VF_COORD_TIMEOUT)
 
-/* Return codes for vf_handle_msg... */
+/* Return codes for vf_process_msg... */
 #define VFR_ERROR	100
 #define VFR_TIMEOUT	101
 #define VFR_OK		0


From brunato at sissa.it  Tue Aug 30 10:51:36 2011
From: brunato at sissa.it (Davide Brunato)
Date: Tue, 30 Aug 2011 12:51:36 +0200
Subject: [Linux-cluster] Backup of a GFS2 volume
Message-ID: <4E5CC0B8.8010209@sissa.it>

Hello,

I have a 2-node Red Hat 5.7 cluster for electronic mail services where the mailboxes (maildir format)
are stored on a GFS2 volume. The volume contains about 7500000 files occupying ~740 GB of disk
space. Previously the mailboxes were on a GFS1 volume, and I migrated to GFS2 when we changed
the SAN storage system.

Because incremental backups became extremely slow (about 41-42H) after the migration from
GFS to GFS2, I checked the configuration/tuning of the cluster and the volume mount options with the
help of Red Hat support, but the optimizations (<gfs_controld plock_rate_limit="0"/>, mounting with
noatime and nodiratime) have not significantly accelerated the incremental backups.

So I tried another backup strategy, using the snapshot feature of our SAN storage system and doing
backups outside the cluster environment. I use the snapshots of the GFS2 volume on another server (also
with RHEL 5.7), mounting the volume as a local (not clustered) filesystem:

/var/mailboxes type gfs2 (rw,noatime,nodiratime,lockproto=lock_nolock,localflocks,localcaching)

The full backups are slightly faster (from 24-25H down to 21-22H) and the
incremental backups are "acceptable" (about 9H). But the speed is still low in comparison to backups
of Ext3 filesystems, particularly for incremental backups.

I've noticed that the glocks are still used even when I mount a snapshot of the mailbox GFS2
partition as a local filesystem:


# mount -t gfs2 /dev/mapper/posta_mbox_disk_vg-posta_mbox_disk_lvol1 /var/mailboxes -o
lockproto=lock_nolock,noatime,nodiratime
# time cp -Rp /var/mailboxes/prova* /var/tmp/test/

real	2m5.648s
user	0m0.311s
sys	0m13.243s
# rm -Rf /var/tmp/test/*
# time cp -Rp /var/mailboxes/prova* /var/tmp/test/

real	0m10.946s
user	0m0.254s
sys	0m10.634s

# cat /proc/slabinfo | grep gloc
gfs2_glock         35056  35064    424    9    1 : tunables   54   27    8 : slabdata   3896   3896      0

Is there a way to avoid the use of the glocks, or are they necessary to access the partition even
if it is mounted as a local filesystem?

Thanks

Davide Brunato

-- 
______________________________________________
Davide Brunato
Sistema Informatico SISSA: http://sis.sissa.it
via Bonomea 265 - 34136 Trieste - Italy
tel: +39-040-3787538  e-mail: brunato at sissa.it


From swhiteho at redhat.com  Tue Aug 30 11:09:23 2011
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Tue, 30 Aug 2011 12:09:23 +0100
Subject: [Linux-cluster] Backup of a GFS2 volume
In-Reply-To: <4E5CC0B8.8010209@sissa.it>
References: <4E5CC0B8.8010209@sissa.it>
Message-ID: <1314702563.2694.13.camel@menhir>

Hi,

On Tue, 2011-08-30 at 12:51 +0200, Davide Brunato wrote:
> Hello,
> 
> I've a Red Hat 5.7 2-node cluster for electronic mail services where the mailboxes (maildir format)
> are stored on GFS2 volume. The volume contains about 7500000 files for ~740 GB of disk space
> occupation. Previously the mailboxes were on a GFS1 volume, and I migrated to GFS2 when we changed
> the SAN storage system.
> 
> Due to incremental backups that have become extremely slow (about 41-42H) after the migration from
> GFS to GFS2, I checked the configuration/tuning of the cluster and volume mount options, with the
> help of Red Hat support, but the optimizations (<gfs_controld plock_rate_limit="0"/>, mount with
> noatime and nodiratime) don't have significantly accelerated the incremental backups.
> 
You don't mention how fast the backups were before...

The issue is most likely just that GFS2 caches more data (on average)
than GFS does. If you access that data from the node where it is
cached, then it's faster; if you try to access that same data from
another node, then it will be slower.

The issue therefore is ensuring that you divide your backup among nodes
in such a way that the backup will mostly be working only with the working
set of files on that node.

Either that, or as you've mentioned below, use your array's snapshot
capability to avoid this issue.


> So I tried another backup strategy, using the snaphot feature of our SAN storage system, doing
> backups outside the cluster environment. I use the snapshots of the GSF2 on another server (also
> with RHEL 5.7) mounting the volume as a local (not clustered) filesystem:
> 
> /var/mailboxes type gfs2 (rw,noatime,nodiratime,lockproto=lock_nolock,localflocks,localcaching)
> 
> The duration of full backups are slightly better (from 24-25H to 21-22H of duration) and the
> incremental backup are "acceptable" (about 9H). But the speed is still low in comparison to backups
> of Ext3 filesystems, particularly for incremental backups.
> 
It is bound to be a bit slower; ext3 can make some optimisations which
are just not possible in a clustered environment. On the other hand, if
it is taking that length of time to snapshot the GFS2 volume on the
array, then that seems to me to be an issue with the array rather than
the filesystem.

> I've notice that the glocks are still used, also when I mount a snapshot of the mailbox GFS2
> partition as a local filesystem:
> 
The glocks are pretty low overhead, when clustering is not involved.
> 
> # mount -t gfs2 /dev/mapper/posta_mbox_disk_vg-posta_mbox_disk_lvol1 /var/mailboxes -o
> lockproto=lock_nolock,noatime,nodiratime
> # time cp -Rp /var/mailboxes/prova* /var/tmp/test/
> 
> real	2m5.648s
> user	0m0.311s
> sys	0m13.243s
> # rm -Rf /var/tmp/test/*
> # time cp -Rp /var/mailboxes/prova* /var/tmp/test/
> 
> real	0m10.946s
> user	0m0.254s
> sys	0m10.634s
> 
This is a nice demonstration of the effects of accessing cached vs.
uncached data.
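
The same effect can be reproduced on any directory tree with a short
script. This is only a sketch: the function names are made up for
illustration, and the first pass is "cold" only if the data is not already
in the page cache (for example right after mounting the filesystem).

```python
import os
import time


def read_tree(root):
    """Read every regular file under root; return the number of bytes read."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    while True:
                        chunk = fh.read(1 << 20)  # read in 1 MiB pieces
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                continue  # skip files that vanish or cannot be read
    return total


def time_two_passes(root):
    """Return (first_pass_seconds, second_pass_seconds) for reading root."""
    timings = []
    for _ in range(2):
        start = time.perf_counter()
        read_tree(root)
        timings.append(time.perf_counter() - start)
    return tuple(timings)
```

Calling time_two_passes() on a freshly mounted tree should show a gap
between the two passes similar to the two cp runs quoted above, provided
the tree fits in RAM.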

> # cat /proc/slabinfo | grep gloc
> gfs2_glock         35056  35064    424    9    1 : tunables   54   27    8 : slabdata   3896   3896
>      0
> 
That is a pretty small number of glocks.

> Is there a way to exclude the use of the glocks, or them are necessary to access the partition, even
> if mounted as local filesystem?
> 
> Thanks
> 
> Davide Brunato
> 
Yes, they are required, but the overhead is pretty small, so I doubt
that is the real issue here,

Steve.


From brunato at sissa.it  Tue Aug 30 14:25:40 2011
From: brunato at sissa.it (Davide Brunato)
Date: Tue, 30 Aug 2011 16:25:40 +0200
Subject: [Linux-cluster] Backup of a GFS2 volume
In-Reply-To: <1314702563.2694.13.camel@menhir>
References: <4E5CC0B8.8010209@sissa.it> <1314702563.2694.13.camel@menhir>
Message-ID: <4E5CF2E4.1070503@sissa.it>

Hi,

Steven Whitehouse wrote:
> Hi,
> 
> On Tue, 2011-08-30 at 12:51 +0200, Davide Brunato wrote:
>> Hello,
>>
>> I've a Red Hat 5.7 2-node cluster for electronic mail services where the mailboxes (maildir format)
>> are stored on GFS2 volume. The volume contains about 7500000 files for ~740 GB of disk space
>> occupation. Previously the mailboxes were on a GFS1 volume, and I migrated to GFS2 when we changed
>> the SAN storage system.
>>
>> Due to incremental backups that have become extremely slow (about 41-42H) after the migration from
>> GFS to GFS2, I checked the configuration/tuning of the cluster and volume mount options, with the
>> help of Red Hat support, but the optimizations (<gfs_controld plock_rate_limit="0"/>, mount with
>> noatime and nodiratime) don't have significantly accelerated the incremental backups.
>>
> You don't mention how fast the backups were before...
> 

Full: about 17-18 H
Incremental: about 7-8 H

but that was 15 months ago, with fewer e-mails (4.5-5 million).

> The issue is most likely just that GFS2 caches more data (on average)
> than GFS does. If you access that data from the node where that data is
> cached, then its faster, if you try to access that same data from
> another node then it will be slower.
> 
> The issue therefore is ensuring that you divide your backup amoung nodes
> in such a way as the backup will mostly be working only with the working
> set of files on that node.
> 
> Either that, or as you've mentioned below, use your array's snapshot
> capability to avoid this issue.
> 
> 
>> So I tried another backup strategy, using the snaphot feature of our SAN storage system, doing
>> backups outside the cluster environment. I use the snapshots of the GSF2 on another server (also
>> with RHEL 5.7) mounting the volume as a local (not clustered) filesystem:
>>
>> /var/mailboxes type gfs2 (rw,noatime,nodiratime,lockproto=lock_nolock,localflocks,localcaching)
>>
>> The duration of full backups are slightly better (from 24-25H to 21-22H of duration) and the
>> incremental backup are "acceptable" (about 9H). But the speed is still low in comparison to backups
>> of Ext3 filesystems, particularly for incremental backups.
>>
> It is bound to be a bit slower, ext3 can make some optimisations which
> are just not possible in a clustered environment. On the other hand, if
> it is taking that length of time to snapshot the GFS2 volume on the
> array, then that seems to be to be an issue with the array rather than
> the filesystem.
> 
>> I've noticed that the glocks are still used even when I mount a snapshot of the mailbox GFS2
>> partition as a local filesystem:
>>
> The glocks are pretty low overhead, when clustering is not involved.
>>
>> # mount -t gfs2 /dev/mapper/posta_mbox_disk_vg-posta_mbox_disk_lvol1 /var/mailboxes -o lockproto=lock_nolock,noatime,nodiratime
>> # time cp -Rp /var/mailboxes/prova* /var/tmp/test/
>>
>> real	2m5.648s
>> user	0m0.311s
>> sys	0m13.243s
>> # rm -Rf /var/tmp/test/*
>> # time cp -Rp /var/mailboxes/prova* /var/tmp/test/
>>
>> real	0m10.946s
>> user	0m0.254s
>> sys	0m10.634s
>>
> This is a nice demonstration of the effects of accessing cached vs.
> uncached data.
> 
>> # cat /proc/slabinfo | grep gloc
>> gfs2_glock         35056  35064    424    9    1 : tunables   54   27    8 : slabdata   3896   3896      0
>>
> That is a pretty small number of glocks.
> 

Yes, because they are related only to the copy of my test files.

>> Is there a way to avoid using the glocks, or are they necessary to access the partition even
>> when it is mounted as a local filesystem?
>>
>> Thanks
>>
>> Davide Brunato
>>
> Yes, they are required, but the overhead is pretty small, so I doubt
> that is the real issue here,
> 
> Steve.
> 

I ran further tests with local GFS2 partitions (on new test LUNs), both on a cluster node and on the
external server (the server that I use for backups from snapshots, but in this case with a test
LUN assigned directly to this system).

The overhead is still heavy. For example:

# mkfs.gfs2 -j 4 -p lock_nolock /dev/mapper/posta_test_disk_vg-posta_test_disk_lvol1
# mount -t gfs2 /dev/mapper/posta_test_disk_vg-posta_test_disk_lvol1 /mnt -o noatime,nodiratime
# time cp -Rp /mnt/* /var/tmp/test

real	1m9.386s
user	0m0.265s
sys	0m10.647s
# rm -Rf /var/tmp/test/*
# time cp -Rp /mnt/* /var/tmp/test

real	0m9.397s
user	0m0.250s
sys	0m9.036s

Ruling out problems with the storage system (we have several other TB of Ext3 filesystems on it
without issues, and it is an enterprise-class system), the overhead appears greater than expected and
unrelated to the storage system's snapshot mechanism. I get similar results if I create a local
GFS2 test volume on a node of the cluster.

Is this behaviour of the GFS2 local volumes an anomaly?

Thank you

Davide


-- 
______________________________________________
Davide Brunato
Sistema Informatico SISSA: http://sis.sissa.it
via Bonomea 265 - 34136 Trieste - Italy
tel: +39-040-3787538  e-mail: brunato at sissa.it


From Ralph.Grothe at itdz-berlin.de  Wed Aug 31 08:25:00 2011
From: Ralph.Grothe at itdz-berlin.de (Ralph.Grothe at itdz-berlin.de)
Date: Wed, 31 Aug 2011 10:25:00 +0200
Subject: [Linux-cluster] Node has joined cluster but services cannot be
	started on it, why?
Message-ID: <A789DDB53ED7E94396E842EE2AC9B5FF01432A17@itdzex101.ITDZ.verwalt-berlin.de>

Hello everyone,

I am experiencing a strange phenomenon on one of our RHCS clusters.

During a scheduled downtime I ran a few cluster tests, in which I
also fenced one node (by issuing "fence_node barosic" from the
other node of this two-node cluster). That node is now causing me
some pain: it refuses to start any service, even when explicitly
told to via the "-m" option of the clusvcadm command.

It appears to me as if the communication to the clurgmgrd on this
node is disrupted although the daemon is running.

This can also be seen from the incomplete output of clustat when
compared to that of the fully integrated cluster node (i.e.
arubaic in this case).

At the moment I'm not allowed to issue a service relocation to
show the resulting output because I require a scheduled downtime
for this.
All I can issue now are commands that don't affect the running
services.

Here's clustat's output on the "working node":
(in agreement with the customer, I froze all services to prevent
any unwanted interference by clurgmgrd, since we aren't HA in the
current situation anyway)


[root at aruba:~]
# clustat
Cluster Status for rhcs-voebb @ Wed Aug 31 09:43:10 2011
Member Status: Quorate

 Member Name                         ID   Status
 ------ ----                         ---- ------
 arubaic                                1 Online, Local, RG-Master
 barosic                                2 Online

 Service Name                        Owner (Last)        State
 ------- ----                        ----- ------        -----
 service:alma                        arubaic             started    [Z]
 service:lola                        arubaic             started    [Z]
 service:vb_bz_zlb                   arubaic             started    [Z]


When I issue the same command on the reluctant node, however, I get this:


[root at baros:~]
# clustat
Cluster Status for rhcs-voebb @ Wed Aug 31 09:44:46 2011
Member Status: Quorate

 Member Name                         ID   Status
 ------ ----                         ---- ------
 arubaic                                1 Online
 barosic                                2 Online, Local
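
A minimal sketch of how the difference between the two clustat outputs
could be turned into an automated check, in the spirit of the Nagios
monitoring described below. The function names count_services and
check_rgmanager_view are hypothetical, and the sketch assumes only that
service rows start with "service:" as in the output from arubaic above.
It detects the missing section; it does not explain why it is missing.

```shell
#!/bin/sh
# Count the "service:" rows in clustat output read from stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_services() {
    grep -c '^[[:space:]]*service:' || true
}

# Hypothetical check: report WARNING (exit status 1, as Nagios expects)
# when the clustat output contains no service rows at all.
check_rgmanager_view() {
    n=$(count_services)
    if [ "$n" -eq 0 ]; then
        echo "WARNING - no service rows in clustat output"
        return 1
    fi
    echo "OK - $n service rows in clustat output"
}

# Usage on a cluster node (not run here):
#   clustat | check_rgmanager_view
```

Fed the output from arubaic, the check would report three service rows;
fed the output from barosic, it would return the warning.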


I monitor our RHCS clusters through Nagios and have defined a
check_multi command for this purpose that checks what I consider
the vital functions of the RHCS cluster stack.
Its OK output also shows that all the required daemons are
running on barosic.
Here's the output of this check run on barosic:


[nagios at baros:~]
$ /usr/lib64/nagios/plugins/contrib/check_multi/libexec/check_multi \
    -l /usr/lib64/nagios/plugins \
    -f /etc/nagios/check_multi/rhcs_status.cmd
OK - 20 plugins checked, 20 ok
[ 1] proc_ccsd PROCS OK: 1 process with command name 'ccsd'
[ 2] proc_clurgmgrd PROCS OK: 2 processes with command name 'clurgmgrd'
[ 3] proc_fenced PROCS OK: 1 process with command name 'fenced'
[ 4] proc_groupd PROCS OK: 1 process with command name 'groupd'
[ 5] proc_clvmd PROCS OK: 1 process with command name 'clvmd'
[ 6] proc_gfs_controld PROCS OK: 1 process with command name 'clvmd'
[ 7] proc_dlm_controld PROCS OK: 1 process with command name 'clvmd'
[ 8] ic_node_ip 192.168.5.58 
[ 9] ic_bond_dev bond1
[10] ic_mii_status up
[11] ic_slave1 eth1
[12] ic_slave2 eth4
[13] slave1_props  8000Mb/s
  Full
  yes
[14] slave2_props  8000Mb/s
  Full
  yes
[15] slave1_link  yes
[16] slave2_link  yes
[17] slave1_speed 8000
[18] slave2_speed 8000
[19] slave1_mode  full
[20] slave2_mode  full|check_multi::check_multi::plugins=20 time=0.257608


cman_tool also reports that everything is OK with barosic (if I
interpret its output correctly).
Yet I am not able to relocate any of the three services to
barosic.

What could be going wrong or missing, and where else should I look?


Regards
Ralph


[root at baros:~]
# cman_tool status
Version: 6.2.0
Config Version: 64
Cluster Name: rhcs-voebb
Cluster Id: 44402
Cluster Member: Yes
Cluster Generation: 516
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1  
Active subsystems: 9
Flags: 2node Dirty 
Ports Bound: 0 11  
Node name: barosic
Node ID: 2
Multicast addresses: 239.192.173.32 
Node addresses: 192.168.5.58 
[root at baros:~]
# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    516   2011-08-28 19:27:38  arubaic
   2   M    512   2011-08-28 19:27:38  barosic
[root at baros:~]
# cman_tool services
type             level name       id       state       
fence            0     default    00010001 none        
[1 2]
dlm              1     clvmd      00020001 none        
[1 2]
dlm              1     rgmanager  00010002 none        
[1 2]


From swhiteho at redhat.com  Wed Aug 31 12:56:33 2011
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Wed, 31 Aug 2011 13:56:33 +0100
Subject: [Linux-cluster] Backup of a GFS2 volume
In-Reply-To: <4E5CF2E4.1070503@sissa.it>
References: <4E5CC0B8.8010209@sissa.it> <1314702563.2694.13.camel@menhir>
	<4E5CF2E4.1070503@sissa.it>
Message-ID: <1314795393.2679.3.camel@menhir>

Hi,

On Tue, 2011-08-30 at 16:25 +0200, Davide Brunato wrote:
> Hi,
> 
> Steven Whitehouse wrote:
> > Hi,
> > 
> > On Tue, 2011-08-30 at 12:51 +0200, Davide Brunato wrote:
> >> Hello,
> >>
> >> I've a Red Hat 5.7 2-node cluster for electronic mail services where the mailboxes (maildir format)
> >> are stored on GFS2 volume. The volume contains about 7500000 files for ~740 GB of disk space
> >> occupation. Previously the mailboxes were on a GFS1 volume, and I migrated to GFS2 when we changed
> >> the SAN storage system.
> >>
> >> Because incremental backups have become extremely slow (about 41-42H) after the migration from
> >> GFS to GFS2, I checked the configuration/tuning of the cluster and volume mount options, with the
> >> help of Red Hat support, but the optimizations (<gfs_controld plock_rate_limit="0"/>, mount with
> >> noatime and nodiratime) have not significantly accelerated the incremental backups.
> >>
> > You don't mention how fast the backups were before...
> > 
> 
> Full: about 17-18 H
> Incremental: about 7-8 H
> 
> but that was 15 months ago, with fewer e-mails (4.5-5 million).
> 
So from that, it seems that the current times have not suddenly got
worse, but have just been extended due to a greater amount of data to be
processed.

> > The issue is most likely just that GFS2 caches more data (on average)
> > than GFS does. If you access that data from the node where that data is
> > cached, then it's faster; if you try to access that same data from
> > another node then it will be slower.
> > 
> > The issue therefore is ensuring that you divide your backup among nodes
> > in such a way as the backup will mostly be working only with the working
> > set of files on that node.
> > 
> > Either that, or as you've mentioned below, use your array's snapshot
> > capability to avoid this issue.
> > 
> > 
> >> So I tried another backup strategy, using the snapshot feature of our SAN storage system and doing
> >> the backups outside the cluster environment. I use the snapshots of the GFS2 volume on another server
> >> (also with RHEL 5.7), mounting the volume as a local (not clustered) filesystem:
> >>
> >> /var/mailboxes type gfs2 (rw,noatime,nodiratime,lockproto=lock_nolock,localflocks,localcaching)
> >>
> >> Full backups are slightly faster (from 24-25H down to 21-22H) and the incremental backups are
> >> "acceptable" (about 9H). But the speed is still low in comparison to backups of Ext3 filesystems,
> >> particularly for incremental backups.
> >>
> > It is bound to be a bit slower, ext3 can make some optimisations which
> > are just not possible in a clustered environment. On the other hand, if
> > it is taking that length of time to snapshot the GFS2 volume on the
> > array, then that seems to me to be an issue with the array rather than
> > the filesystem.
> > 
> >> I've noticed that the glocks are still used even when I mount a snapshot of the mailbox GFS2
> >> partition as a local filesystem:
> >>
> > The glocks are pretty low overhead, when clustering is not involved.
> >>
> >> # mount -t gfs2 /dev/mapper/posta_mbox_disk_vg-posta_mbox_disk_lvol1 /var/mailboxes -o lockproto=lock_nolock,noatime,nodiratime
> >> # time cp -Rp /var/mailboxes/prova* /var/tmp/test/
> >>
> >> real	2m5.648s
> >> user	0m0.311s
> >> sys	0m13.243s
> >> # rm -Rf /var/tmp/test/*
> >> # time cp -Rp /var/mailboxes/prova* /var/tmp/test/
> >>
> >> real	0m10.946s
> >> user	0m0.254s
> >> sys	0m10.634s
> >>
> > This is a nice demonstration of the effects of accessing cached vs.
> > uncached data.
> > 
> >> # cat /proc/slabinfo | grep gloc
> >> gfs2_glock         35056  35064    424    9    1 : tunables   54   27    8 : slabdata   3896   3896      0
> >>
> > That is a pretty small number of glocks.
> > 
> 
> Yes, because they are related only to the copy of my test files.
> 
> >> Is there a way to avoid using the glocks, or are they necessary to access the partition even
> >> when it is mounted as a local filesystem?
> >>
> >> Thanks
> >>
> >> Davide Brunato
> >>
> > Yes, they are required, but the overhead is pretty small, so I doubt
> > that is the real issue here,
> > 
> > Steve.
> > 
> 
> I ran further tests with local GFS2 partitions (on new test LUNs), both on a cluster node and on the
> external server (the server that I use for backups from snapshots, but in this case with a test
> LUN assigned directly to this system).
> 
> The overhead is still heavy. For example:
> 
> # mkfs.gfs2 -j 4 -p lock_nolock /dev/mapper/posta_test_disk_vg-posta_test_disk_lvol1
> # mount -t gfs2 /dev/mapper/posta_test_disk_vg-posta_test_disk_lvol1 /mnt -o noatime,nodiratime
> # time cp -Rp /mnt/* /var/tmp/test
> 
> real	1m9.386s
> user	0m0.265s
> sys	0m10.647s
> # rm -Rf /var/tmp/test/*
> # time cp -Rp /mnt/* /var/tmp/test
> 
> real	0m9.397s
> user	0m0.250s
> sys	0m9.036s
> 
> Ruling out problems with the storage system (we have several other TB of Ext3 filesystems on it
> without issues, and it is an enterprise-class system), the overhead appears greater than expected and
> unrelated to the storage system's snapshot mechanism. I get similar results if I create a local
> GFS2 test volume on a node of the cluster.
> 
> Is this behaviour of the GFS2 local volumes an anomaly?
> 
> Thank you
> 
> Davide
> 
No, you appear to be measuring the difference between reading all the
data off disk and reading all the data from the cache, assuming that you
didn't flush the caches or umount between the two tests. So that's
basically what I'd expect to see - that cached data is read much faster.

Steve.
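
As a worked check of the cached-versus-uncached reading, the time(1)
figures quoted above can be split into CPU time and everything else.
This is a rough sketch: to_seconds and off_cpu are hypothetical helper
names, and treating real - (user + sys) as time spent waiting (mostly
on disk reads) is an assumption that is reasonable for a single-threaded
cp, not an exact accounting.

```shell
#!/bin/sh
# Convert a time(1)-style value such as "2m5.648s" to seconds.
to_seconds() {
    echo "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Wall-clock time not spent on the CPU: real - (user + sys).
# Arguments: real, user, sys, each in time(1) format.
off_cpu() {
    real=$(to_seconds "$1")
    user=$(to_seconds "$2")
    sys=$(to_seconds "$3")
    LC_ALL=C awk -v r="$real" -v u="$user" -v s="$sys" \
        'BEGIN { printf "%.3f\n", r - u - s }'
}

# Figures quoted in the thread for the snapshot mounted with lock_nolock.
echo "first run:  $(off_cpu 2m5.648s 0m0.311s 0m13.243s) s off-CPU"
echo "second run: $(off_cpu 0m10.946s 0m0.254s 0m10.634s) s off-CPU"
```

With the figures from the thread this gives roughly 112 s off-CPU for
the first run and 0.06 s for the second, while the sys time is about the
same in both (13.2 s and 10.6 s). For a repeatable cold-cache
measurement, the filesystem can be unmounted and remounted between runs,
or the caches flushed as root with "sync; echo 3 > /proc/sys/vm/drop_caches".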
