From raju.rajsand at gmail.com Fri Jan 2 05:49:38 2009
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Fri, 2 Jan 2009 11:19:38 +0530
Subject: [Linux-cluster] Re: Fencing test
In-Reply-To: <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com>
References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com>
Message-ID: <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com>

Greetings,

On Wed, Dec 31, 2008 at 10:30 PM, Paras pradhan wrote:
> Pulled the heartbeat network cable from node1. Nothing happens. BUT
> when I plug the cable back in, node1 restarted. What am I missing here.

The heartbeat network cable should be out for at least 20-30 seconds.

If you have connected the data and heartbeat cables to the same switch, you may need to pull out both.

Incidentally, you will have to enable multicasting for the heartbeat network in the switch if it is a managed switch, and assign a separate VLAN for it. There have been cases in the recent past where some of the switches...

> Also I don't see anything interesting in /var/log/messages in
> node1 after I disconnect the cable.

Have you checked node2?

HTH

With warm regards

Rajagopal

From ccaulfie at redhat.com Fri Jan 2 08:34:38 2009
From: ccaulfie at redhat.com (Chrissie Caulfield)
Date: Fri, 02 Jan 2009 08:34:38 +0000
Subject: [Linux-cluster] i rpmbuild the cman on linux as4 IBM power, it does not work.
In-Reply-To: <4957B2AA.096868.02362@m50-132.163.com>
References: <4957B2AA.096868.02362@m50-132.163.com>
Message-ID: <495DD19E.4020801@redhat.com>

victory.xu wrote:
> when I run "service cman start"
> the error in /var/log/messages is:
>
> kernel: ioctl32(cman_tool:5382): Unknown cmd fd(3) cmd(2000780b){' '} arg(42000422) on socket:[17147]

At a very quick guess, that looks like the tools have been built as 32-bit while the kernel is 64-bit. There is no 32/64 compatibility layer in cman for RHEL4; they must be the same word size.

> the ccsd has been started
>
> I don't know why
>
> victory.xu
> july_snow at 163.com
> 2008-12-29
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Chrissie
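For what it's worth, that kind of 32/64-bit mismatch can be confirmed quickly by comparing the kernel architecture with the word size of the installed tools, for example (the install path of cman_tool may vary, so the shell is asked to find it):

    uname -m                  # kernel architecture, e.g. ppc64
    file $(which cman_tool)   # reports whether the binary is a 32-bit or 64-bit ELF executable

If the first command reports a 64-bit architecture and the second a 32-bit executable, the userspace packages were built for the wrong word size.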
From pradhanparas at gmail.com Fri Jan 2 22:48:51 2009
From: pradhanparas at gmail.com (Paras pradhan)
Date: Fri, 2 Jan 2009 16:48:51 -0600
Subject: [Linux-cluster] Re: Fencing test
In-Reply-To: <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com>
References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com> <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com>
Message-ID: <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com>

On Thu, Jan 1, 2009 at 11:49 PM, Rajagopal Swaminathan wrote:
> Greetings,

Thanks for following up with your replies. I really appreciate it.

> On Wed, Dec 31, 2008 at 10:30 PM, Paras pradhan wrote:
>>
>> Pulled the heartbeat network cable from node1. Nothing happens. BUT
>> when I plug the cable back in, node1 restarted. What am I missing here.
>
> The heartbeat network cable should be out for at least 20-30 seconds.

Yes, I waited more than 20-30 seconds (around 2-3 minutes). It didn't reboot. But as I said, when I plug the cable back into the network port, the node reboots.

> If you have connected the data and heartbeat cables to the same
> switch, you may need to pull out both.

Each of my nodes has only one network interface card, so my heartbeat and data traffic share the same single cable, if I understand you correctly.

> Incidentally, you will have to enable multicasting for the heartbeat
> network in the switch if it is a managed switch, and assign a separate
> VLAN for it. There have been cases in the recent past where some of the
> switches...

Here I am using 4 nodes.

Node 1) Runs luci
Node 2) This is my iSCSI shared storage, where my virtual machine(s) reside
Node 3) First node in my two node cluster
Node 4) Second node in my two node cluster

All of them are connected simply to an unmanaged 16 port switch.

>> Also I don't see anything interesting in /var/log/messages in
>> node1 after I disconnect the cable.
>
> Have you checked node2?

Nothing in the node2 log either.

> HTH
>
> With warm regards
>
> Rajagopal
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

Thanks!

Paras.

From jngarratt at gmail.com Mon Jan 5 06:10:13 2009
From: jngarratt at gmail.com (James Garratt)
Date: Mon, 5 Jan 2009 17:10:13 +1100
Subject: [Linux-cluster] clvm running with redundant gnbd servers
Message-ID: <314e34340901042210t8aa4162je75ca81d82f66be4@mail.gmail.com>

I'm setting up a GNBD cluster with clvmd on the clients for the purpose of running a Xen cluster. I've been playing with this for a few months now and I've almost got everything working. However, I still have one outstanding issue that I can't find an answer to, even after extensive searches of the documentation and Google.

My setup:

2 GNBD servers (running RHEL5)
5 GNBD clients (running CentOS5)

The GNBD servers are connected to a SAN via redundant paths. The servers export multiple GNBDs with different names but with matching UIDs for each device they export. The clients import all GNBDs from each server. multipath.conf has been configured on the clients to see the GNBDs, and lvm.conf has been configured on the clients to filter everything except the local disks and /dev/mpath/*.

My problem is that if I put the two GNBD servers in the same cluster as the GNBD clients, I get warnings because the servers can't see the volume groups being used by the clients. If I put the servers in a separate cluster, fencing cannot work properly in the event of a server crash, and multipath locks up until the server is running again.

Is there a way to tell clvm to ignore some of the cluster nodes, or is there another solution to this problem?

Any advice or pointers to relevant documentation would be appreciated.

Regards,

James Garratt

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
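For reference, the kind of lvm.conf filter described above (accept the local disks and the multipath devices, reject everything else) usually looks something like the line below; /dev/sda is only an example here, and the patterns have to match the actual local disk and multipath naming on each client:

    filter = [ "a|^/dev/sda|", "a|^/dev/mpath/|", "r|.*|" ]

With clvmd running on the clients, locking_type = 3 is also normally set in the same file.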
From raju.rajsand at gmail.com Mon Jan 5 14:23:36 2009
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Mon, 5 Jan 2009 19:53:36 +0530
Subject: [Linux-cluster] Re: Fencing test
In-Reply-To: <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com>
References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com> <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com> <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com>
Message-ID: <8786b91c0901050623m46c79628i795e18dda28474c9@mail.gmail.com>

Greetings,

On Sat, Jan 3, 2009 at 4:18 AM, Paras pradhan wrote:
>
> Here I am using 4 nodes.
>
> Node 1) Runs luci
> Node 2) This is my iSCSI shared storage, where my virtual machine(s) reside
> Node 3) First node in my two node cluster
> Node 4) Second node in my two node cluster
>
> All of them are connected simply to an unmanaged 16 port switch.

Luci does not require a separate node to run. It can run on one of the member nodes (node 3 | 4).

What does clustat say?

Can you post your cluster.conf here?

When you pull out the network cable *and* plug it back in on, say, node 3, what messages appear in /var/log/messages on node 4 (if any)? (Sorry for the repetition, but messages are necessary here to make any sense of the situation.)

HTH

With warm regards

Rajagopal

From pradhanparas at gmail.com Mon Jan 5 18:11:24 2009
From: pradhanparas at gmail.com (Paras pradhan)
Date: Mon, 5 Jan 2009 12:11:24 -0600
Subject: [Linux-cluster] Re: Fencing test
In-Reply-To: <8786b91c0901050623m46c79628i795e18dda28474c9@mail.gmail.com>
References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com> <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com> <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com> <8786b91c0901050623m46c79628i795e18dda28474c9@mail.gmail.com>
Message-ID: <8b711df40901051011x79066243g38108439ffb1075f@mail.gmail.com>

Hi,

On Mon, Jan 5, 2009 at 8:23 AM, Rajagopal Swaminathan wrote:
> Greetings,
>
> On Sat, Jan 3, 2009 at 4:18 AM, Paras pradhan wrote:
>>
>> Here I am using 4 nodes.
>>
>> Node 1) Runs luci
>> Node 2) This is my iSCSI shared storage, where my virtual machine(s) reside
>> Node 3) First node in my two node cluster
>> Node 4) Second node in my two node cluster
>>
>> All of them are connected simply to an unmanaged 16 port switch.
>
> Luci does not require a separate node to run. It can run on one of the
> member nodes (node 3 | 4).

OK.

>
> What does clustat say?

Here is my clustat o/p:

-----------
[root at ha1lx ~]# clustat
Cluster Status for ipmicluster @ Mon Jan 5 12:00:10 2009
Member Status: Quorate

Member Name                 ID   Status
------ ----                 ---- ------
10.42.21.29                  1   Online, rgmanager
10.42.21.27                  2   Online, Local, rgmanager

Service Name                Owner (Last)        State
------- ----                ----- ------        -----
vm:linux64                  10.42.21.27         started
[root at ha1lx ~]#
------------------------

10.42.21.27 is node3 and 10.42.21.29 is node4

>
> Can you post your cluster.conf here?
Here is my cluster.conf -- [root at ha1lx cluster]# more cluster.conf ------ Here: 10.42.21.28 is IPMI interface in node3 10.42.21.30 is IPMI interface in node4 > > When you pull out the network cable *and* plug it back in say node 3, > , what messages appear in the /var/log/messages if Node 4 (if any)? > (sorry for the repitition, but messages are necessary here to make any > sense of the situation) > Ok here is the log in node 4 after i disconnect the network cable in node3. ----------- Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] The token was lost in the OPERATIONAL state. Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes). Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] entering GATHER state from 2. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering GATHER state from 0. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Creating commit token because I am the rep. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Saving state aru 76 high seq received 76 Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Storing new sequence id for ring ac Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering COMMIT state. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering RECOVERY state. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] position [0] member 10.42.21.29: Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] previous ring seq 168 rep 10.42.21.27 Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] aru 76 high delivered 76 received flag 1 Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Did not need to originate any messages in recovery. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Sending initial ORF token Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] New Configuration: Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Left: Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27) Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Joined: Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE Jan 5 12:05:28 ha2lx kernel: dlm: closing connection to node 2 Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] New Configuration: Jan 5 12:05:28 ha2lx fenced[5004]: 10.42.21.27 not a cluster member after 0 sec post_fail_delay Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) Jan 5 12:05:28 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Trying to acquire journal lock... Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Left: Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Joined: Jan 5 12:05:28 ha2lx openais[4988]: [SYNC ] This node is within the primary component and will provide service. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state. Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] got nodejoin message 10.42.21.29 Jan 5 12:05:28 ha2lx openais[4988]: [CPG ] got joinlist message from node 1 Jan 5 12:05:28 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Looking at journal... Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Acquiring the transaction lock... Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Replaying journal... 
Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Replayed 0 of 0 blocks Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Found 0 revoke tags Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Journal replayed in 1s Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Done ------------------ Now when I plug back my cable to node3, node 4 reboots and here is the quickly grabbed log in node4 -- Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering GATHER state from 11. Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Saving state aru 1d high seq received 1d Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Storing new sequence id for ring b0 Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering COMMIT state. Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering RECOVERY state. Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] position [0] member 10.42.21.27: Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] previous ring seq 172 rep 10.42.21.27 Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] aru 16 high delivered 16 received flag 1 Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] position [1] member 10.42.21.29: Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] previous ring seq 172 rep 10.42.21.29 Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] aru 1d high delivered 1d received flag 1 Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Did not need to originate any messages in recovery. Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] New Configuration: Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Left: Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Joined: Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] New Configuration: Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27) Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Left: Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Joined: Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27) Jan 5 12:07:12 ha2lx openais[4988]: [SYNC ] This node is within the primary component and will provide service. Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state. Jan 5 12:07:12 ha2lx openais[4988]: [MAIN ] Killing node 10.42.21.27 because it has rejoined the cluster with existing state Jan 5 12:07:12 ha2lx openais[4988]: [CMAN ] cman killed by node 2 because we rejoined the cluster without a full restart Jan 5 12:07:12 ha2lx gfs_controld[5016]: groupd_dispatch error -1 errno 11 Jan 5 12:07:12 ha2lx gfs_controld[5016]: groupd connection died Jan 5 12:07:12 ha2lx gfs_controld[5016]: cluster is down, exiting Jan 5 12:07:12 ha2lx dlm_controld[5010]: cluster is down, exiting Jan 5 12:07:12 ha2lx kernel: dlm: closing connection to node 1 Jan 5 12:07:12 ha2lx fenced[5004]: cluster is down, exiting ------- Also here is the log of node3: -- [root at ha1lx ~]# tail -f /var/log/messages Jan 5 12:07:24 ha1lx openais[26029]: [TOTEM] entering OPERATIONAL state. Jan 5 12:07:24 ha1lx openais[26029]: [CLM ] got nodejoin message 10.42.21.27 Jan 5 12:07:24 ha1lx openais[26029]: [CLM ] got nodejoin message 10.42.21.27 Jan 5 12:07:24 ha1lx openais[26029]: [CPG ] got joinlist message from node 2 Jan 5 12:07:27 ha1lx ccsd[26019]: Attempt to close an unopened CCS descriptor (4520670). 
Jan 5 12:07:27 ha1lx ccsd[26019]: Error while processing disconnect: Invalid request descriptor Jan 5 12:07:27 ha1lx fenced[26045]: fence "10.42.21.29" success Jan 5 12:07:27 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: jid=0: Trying to acquire journal lock... Jan 5 12:07:27 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: jid=0: Looking at journal... Jan 5 12:07:28 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: jid=0: Done ---------------- > HTH > > With warm regards > > Rajagopal > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Thanks a lot Paras. From Joseph.Greenseid at ngc.com Mon Jan 5 20:18:10 2009 From: Joseph.Greenseid at ngc.com (Greenseid, Joseph M.) Date: Mon, 5 Jan 2009 14:18:10 -0600 Subject: [Linux-cluster] problem adding new node to an existing cluster Message-ID: hi all, i am trying to add a new node to an existing 3 node GFS cluster. i followed the steps in the online docs for this, so i went onto the 1st node in my existing cluster, run system-config-cluster, added a new node and fence for it, then propagated that out to the existing nodes, and scp'd the cluster.conf file to the new node. at that point, i confirmed that multipath and mdadm config files were synced with my other nodes, the new node can properly see the SAN that they're all sharing, etc. i then started cman, which seemed to start without any trouble. i tried to start clvmd, but it says: Activating VGs: Skipping clustered volume group san01 my VG is named "san01," so it can see the volume group, it just won't activate it for some reason. any ideas what i'm doing wrong? thanks, --Joe -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpeterso at redhat.com Mon Jan 5 20:25:36 2009 From: rpeterso at redhat.com (Bob Peterson) Date: Mon, 5 Jan 2009 15:25:36 -0500 (EST) Subject: [Linux-cluster] problem adding new node to an existing cluster In-Reply-To: Message-ID: <868569604.2835591231187135219.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> ----- "Joseph M. Greenseid" wrote: | hi all, | | i am trying to add a new node to an existing 3 node GFS cluster. | | i followed the steps in the online docs for this, so i went onto the | 1st node in my existing cluster, run system-config-cluster, added a | new node and fence for it, then propagated that out to the existing | nodes, and scp'd the cluster.conf file to the new node. | | at that point, i confirmed that multipath and mdadm config files were | synced with my other nodes, the new node can properly see the SAN that | they're all sharing, etc. | | i then started cman, which seemed to start without any trouble. i | tried to start clvmd, but it says: | | Activating VGs: Skipping clustered volume group san01 | | my VG is named "san01," so it can see the volume group, it just won't | activate it for some reason. any ideas what i'm doing wrong? | | thanks, | --Joe Hi Joe, Make sure that you have clvmd service running on the new node ("chkconfig clvmd on" and/or "service clvmd start" as necessary). Also, make sure the lock_type is 2 (RHEL4/similar) or 3 (RHEL5/similar) in the /etc/lvm/lvm.conf file. Regards, Bob Peterson Red Hat GFS From Joseph.Greenseid at ngc.com Mon Jan 5 20:28:12 2009 From: Joseph.Greenseid at ngc.com (Greenseid, Joseph M.) 
Date: Mon, 5 Jan 2009 14:28:12 -0600 Subject: [Linux-cluster] problem adding new node to an existing cluster References: <868569604.2835591231187135219.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> Message-ID: ---- "Joseph M. Greenseid" wrote: | hi all, | | i am trying to add a new node to an existing 3 node GFS cluster. | | i followed the steps in the online docs for this, so i went onto the | 1st node in my existing cluster, run system-config-cluster, added a | new node and fence for it, then propagated that out to the existing | nodes, and scp'd the cluster.conf file to the new node. | | at that point, i confirmed that multipath and mdadm config files were | synced with my other nodes, the new node can properly see the SAN that | they're all sharing, etc. | | i then started cman, which seemed to start without any trouble. i | tried to start clvmd, but it says: | | Activating VGs: Skipping clustered volume group san01 | | my VG is named "san01," so it can see the volume group, it just won't | activate it for some reason. any ideas what i'm doing wrong? | | thanks, | --Joe > Hi Joe, > Make sure that you have clvmd service running on the new node > ("chkconfig clvmd on" and/or "service clvmd start" as necessary). Hi Bob, Yes, this problem started when I tried to start clvmd (/sbin/service clvmd start). > Also, make sure the lock_type is 2 (RHEL4/similar) or 3 (RHEL5/similar) > in the /etc/lvm/lvm.conf file. Ah, Ok, I believe this may be the trouble. My lock_type was 1. I'll change it and try again. Thanks. --Joe > Regards, > Bob Peterson > Red Hat GFS -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 4399 bytes Desc: not available URL: From Joseph.Greenseid at ngc.com Mon Jan 5 21:10:29 2009 From: Joseph.Greenseid at ngc.com (Greenseid, Joseph M.) Date: Mon, 5 Jan 2009 15:10:29 -0600 Subject: [Linux-cluster] problem adding new node to an existing cluster References: Message-ID: > Also, make sure the lock_type is 2 (RHEL4/similar) or 3 (RHEL5/similar) > in the /etc/lvm/lvm.conf file. This fixed it. Thanks. --Joe -------------- next part -------------- An HTML attachment was scrubbed... URL: From Joseph.Greenseid at ngc.com Mon Jan 5 22:01:45 2009 From: Joseph.Greenseid at ngc.com (Greenseid, Joseph M.) Date: Mon, 5 Jan 2009 16:01:45 -0600 Subject: [Linux-cluster] problem adding new node to an existing cluster References: Message-ID: Hi, I have a new question. When I created this file system a year ago, I didn't anticipate needing any additional nodes other than the original 3 I set up. Consequently, I have 3 journals. Now that I've been told to add a fourth node, is there a way to add a journal to an existing file system that resides on a volume that has not been expanded (the docs appear to read that you can only do it to an expanded volume because the additional journal(s) take up additional space). My file system isn't full, though my volume is fully used by the formatted GFS file system. Is there anything I can do that won't involve destroying my existing file system? Thanks, --Joe -------------- next part -------------- A non-text attachment was scrubbed... 
Name: winmail.dat Type: application/ms-tnef Size: 3699 bytes Desc: not available URL: From rpeterso at redhat.com Mon Jan 5 23:09:18 2009 From: rpeterso at redhat.com (Bob Peterson) Date: Mon, 5 Jan 2009 18:09:18 -0500 (EST) Subject: [Linux-cluster] problem adding new node to an existing cluster In-Reply-To: <1380566121.21231196900140.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> Message-ID: <291064814.51231196957732.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> ----- "Joseph M. Greenseid" wrote: | Hi, | | I have a new question. When I created this file system a year ago, I | didn't anticipate needing any additional nodes other than the original | 3 I set up. Consequently, I have 3 journals. Now that I've been told | to add a fourth node, is there a way to add a journal to an existing | file system that resides on a volume that has not been expanded (the | docs appear to read that you can only do it to an expanded volume | because the additional journal(s) take up additional space). My file | system isn't full, though my volume is fully used by the formatted GFS | file system. | | Is there anything I can do that won't involve destroying my existing | file system? | | Thanks, | --Joe Hi Joe, Journals for gfs file systems are carved out during mkfs. The rest of the space is used for data and metadata. So there are only two ways to make journals: (1) Do another mkfs which will destroy your file system or (2) if you're using lvm, add more storage with something like lvresize or lvextend, then use gfs_jadd to add the new journal to the new chunk of storage. We realize that's a pain, and that's why we took away that restriction in gfs2. In gfs2, journals are kept as a hidden part of the file system, so they can be added painlessly to an existing file system without adding storage. So I guess a third option would be to convert the file system to gfs2 using gfs2_convert, add the journal with gfs2_jadd, then use it as gfs2 from then on. But please be aware that gfs2_convert had some serious problems until the 5.3 version that was committed to the cluster git tree in December, (i.e. the very latest and greatest "RHEL5", "RHEL53", "master", "STABLE2" or "STABLE3" versions in the cluster git (source code) tree.) Make ABSOLUTELY CERTAIN that you have a working & recent backup and restore option before you try this. Also, the GFS2 kernel code prior to 5.3 is considered tech preview as well, so not ready for production use. So if you're not building from source code, you should wait until RHEL5.3 or Centos5.3 (or similar) before even considering this option. Regards, Bob Peterson Red Hat GFS From Joseph.Greenseid at ngc.com Tue Jan 6 13:57:21 2009 From: Joseph.Greenseid at ngc.com (Greenseid, Joseph M.) Date: Tue, 6 Jan 2009 07:57:21 -0600 Subject: [Linux-cluster] problem adding new node to an existing cluster References: <291064814.51231196957732.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> Message-ID: ---- "Joseph M. Greenseid" wrote: | Hi, | | I have a new question. When I created this file system a year ago, I | didn't anticipate needing any additional nodes other than the original | 3 I set up. Consequently, I have 3 journals. Now that I've been told | to add a fourth node, is there a way to add a journal to an existing | file system that resides on a volume that has not been expanded (the | docs appear to read that you can only do it to an expanded volume | because the additional journal(s) take up additional space). 
My file | system isn't full, though my volume is fully used by the formatted GFS | file system. | | Is there anything I can do that won't involve destroying my existing | file system? | | Thanks, | --Joe > Hi Joe, > Journals for gfs file systems are carved out during mkfs. The rest of the > space is used for data and metadata. So there are only two ways to > make journals: (1) Do another mkfs which will destroy your file system > or (2) if you're using lvm, add more storage with something like > lvresize or lvextend, then use gfs_jadd to add the new journal to the > new chunk of storage. > Ok, so I did understand correctly. That's at least something positive. :) > We realize that's a pain, and that's why we took away that restriction > in gfs2. In gfs2, journals are kept as a hidden part of the file system, > so they can be added painlessly to an existing file system without > adding storage. So I guess a third option would be to convert the file > system to gfs2 using gfs2_convert, add the journal with gfs2_jadd, then > use it as gfs2 from then on. But please be aware that gfs2_convert had some > serious problems until the 5.3 version that was committed to the cluster > git tree in December, (i.e. the very latest and greatest "RHEL5", "RHEL53", > "master", "STABLE2" or "STABLE3" versions in the cluster git (source code) > tree.) Make ABSOLUTELY CERTAIN that you have a working & recent backup and > restore option before you try this. Also, the GFS2 kernel code prior to > 5.3 is considered tech preview as well, so not ready for production use. > So if you're not building from source code, you should wait until RHEL5.3 > or Centos5.3 (or similar) before even considering this option. > Ok, I have an earlier version of GFS2, so I guess I'm going to need to sit down and figure out a better strategy for what I've been asked to do. I appreciate the help with my questions, though. Thanks again. --Joe > Regards, > > Bob Peterson > Red Hat GFS

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From duplessis.jacques at gmail.com Tue Jan 6 23:56:56 2009
From: duplessis.jacques at gmail.com (Jacques Duplessis)
Date: Tue, 6 Jan 2009 18:56:56 -0500
Subject: [Linux-cluster] Re: Linux-cluster Digest, Vol 57, Issue 5
In-Reply-To: <20090106170010.7AFD58E00FA@hormel.redhat.com>
References: <20090106170010.7AFD58E00FA@hormel.redhat.com>
Message-ID: <6d89d2a30901061556t4e6d66b6x7a4dd48a50e2dd80@mail.gmail.com>

# Add these lines to the syslog.conf file & restart syslog
# ========================================================
# vi /etc/syslog.conf
# rgmanager log
local4.* /var/log/rgmanager

# Create the log file before restarting syslog
# ========================================================
# touch /var/log/rgmanager
# chmod 644 /var/log/rgmanager
# chown root.root /var/log/rgmanager
# service syslog restart
Shutting down kernel logger: [ OK ]
Shutting down system logger: [ OK ]
Starting system logger: [ OK ]
Starting kernel logger: [ OK ]

# Change the cluster config file to log rgmanager info
# ========================================================
# vi /etc/cluster/cluster.conf
change the line to

# Push the changes to all cluster nodes
# ========================================================
# ccs_tool update /etc/cluster/cluster.conf

Unplug and plug back the network cable on the node and look at the /var/log/rgmanager file. It may contain useful info for us.
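The cluster.conf attribute that the "change the line to" step refers to was lost with the scrubbed HTML part of this message, so purely as an illustration: rgmanager logging on RHEL5 is normally controlled by log_level and log_facility attributes on the rm tag, along the lines of

    <rm log_facility="local4" log_level="7">

where log_level follows the usual syslog levels (7 being debug) and log_facility has to match the facility used in syslog.conf above. Remember to increment config_version at the top of cluster.conf before running ccs_tool update, otherwise the new configuration will not be pushed to the other nodes.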
From garromo at us.ibm.com Wed Jan 7 20:39:42 2009
From: garromo at us.ibm.com (Gary Romo)
Date: Wed, 7 Jan 2009 13:39:42 -0700
Subject: [Linux-cluster] system-config-cluster Error
Message-ID:

When I opened system-config-cluster today, I got this error;

Poorly Formed XML Error

A problem was encountered while reading configuration file /etc/cluster/cluster.conf
Details or the error appear below. Click the `New` button to create a new configuration file.
To continue anyway (Not recommended), click the `Ok` button Relax-NG validity error : Extra element rm in interleave /etc/cluster/cluster.conf:2: element cluster: Relax-NG validity error : Element cluster failed to validate content /etc/cluster/cluster.conf fails to validate Can anyone tell me what this is and how to correct? Thanks! Gary Romo -------------- next part -------------- An HTML attachment was scrubbed... URL: From jumanjiman at gmail.com Wed Jan 7 21:06:33 2009 From: jumanjiman at gmail.com (Paul Morgan) Date: Wed, 7 Jan 2009 15:06:33 -0600 Subject: [Linux-cluster] system-config-cluster Error In-Reply-To: References: Message-ID: <07646F01-ED12-430D-97AF-5CDCD33CDC7D@gmail.com> On Jan 7, 2009, at 14:39, Gary Romo wrote: > When I opened system-config-cluster today, I got this error; > > Poorly Formed XML Error > > A problem was encountered while reading configuration file /etc/ > cluster/cluster.conf > Details or the error appear below. Click the `New` button to create > a new configuration file. > To continue anyway (Not recommended), click the `Ok` button > > Relax-NG validity error : Extra element rm in interleave > /etc/cluster/cluster.conf:2: element cluster: Relax-NG validity > error : Element cluster failed to validate content > /etc/cluster/cluster.conf fails to validate > > Can anyone tell me what this is and how to correct? Thanks! > > Gary Romo > Assuming you have a functional cluster: Somebody-maybe you or another admin-used luci to modify the cluster. s-c-cluster uses an older XML or doesn't perfectly validate luci's version. I ignore the validation error and have yet to see any fallout. hth, -paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From Gary_Hunt at gallup.com Wed Jan 7 21:46:52 2009 From: Gary_Hunt at gallup.com (Hunt, Gary) Date: Wed, 7 Jan 2009 15:46:52 -0600 Subject: [Linux-cluster] DELL M600 fencing Message-ID: Hello New to this list and am trying to get a cluster up and running. I noticed someone added support to the fence_drac agent to support the Dell CMC. Could I get a link to the repository where the patched agent is at? Thanks Gary Hunt -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpeterso at redhat.com Wed Jan 7 21:57:16 2009 From: rpeterso at redhat.com (Bob Peterson) Date: Wed, 7 Jan 2009 16:57:16 -0500 (EST) Subject: [Linux-cluster] system-config-cluster Error In-Reply-To: Message-ID: <1034294076.526991231365436502.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> ----- "Gary Romo" wrote: | When I opened system-config-cluster today, I got this error; | | Poorly Formed XML Error | | A problem was encountered while reading configuration file | /etc/cluster/cluster.conf | Details or the error appear below. Click the `New` button to create a | new configuration file. | To continue anyway (Not recommended), click the `Ok` button | | Relax-NG validity error : Extra element rm in interleave | /etc/cluster/cluster.conf:2: element cluster: Relax-NG validity error | : Element cluster failed to validate content | /etc/cluster/cluster.conf fails to validate | | Can anyone tell me what this is and how to correct? Thanks! | | Gary Romo Hi Gary, Could it be: http://sources.redhat.com/cluster/wiki/FAQ/GUI#gui_validityerror Without seeing your cluster.conf it's hard to tell if it's a "real" error. 
Regards,

Bob Peterson
Red Hat GFS

From garromo at us.ibm.com Wed Jan 7 23:48:37 2009
From: garromo at us.ibm.com (Gary Romo)
Date: Wed, 7 Jan 2009 16:48:37 -0700
Subject: [Linux-cluster] system-config-cluster Error
In-Reply-To: <07646F01-ED12-430D-97AF-5CDCD33CDC7D@gmail.com>
Message-ID:

Paul Morgan (sent by linux-cluster-bounces at redhat.com) wrote to linux
clustering on 01/07/2009 02:06 PM, Subject: Re: [Linux-cluster]
system-config-cluster Error (please respond to linux clustering):

On Jan 7, 2009, at 14:39, Gary Romo wrote:

When I opened system-config-cluster today, I got this error;

Poorly Formed XML Error

A problem was encountered while reading configuration file
/etc/cluster/cluster.conf
Details or the error appear below. Click the `New` button to create
a new configuration file.
To continue anyway (Not recommended), click the `Ok` button

Relax-NG validity error : Extra element rm in interleave
/etc/cluster/cluster.conf:2: element cluster: Relax-NG validity error :
Element cluster failed to validate content
/etc/cluster/cluster.conf fails to validate

Can anyone tell me what this is and how to correct? Thanks!

Gary Romo

Assuming you have a functional cluster:

Somebody (maybe you or another admin) used luci to modify the cluster.
s-c-cluster uses an older XML or doesn't perfectly validate luci's
version. I ignore the validation error and have yet to see any fallout.

hth,
-paul

--

We do have a functional cluster. luci was used. As long as there is no
fallout. Thank you for your explanation!

Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stewart at epits.com.au Thu Jan 8 02:42:29 2009
From: stewart at epits.com.au (Stewart Walters)
Date: Thu, 08 Jan 2009 11:42:29 +0900
Subject: [Linux-cluster] cman_tool nodes shows different Inc numbers; should I be concerned?
Message-ID: <49656815.6070000@epits.com.au>

Hello List Members,

I've just joined, so please forgive me in advance if I break some list
etiquette :-)

I have a two node cluster (RHEL5) whereby running "cman_tool nodes" on
each node nets the following results:

[root at node01 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    512   2009-01-08 10:59:53  node01.example.com
   2   M    516   2009-01-08 10:59:54  node02.example.com

[root at node02 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    516   2009-01-08 10:59:53  node01.example.com
   2   M    504   2009-01-08 10:35:59  node02.example.com

As you can see the "Inc" numbers are seen as different from both nodes.

First off, should I be concerned that they are different?

And secondly, what does the Inc number signify anyway? The man page for
cman_tool doesn't directly describe what an Inc number is for. I think
in my travels in trying to answer this question I found a vague
reference to the fact that it's something to do with openais, but I
wouldn't mind if someone could confirm this and/or hit me over the head
with the clue stick.
From the manpage for cman_tool:

Example:

In this example we have a five node cluster that has experienced a
network partition. Here is the output of cman_tool nodes from all
systems:

Node  Sts   Inc   Joined               Name
   1   M   2372   2007-11-05 02:58:55  node-01.example.com
   2   d   2376   2007-11-05 02:58:56  node-02.example.com
   3   d   2376   2007-11-05 02:58:56  node-03.example.com
   4   M   2376   2007-11-05 02:58:56  node-04.example.com
   5   M   2376   2007-11-05 02:58:56  node-05.example.com

Node  Sts   Inc   Joined               Name
   1   d   2372   2007-11-05 02:58:55  node-01.example.com
   2   M   2376   2007-11-05 02:58:56  node-02.example.com
   3   M   2376   2007-11-05 02:58:56  node-03.example.com
   4   d   2376   2007-11-05 02:58:56  node-04.example.com
   5   d   2376   2007-11-05 02:58:56  node-05.example.com

Node  Sts   Inc   Joined               Name
   1   d   2372   2007-11-05 02:58:55  node-01.example.com
   2   M   2376   2007-11-05 02:58:56  node-02.example.com
   3   M   2376   2007-11-05 02:58:56  node-03.example.com
   4   d   2376   2007-11-05 02:58:56  node-04.example.com
   5   d   2376   2007-11-05 02:58:56  node-05.example.com

Node  Sts   Inc   Joined               Name
   1   M   2372   2007-11-05 02:58:55  node-01.example.com
   2   d   2376   2007-11-05 02:58:56  node-02.example.com
   3   d   2376   2007-11-05 02:58:56  node-03.example.com
   4   M   2376   2007-11-05 02:58:56  node-04.example.com
   5   M   2376   2007-11-05 02:58:56  node-05.example.com

Node  Sts   Inc   Joined               Name
   1   M   2372   2007-11-05 02:58:55  node-01.example.com
   2   d   2376   2007-11-05 02:58:56  node-02.example.com
   3   d   2376   2007-11-05 02:58:56  node-03.example.com
   4   M   2376   2007-11-05 02:58:56  node-04.example.com
   5   M   2376   2007-11-05 02:58:56  node-05.example.com

At least in the man page example, node-01 consistently has Inc number
2372, as seen consistently from all nodes. But as you can see in my
cluster, both nodes register a different Inc number for themselves and
the other.

Thanks in advance for any information you can provide me regarding this.

Kind Regards,

Stewart

From ccaulfie at redhat.com Thu Jan 8 08:25:56 2009
From: ccaulfie at redhat.com (Chrissie Caulfield)
Date: Thu, 08 Jan 2009 08:25:56 +0000
Subject: [Linux-cluster] cman_tool nodes shows different Inc numbers; should I be concerned?
In-Reply-To: <49656815.6070000@epits.com.au>
References: <49656815.6070000@epits.com.au>
Message-ID: <4965B894.7080807@redhat.com>

Stewart Walters wrote:
> Hello List Members,
>
> I've just joined, so please forgive me in advance if I break some list
> etiquette :-)
>
> I have a two node cluster (RHEL5) whereby running "cman_tool nodes" on
> each node nets the following results:
>
> [root at node01 ~]# cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M    512   2009-01-08 10:59:53  node01.example.com
>    2   M    516   2009-01-08 10:59:54  node02.example.com
>
> [root at node02 ~]# cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M    516   2009-01-08 10:59:53  node01.example.com
>    2   M    504   2009-01-08 10:35:59  node02.example.com
>
> As you can see the "Inc" numbers are seen as different from both nodes.
>
> First off, should I be concerned that they are different?

No, it's perfectly normal that they are different.

> And secondly, what does the Inc number signify anyway? The man page for
> cman_tool doesn't directly describe what an Inc number is for. I think
> in my travels in trying to answer this question I found a vague
> reference to the fact that it's something to do with openais, but I
> wouldn't mind if someone could confirm this and/or hit me over the head
> with the clue stick.
>

Inc is the cluster incarnation number at the time the node joined.
It's a totally pointless piece of information that I think we'll remove in future releases ;-) Chrissie From Brett.Dellegrazie at intact-is.com Thu Jan 8 11:48:49 2009 From: Brett.Dellegrazie at intact-is.com (Brett Delle Grazie) Date: Thu, 8 Jan 2009 11:48:49 -0000 Subject: [Linux-cluster] Load share http servers using clusterip with failover Message-ID: Hi, I have a configured two-node cluster with some GFS file systems on them. Those servers also run http servers and I'd like to load-share the HTTP servers without putting a hardware load balancer in front of them. I read about clusterIP: http://www.linux-ha.org/ClusterIP and was wondering if anyone has managed to use this iptables capability to get a service running in load-shared fashion across multiple nodes with the failover of a node handled by rgmanager? Is there an example of this anywhere that someone could point me to? Has anyone got a resource script of this type they would be willing to share? Thanks in advance, Best regards, Brett ______________________________________________________________________ This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email ______________________________________________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From td3201 at gmail.com Thu Jan 8 12:07:13 2009 From: td3201 at gmail.com (Terry) Date: Thu, 8 Jan 2009 06:07:13 -0600 Subject: [Linux-cluster] failover domain not working as expected Message-ID: <8ee061010901080407j3b4162e5r308f965da80cf62a@mail.gmail.com> Hello, I have an NFS cluster that isn't quite working as expected. I intend to distribute several volumes between both nodes of my cluster and in the event one node goes down, the other picks up the full load. I had a situation where I had to reboot one of the nodes. I did so and all the services were restarted on the other node, which is great. Then, after a minute or so, some of the services stopped and stayed stopped. Here are some relevant parts of my config, anyone see anything unusual:? From jeff.jansen at kkoncepts.net Thu Jan 8 13:22:59 2009 From: jeff.jansen at kkoncepts.net (Jeff Jansen) Date: Thu, 08 Jan 2009 21:22:59 +0800 Subject: [Linux-cluster] Qdisk in initial quorum Message-ID: <4965FE33.10509@kkoncepts.net> Is it possible to use a qdisk to ATTAIN quorum or does it only SUSTAIN quorum? I have a STABLE2 version 2 node cluster that is set up with 'expected_votes="3"'. There are two physical nodes and a qdisk, which at the moment is simply a ping heuristic. But on start-up qdiskd can't run unless the cluster already has a quorum. I see this in the logs when qdiskd is started: qdiskd[2624]: Connection to CCSD failed; cannot start qdiskd[2624]: Configuration failed ccsd[3258]: Cluster is not quorate. Refusing connection. ccsd[3258]: Error while processing connect: Connection refused Once the two nodes join together and form a quorum, then qdiskd (if it's restarted) will start correctly on both nodes and becomes part of the quorum. >From then everything happens as expected and one node can maintain quorum as long as it can "see" the qdisk. But I'd like the qdisk to be used to ATTAIN quorum at start up if necessary. If the whole cluster gets shut down (which actually happened a while ago when our data center had a "power incident") :-) and only one node boots back up for some reason, then I'd like it to form a quorum with the qdisk. 
But at the moment it doesn't seem possible since qdiskd refuses to start without a pre-existing quorum. TIA Jeff Jansen From pradhanparas at gmail.com Thu Jan 8 18:39:10 2009 From: pradhanparas at gmail.com (Paras pradhan) Date: Thu, 8 Jan 2009 12:39:10 -0600 Subject: [Linux-cluster] Re: Fencing test In-Reply-To: <8b711df40901051011x79066243g38108439ffb1075f@mail.gmail.com> References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com> <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com> <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com> <8786b91c0901050623m46c79628i795e18dda28474c9@mail.gmail.com> <8b711df40901051011x79066243g38108439ffb1075f@mail.gmail.com> Message-ID: <8b711df40901081039m4351f8b9te7d3a2a10e118328@mail.gmail.com> On Mon, Jan 5, 2009 at 12:11 PM, Paras pradhan wrote: > hi, > > On Mon, Jan 5, 2009 at 8:23 AM, Rajagopal Swaminathan > wrote: >> Greetings, >> >> On Sat, Jan 3, 2009 at 4:18 AM, Paras pradhan wrote: >>> >>> Here I am using 4 nodes. >>> >>> Node 1) That runs luci >>> Node 2) This is my iscsi shared storage where my virutal machine(s) resides >>> Node 3) First node in my two node cluster >>> Node 4) Second node in my two node cluster >>> >>> All of them are connected simply to an unmanaged 16 port switch. >> >> Luci need not require a separate node to run. it can run on one of the >> member nodes (node 3 | 4). > > OK. > >> >> what does clustat say? > > Here is my clustat o/p: > > ----------- > > [root at ha1lx ~]# clustat > Cluster Status for ipmicluster @ Mon Jan 5 12:00:10 2009 > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > 10.42.21.29 1 > Online, rgmanager > 10.42.21.27 2 > Online, Local, rgmanager > > Service Name > Owner (Last) State > ------- ---- > ----- ------ ----- > vm:linux64 > 10.42.21.27 > started > [root at ha1lx ~]# > ------------------------ > > > 10.42.21.27 is node3 and 10.42.21.29 is node4 > > > >> >> Can you post your cluster.conf here? > > Here is my cluster.conf > > -- > [root at ha1lx cluster]# more cluster.conf > > > > > > > > > > > > > > > > > > > > > > login="admin" name="fence1" passwd="admin"/> > login="admin" name="fence2" passwd="admin"/> > > > > > > > > > > name="linux64" path="/guest_roots" recovery="restart"/> > > > ------ > > > Here: > > 10.42.21.28 is IPMI interface in node3 > 10.42.21.30 is IPMI interface in node4 > > > > > > > > >> >> When you pull out the network cable *and* plug it back in say node 3, >> , what messages appear in the /var/log/messages if Node 4 (if any)? >> (sorry for the repitition, but messages are necessary here to make any >> sense of the situation) >> > > Ok here is the log in node 4 after i disconnect the network cable in node3. > > ----------- > > Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] The token was lost in the > OPERATIONAL state. > Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] Receive multicast socket > recv buffer size (288000 bytes). > Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] Transmit multicast socket > send buffer size (262142 bytes). > Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] entering GATHER state from 2. > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering GATHER state from 0. > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Creating commit token > because I am the rep. 
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Saving state aru 76 high > seq received 76 > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Storing new sequence id > for ring ac > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering COMMIT state. > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering RECOVERY state. > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] position [0] member 10.42.21.29: > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] previous ring seq 168 rep > 10.42.21.27 > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] aru 76 high delivered 76 > received flag 1 > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Did not need to originate > any messages in recovery. > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Sending initial ORF token > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] New Configuration: > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Left: > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27) > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Joined: > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE > Jan 5 12:05:28 ha2lx kernel: dlm: closing connection to node 2 > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] New Configuration: > Jan 5 12:05:28 ha2lx fenced[5004]: 10.42.21.27 not a cluster member > after 0 sec post_fail_delay > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) > Jan 5 12:05:28 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Trying to acquire journal lock... > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Left: > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Joined: > Jan 5 12:05:28 ha2lx openais[4988]: [SYNC ] This node is within the > primary component and will provide service. > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state. > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] got nodejoin message 10.42.21.29 > Jan 5 12:05:28 ha2lx openais[4988]: [CPG ] got joinlist message from node 1 > Jan 5 12:05:28 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Looking at journal... > Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Acquiring the transaction lock... > Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Replaying journal... > Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Replayed 0 of 0 blocks > Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Found 0 revoke tags > Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Journal replayed in 1s > Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Done > ------------------ > > Now when I plug back my cable to node3, node 4 reboots and here is the > quickly grabbed log in node4 > > > -- > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering GATHER state from 11. > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Saving state aru 1d high > seq received 1d > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Storing new sequence id > for ring b0 > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering COMMIT state. > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering RECOVERY state. 
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] position [0] member 10.42.21.27: > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] previous ring seq 172 rep > 10.42.21.27 > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] aru 16 high delivered 16 > received flag 1 > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] position [1] member 10.42.21.29: > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] previous ring seq 172 rep > 10.42.21.29 > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] aru 1d high delivered 1d > received flag 1 > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Did not need to originate > any messages in recovery. > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] New Configuration: > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Left: > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Joined: > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] New Configuration: > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27) > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Left: > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Joined: > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27) > Jan 5 12:07:12 ha2lx openais[4988]: [SYNC ] This node is within the > primary component and will provide service. > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state. > Jan 5 12:07:12 ha2lx openais[4988]: [MAIN ] Killing node 10.42.21.27 > because it has rejoined the cluster with existing state > Jan 5 12:07:12 ha2lx openais[4988]: [CMAN ] cman killed by node 2 > because we rejoined the cluster without a full restart > Jan 5 12:07:12 ha2lx gfs_controld[5016]: groupd_dispatch error -1 errno 11 > Jan 5 12:07:12 ha2lx gfs_controld[5016]: groupd connection died > Jan 5 12:07:12 ha2lx gfs_controld[5016]: cluster is down, exiting > Jan 5 12:07:12 ha2lx dlm_controld[5010]: cluster is down, exiting > Jan 5 12:07:12 ha2lx kernel: dlm: closing connection to node 1 > Jan 5 12:07:12 ha2lx fenced[5004]: cluster is down, exiting > ------- > > > Also here is the log of node3: > > -- > [root at ha1lx ~]# tail -f /var/log/messages > Jan 5 12:07:24 ha1lx openais[26029]: [TOTEM] entering OPERATIONAL state. > Jan 5 12:07:24 ha1lx openais[26029]: [CLM ] got nodejoin message 10.42.21.27 > Jan 5 12:07:24 ha1lx openais[26029]: [CLM ] got nodejoin message 10.42.21.27 > Jan 5 12:07:24 ha1lx openais[26029]: [CPG ] got joinlist message from node 2 > Jan 5 12:07:27 ha1lx ccsd[26019]: Attempt to close an unopened CCS > descriptor (4520670). > Jan 5 12:07:27 ha1lx ccsd[26019]: Error while processing disconnect: > Invalid request descriptor > Jan 5 12:07:27 ha1lx fenced[26045]: fence "10.42.21.29" success > Jan 5 12:07:27 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: > jid=0: Trying to acquire journal lock... > Jan 5 12:07:27 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: > jid=0: Looking at journal... > Jan 5 12:07:28 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: jid=0: Done > ---------------- > > > > > > > > > > > > >> HTH >> >> With warm regards >> >> Rajagopal >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > > Thanks a lot > > Paras. 
> In an act to solve my fencing issue in my 2 node cluster, i tried to run fence_ipmi to check if fencing is working or not. I need to know what is my problem - [root at ha1lx ~]# fence_ipmilan -a 10.42.21.28 -o off -l admin -p admin Powering off machine @ IPMI:10.42.21.28...ipmilan: Failed to connect after 30 seconds Failed [root at ha1lx ~]# --------------- Here 10.42.21.28 is an IP address assigned to IPMI interface and I am running this command in the same host. Thanks Paras. From Bevan.Broun at ardec.com.au Thu Jan 8 22:32:53 2009 From: Bevan.Broun at ardec.com.au (Bevan Broun) Date: Fri, 9 Jan 2009 09:32:53 +1100 Subject: [Linux-cluster] Qdisk in initial quorum In-Reply-To: <4965FE33.10509@kkoncepts.net> References: <4965FE33.10509@kkoncepts.net> Message-ID: <6008E5CED89FD44A86D3C376519E1DB2102553963B@megatron.ms.a2end.com> Hi Jeff I set up a 2 node cluster with qdisk and had the behavior you are expecting. At least I get a running cluster with 2 votes when only 1 node is booted up. So it should work. I have And This is on RH-5.1. Bevan Broun Solutions Architect Ardec International http://www.ardec.com.au http://www.lisasoft.com http://www.terrapages.com Sydney ----------------------- Suite 112,The Lower Deck 19-21 Jones Bay Wharf Pirrama Road, Pyrmont 2009 Ph: +61 2 8570 5000 Fax: +61 2 8570 5099 -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Jansen Sent: Friday, 9 January 2009 12:23 AM To: linux clustering Subject: [Linux-cluster] Qdisk in initial quorum Is it possible to use a qdisk to ATTAIN quorum or does it only SUSTAIN quorum? I have a STABLE2 version 2 node cluster that is set up with 'expected_votes="3"'. There are two physical nodes and a qdisk, which at the moment is simply a ping heuristic. But on start-up qdiskd can't run unless the cluster already has a quorum. I see this in the logs when qdiskd is started: qdiskd[2624]: Connection to CCSD failed; cannot start qdiskd[2624]: Configuration failed ccsd[3258]: Cluster is not quorate. Refusing connection. ccsd[3258]: Error while processing connect: Connection refused Once the two nodes join together and form a quorum, then qdiskd (if it's restarted) will start correctly on both nodes and becomes part of the quorum. >From then everything happens as expected and one node can maintain quorum as long as it can "see" the qdisk. But I'd like the qdisk to be used to ATTAIN quorum at start up if necessary. If the whole cluster gets shut down (which actually happened a while ago when our data center had a "power incident") :-) and only one node boots back up for some reason, then I'd like it to form a quorum with the qdisk. But at the moment it doesn't seem possible since qdiskd refuses to start without a pre-existing quorum. TIA Jeff Jansen -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence. 
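The <cman> and <quorumd> lines in the message above did not survive the
list's HTML scrubbing, so the fragment below is only an illustration of a
typical two-node qdisk arrangement along those lines; the label, timings
and the ping target are made-up values, not Bevan's actual settings:

  <cman expected_votes="3" two_node="0"/>
  <quorumd interval="1" tko="10" votes="1" label="qdisk">
    <heuristic program="ping -c1 -w1 192.168.1.254" score="1" interval="2" tko="3"/>
  </quorumd>

With expected_votes="3" and the quorum disk contributing one vote, a single
node that can still reach the qdisk and pass its heuristic holds two of the
three expected votes, which matches the start-up behavior Bevan describes.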
From raju.rajsand at gmail.com Fri Jan 9 04:57:51 2009 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Fri, 9 Jan 2009 10:27:51 +0530 Subject: [Linux-cluster] Re: Fencing test In-Reply-To: <8b711df40901081039m4351f8b9te7d3a2a10e118328@mail.gmail.com> References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com> <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com> <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com> <8786b91c0901050623m46c79628i795e18dda28474c9@mail.gmail.com> <8b711df40901051011x79066243g38108439ffb1075f@mail.gmail.com> <8b711df40901081039m4351f8b9te7d3a2a10e118328@mail.gmail.com> Message-ID: <8786b91c0901082057t63abc80ct6ae041873a859bf@mail.gmail.com> Greetings, On Fri, Jan 9, 2009 at 12:09 AM, Paras pradhan wrote: > > > In an act to solve my fencing issue in my 2 node cluster, i tried to > run fence_ipmi to check if fencing is working or not. I need to know > what is my problem > > - > [root at ha1lx ~]# fence_ipmilan -a 10.42.21.28 -o off -l admin -p admin > Powering off machine @ IPMI:10.42.21.28...ipmilan: Failed to connect > after 30 seconds > Failed > [root at ha1lx ~]# > --------------- > > > Here 10.42.21.28 is an IP address assigned to IPMI interface and I am > running this command in the same host. > Sorry couldn't respond earlier as I do this on personal time (which as useual limited for us IT guys and gals ;-) ) and not during work per se.. Do not run fence script from the node that you want to fence. Let us say you want to fence node 3. 1. Try pinging the node 3's IPMI from node 4. I should be successful 2. Issue the fence command from Node 4 with IP of Node 3 IPMI as argument . HTH With warm regards Rajagopal From pradhanparas at gmail.com Fri Jan 9 05:22:34 2009 From: pradhanparas at gmail.com (Paras pradhan) Date: Thu, 8 Jan 2009 23:22:34 -0600 Subject: [Linux-cluster] Re: Fencing test In-Reply-To: <8786b91c0901082057t63abc80ct6ae041873a859bf@mail.gmail.com> References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com> <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com> <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com> <8786b91c0901050623m46c79628i795e18dda28474c9@mail.gmail.com> <8b711df40901051011x79066243g38108439ffb1075f@mail.gmail.com> <8b711df40901081039m4351f8b9te7d3a2a10e118328@mail.gmail.com> <8786b91c0901082057t63abc80ct6ae041873a859bf@mail.gmail.com> Message-ID: <8b711df40901082122r5de5b6candd56b61090fdc53a@mail.gmail.com> On Thu, Jan 8, 2009 at 10:57 PM, Rajagopal Swaminathan wrote: > Greetings, > > On Fri, Jan 9, 2009 at 12:09 AM, Paras pradhan wrote: >> >> >> In an act to solve my fencing issue in my 2 node cluster, i tried to >> run fence_ipmi to check if fencing is working or not. I need to know >> what is my problem >> >> - >> [root at ha1lx ~]# fence_ipmilan -a 10.42.21.28 -o off -l admin -p admin >> Powering off machine @ IPMI:10.42.21.28...ipmilan: Failed to connect >> after 30 seconds >> Failed >> [root at ha1lx ~]# >> --------------- >> >> >> Here 10.42.21.28 is an IP address assigned to IPMI interface and I am >> running this command in the same host. 
>> > > Sorry couldn't respond earlier as I do this on personal time (which as > useual limited for us IT guys and gals ;-) ) and not during work per > se.. > > Do not run fence script from the node that you want to fence. > > Let us say you want to fence node 3. > 1. Try pinging the node 3's IPMI from node 4. I should be successful > 2. Issue the fence command from Node 4 with IP of Node 3 IPMI as argument . > > > HTH > > With warm regards > > Rajagopal > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Thanks will try that. Did u get a chance to see my cluster.conf file? Paras. From chattygk at gmail.com Fri Jan 9 09:40:04 2009 From: chattygk at gmail.com (Chaitanya Kulkarni) Date: Fri, 9 Jan 2009 15:10:04 +0530 Subject: [Linux-cluster] About the ccs_test tool Message-ID: <1ad236320901090140t294c2468w3b42cbfa7bfb7347@mail.gmail.com> Hi, I am new to the RHEL cluster. I would like to know how we can write queries for the ccs_test tool and how they actually fetch the information from the cluster. Any help would be much appreciated. Thanks, Chaitanya. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chattygk at gmail.com Fri Jan 9 09:43:33 2009 From: chattygk at gmail.com (Chaitanya Kulkarni) Date: Fri, 9 Jan 2009 15:13:33 +0530 Subject: [Linux-cluster] Resource State Message-ID: <1ad236320901090143g23043925n52e5ca7855a95149@mail.gmail.com> Hi, When we use the clustat command, we get to know about the Status of the cluster Service (or resource group). In similar way, is there any CLI command using which we can get to know about the Status of the Resources (ip, fs, nfsexport, script, etc) of the Service? Any help will be much appreciated. Thanks, Chaitanya -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alain.Moulle at bull.net Fri Jan 9 10:47:02 2009 From: Alain.Moulle at bull.net (Alain.Moulle) Date: Fri, 09 Jan 2009 11:47:02 +0100 Subject: [Linux-cluster] cman-2.0.95-1.el5 / question about a problem when launching cman Message-ID: <49672B26.4020306@bull.net> Hi Release : cman-2.0.95-1.el5 (but same problem with 2.0.98) I face a problem when launching cman on a two-node cluster : 1. Launching cman on node 1 : OK 2. When launching cman on node 2, the log on node1 gives : cman killed by node 2 because we rejoined the cluster without a full restart Any idea ? knowing that my cluster.conf is likewise (note the use of gfs if it could be linked to ...) :