[Linux-cluster] RE: [Openais] Basic cluster not starting

Steven Dake sdake at redhat.com
Tue Jul 10 22:12:43 UTC 2007


I would be very appreciative if you could try a test RPM of openais for
me to see if it resolves your problem.

If you're willing, please let me know what your architecture is and I'll
build you one.

Regards
-steve
On Tue, 2007-07-10 at 18:05 -0400, james anderson wrote:
> Steve/Paul,
>  
> I am not sure why, but my emails to the linux-cluster forum have been
> getting eaten?!
>  
> In short, when node 3 is shut down, the other 2 nodes lose quorum with
> each other. This seems wrong. Any ideas?
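
For anyone following the arithmetic, here is a rough sketch of why two surviving nodes would be expected to stay quorate. It assumes the usual simple-majority rule (expected votes / 2 + 1, integer division) and that no quorum disk or two-node special mode is in play, which matches the cluster.conf quoted further down. The function names are made up for illustration and are not part of cman.

```python
def quorum_threshold(expected_votes):
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present, expected_votes):
    """True if the votes still present reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# Three nodes with one vote each, as in the posted cluster.conf.
votes = {"node1": 1, "node2": 1, "node3": 1}
expected = sum(votes.values())

# node3 is shut down; node1 and node2 remain.
remaining = expected - votes["node3"]

print("threshold:", quorum_threshold(expected))
print("quorate with node3 down:", is_quorate(remaining, expected))
```

Under those assumptions the threshold is 2 votes, so losing node 3 alone should not cost quorum. The logs instead show node1 and node2 each forming a single-member ring, which would drop each of them to 1 vote.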
>  
> *** Steady state cluster happy***
> [root at node1 ~]# cman_tool nodes
> Node Sts Inc Joined Name
> 1 M 704 2007-07-10 13:47:51 node1
> 2 M 708 2007-07-10 13:52:54 node2
> 3 M 708 2007-07-10 13:52:54 node3
>  
> *** node 3 shutdown ***
> [root at node1 ~]# cman_tool nodes
> Node Sts Inc Joined Name
> 1 M 704 2007-07-10 13:47:51 node1
> 2 X 708 node2
> 3 X 708 node3
>  
> *** Time elapsed node 3 still down ***
> [root at node1 ~]# cman_tool nodes
> NOTE: There are 1 disallowed nodes,
> members list may seem inconsistent across the cluster
> Node Sts Inc Joined Name
> 1 M 704 2007-07-10 13:47:51 node1
> 2 d 708 2007-07-10 13:52:54 node2
> 3 X 708 node3
>  
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] New Configuration:
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.18)
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19)
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.20)
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] Members Left:
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] Members Joined:
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.18)
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.20)
> Jul 10 13:52:54 node2 openais[3136]: [SYNC ] This node is within the
> primary component and will provide service.
> Jul 10 13:52:54 node2 openais[3136]: [TOTEM] entering OPERATIONAL
> state.
> Jul 10 13:52:54 node2 openais[3136]: [CMAN ] quorum regained, resuming
> activity
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] got nodejoin message
> 10.1.1.18
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] got nodejoin message
> 10.1.1.19
> Jul 10 13:52:54 node2 openais[3136]: [CLM ] got nodejoin message
> 10.1.1.20
> Jul 10 13:52:54 node2 openais[3136]: [CPG ] got joinlist message from
> node 1
> Jul 10 13:52:54 node2 openais[3136]: [CPG ] got joinlist message from
> node 2
> Jul 10 13:52:54 node2 openais[3136]: [CPG ] got joinlist message from
> node 3
> Jul 10 13:59:28 node2 openais[3136]: [TOTEM] The token was lost in the
> OPERATIONAL state.
> Jul 10 13:59:28 node2 openais[3136]: [TOTEM] Receive multicast socket
> recv buffer size (262142 bytes).
> Jul 10 13:59:28 node2 openais[3136]: [TOTEM] Transmit multicast socket
> send buffer size (262142 bytes).
> Jul 10 13:59:28 node2 openais[3136]: [TOTEM] entering GATHER state
> from 2.
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] entering GATHER state
> from 0.
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] Creating commit token
> because I am the rep.
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] Saving state aru 21 high
> seq received 21
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] Storing new sequence id
> for ring 2c8
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] entering COMMIT state.
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] entering RECOVERY state.
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] position [0] member
> 10.1.1.19:
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] previous ring seq 708 rep
> 10.1.1.18
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] aru 21 high delivered 21
> received flag 0
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] Did not need to originate
> any messages in recovery.
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] Sending initial ORF token
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] New Configuration:
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19)
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] Members Left:
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] no interface found for
> nodeid
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] no interface found for
> nodeid
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] Members Joined:
> Jul 10 13:59:32 node2 openais[3136]: [CMAN ] quorum lost, blocking
> activity
> Jul 10 13:59:32 node2 openais[3136]: [SYNC ] This node is within the
> primary component and will provide service.
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] New Configuration:
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19)
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] Members Left:
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] Members Joined:
> Jul 10 13:59:32 node2 openais[3136]: [SYNC ] This node is within the
> primary component and will provide service.
> Jul 10 13:59:32 node2 openais[3136]: [TOTEM] entering OPERATIONAL
> state.
> Jul 10 13:59:32 node2 openais[3136]: [CLM ] got nodejoin message
> 10.1.1.19
> Jul 10 13:59:32 node2 openais[3136]: [CPG ] got joinlist message from
> node 2
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] entering GATHER state
> from 9.
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] Saving state aru b high
> seq received b
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] Storing new sequence id
> for ring 2cc
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] entering COMMIT state.
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] entering RECOVERY state.
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] position [0] member
> 10.1.1.18:
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] previous ring seq 712 rep
> 10.1.1.18
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] aru b high delivered b
> received flag 0
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] position [1] member
> 10.1.1.19:
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] previous ring seq 712 rep
> 10.1.1.19
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] aru b high delivered b
> received flag 0
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] Did not need to originate
> any messages in recovery.
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] New Configuration:
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19)
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] Members Left:
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] Members Joined:
> Jul 10 14:02:54 node2 openais[3136]: [SYNC ] This node is within the
> primary component and will provide service.
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] New Configuration:
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.18)
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19)
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] Members Left:
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] Members Joined:
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.18)
> Jul 10 14:02:54 node2 openais[3136]: [SYNC ] This node is within the
> primary component and will provide service.
> Jul 10 14:02:54 node2 openais[3136]: [TOTEM] entering OPERATIONAL
> state.
> Jul 10 14:02:54 node2 openais[3136]: [MAIN ] Node node1 not joined to
> cman because it has rejoined an inquorate cluster
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] got nodejoin message
> 10.1.1.18
> Jul 10 14:02:54 node2 openais[3136]: [CLM ] got nodejoin message
> 10.1.1.19
> Jul 10 14:02:54 node2 openais[3136]: [CPG ] got joinlist message from
> node 1
> Jul 10 14:02:54 node2 openais[3136]: [CPG ] got joinlist message from
> node 2
>  
> *** node 3 back up ***
> [root at node1 init.d]# cman_tool nodes
> Node Sts Inc Joined Name
> 1 M 704 2007-07-10 13:47:51 node1
> 2 X 708 node2
> 3 X 708 node3
>  
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] The consensus timeout
> expired.
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] entering GATHER state
> from 0.
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] entering GATHER state
> from 3.
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] Creating commit token
> because I am the rep.
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] Saving state aru 16 high
> seq received 16
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] Storing new sequence id
> for ring 2d0
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] entering COMMIT state.
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] entering RECOVERY state.
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] position [0] member
> 10.1.1.19:
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] previous ring seq 716 rep
> 10.1.1.18
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] aru 16 high delivered 16
> received flag 0
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] Did not need to originate
> any messages in recovery.
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] Sending initial ORF token
> Jul 10 14:13:09 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 14:13:09 node2 openais[3136]: [CLM ] New Configuration:
> Jul 10 14:13:09 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19)
> Jul 10 14:13:09 node2 openais[3136]: [CLM ] Members Left:
> Jul 10 14:13:09 node2 openais[3136]: [CLM ] no interface found for
> nodeid
> Jul 10 14:13:09 node2 openais[3136]: [CLM ] Members Joined:
> Jul 10 14:13:09 node2 openais[3136]: [SYNC ] This node is within the
> primary component and will provide service.
> Jul 10 14:13:09 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 14:13:09 node2 openais[3136]: [CLM ] New Configuration:
> Jul 10 14:13:09 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19)
> Jul 10 14:13:09 node2 openais[3136]: [CLM ] Members Left:
> Jul 10 14:13:09 node2 openais[3136]: [CLM ] Members Joined:
> Jul 10 14:13:09 node2 openais[3136]: [SYNC ] This node is within the
> primary component and will provide service.
> Jul 10 14:13:09 node2 openais[3136]: [TOTEM] entering OPERATIONAL
> state.
> Jul 10 14:13:09 node2 openais[3136]: [CLM ] got nodejoin message
> 10.1.1.19
> Jul 10 14:13:09 node2 openais[3136]: [CPG ] got joinlist message from
> node 2
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] entering GATHER state
> from 9.
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] entering GATHER state
> from 11.
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] Saving state aru b high
> seq received b
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] Storing new sequence id
> for ring 2d4
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] entering COMMIT state.
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] entering RECOVERY state.
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] position [0] member
> 10.1.1.18:
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] previous ring seq 720 rep
> 10.1.1.18
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] aru b high delivered b
> received flag 0
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] position [1] member
> 10.1.1.19:
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] previous ring seq 720 rep
> 10.1.1.19
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] aru b high delivered b
> received flag 0
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] position [2] member
> 10.1.1.20:
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] previous ring seq 720 rep
> 10.1.1.20
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] aru b high delivered b
> received flag 0
> Jul 10 14:22:54 node2 openais[3136]: [TOTEM] Did not need to originate
> any messages in recovery.
> Jul 10 14:22:54 node2 openais[3136]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 14:22:54 node2 openais[3136]: [CLM ] New Configuration:
> Jul 10 14:22:54 node2 openais[3136]: [CLM ] r(0) ip(10.1.1.19)
> Jul 10 14:22:54 node2 openais[3136]: [CLM ] Members Left:
> Jul 10 14:22:54 node2 gfs_controld[3164]: groupd_dispatch error -1
> errno 11
> Jul 10 14:22:54 node2 gfs_controld[3164]: groupd connection died
> Jul 10 14:22:54 node2 gfs_controld[3164]: cluster is down, exiting
> Jul 10 14:22:54 node2 dlm_controld[3158]: groupd is down, exiting
> Jul 10 14:23:20 node2 ccsd[3130]: Unable to connect to cluster
> infrastructure after 30 seconds.
> Jul 10 14:23:50 node2 ccsd[3130]: Unable to connect to cluster
> infrastructure after 60 seconds.
>  
> *** node2 cman crashed ***
> [root at node1 init.d]# cman_tool nodes
> Node Sts Inc Joined Name
> 1 M 704 2007-07-10 13:47:51 node1
> 2 X 708 node2
> 3 X 724 node3
>  
> [root at node2 init.d]# cman_tool nodes
> cman_tool: Cannot open connection to cman, is it running ?
>  
> [root at node3 init.d]# cman_tool nodes
> Node Sts Inc Joined Name
> 1 X 724 node1
> 2 X 724 node2
> 3 M 712 2007-07-10 14:07:13 node3
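
The three views above do not agree with each other, which is easier to see when the tables are compared mechanically. Below is a small sketch that parses `cman_tool nodes` style output as pasted in this mail and reports which members each host sees. It assumes the column layout shown here (Node, Sts, Inc, optional Joined, Name) and that "M" in the Sts column marks a current member; the helper names are hypothetical.

```python
def parse_cman_nodes(text):
    """Map node name -> status letter from `cman_tool nodes` style output."""
    status = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric node id; skip the header and notes.
        if len(fields) >= 4 and fields[0].isdigit():
            status[fields[-1]] = fields[1]
    return status


def members(text):
    """Names of the nodes a host currently reports as members."""
    return {name for name, sts in parse_cman_nodes(text).items() if sts == "M"}


def views_disagree(views):
    """True if the hosts do not all report the same member set."""
    return len({frozenset(m) for m in views.values()}) > 1


NODE1_VIEW = """\
Node Sts Inc Joined Name
1 M 704 2007-07-10 13:47:51 node1
2 X 708 node2
3 X 724 node3
"""

NODE3_VIEW = """\
Node Sts Inc Joined Name
1 X 724 node1
2 X 724 node2
3 M 712 2007-07-10 14:07:13 node3
"""

views = {"node1": members(NODE1_VIEW), "node3": members(NODE3_VIEW)}
for host, seen in sorted(views.items()):
    print(host, "sees members:", sorted(seen))
print("views disagree:", views_disagree(views))
```

With the two tables pasted above, each host lists only itself as a member, so the hosts behave as separate one-node partitions rather than as one cluster.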
>  
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] New Configuration:
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.18)
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] Members Left:
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] Members Joined:
> Jul 10 14:42:55 node1 openais[3166]: [SYNC ] This node is within the
> primary component and will provide service.
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] New Configuration:
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.18)
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.19)
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.20)
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] Members Left:
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] Members Joined:
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.19)
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.20)
> Jul 10 14:42:55 node1 openais[3166]: [SYNC ] This node is within the
> primary component and will provide service.
> Jul 10 14:42:55 node1 openais[3166]: [TOTEM] entering OPERATIONAL
> state.
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] got nodejoin message
> 10.1.1.18
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] got nodejoin message
> 10.1.1.19
> Jul 10 14:42:55 node1 openais[3166]: [CLM ] got nodejoin message
> 10.1.1.20
> Jul 10 14:42:55 node1 openais[3166]: [TOTEM] Retransmit List: c
> Jul 10 14:42:55 node1 openais[3166]: [TOTEM] Retransmit List: c
> Jul 10 14:42:55 node1 openais[3166]: [TOTEM] Retransmit List: c d
> Jul 10 14:43:01 node1 last message repeated 47 times
> Jul 10 14:43:21 node1 openais[3166]: [TOTEM] The token was lost in the
> OPERATIONAL state.
> Jul 10 14:43:21 node1 openais[3166]: [TOTEM] Receive multicast socket
> recv buffer size (262142 bytes).
> Jul 10 14:43:21 node1 openais[3166]: [TOTEM] Transmit multicast socket
> send buffer size (262142 bytes).
> Jul 10 14:43:21 node1 openais[3166]: [TOTEM] entering GATHER state
> from 2.
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] entering GATHER state
> from 0.
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Creating commit token
> because I am the rep.
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Saving state aru b high
> seq received d
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Storing new sequence id
> for ring 2ec
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] entering COMMIT state.
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] entering RECOVERY state.
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] position [0] member
> 10.1.1.18:
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] previous ring seq 744 rep
> 10.1.1.18
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] aru b high delivered b
> received flag 0
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] copying all old ring
> messages from c-d.
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Originated 0 messages in
> RECOVERY.
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Originated for recovery:
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Not Originated for
> recovery: c d
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] Sending initial ORF token
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] New Configuration:
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.18)
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] Members Left:
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] no interface found for
> nodeid
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] no interface found for
> nodeid
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] Members Joined:
> Jul 10 14:43:26 node1 openais[3166]: [CMAN ] quorum lost, blocking
> activity
> Jul 10 14:43:26 node1 openais[3166]: [SYNC ] This node is within the
> primary component and will provide service.
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] New Configuration:
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] r(0) ip(10.1.1.18)
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] Members Left:
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] Members Joined:
> Jul 10 14:43:26 node1 openais[3166]: [SYNC ] This node is within the
> primary component and will provide service.
> Jul 10 14:43:26 node1 openais[3166]: [TOTEM] entering OPERATIONAL
> state.
> Jul 10 14:43:26 node1 openais[3166]: [CLM ] got nodejoin message
> 10.1.1.18
> Jul 10 14:43:26 node1 openais[3166]: [CPG ] got joinlist message from
> node 1
>  
>  
>  
> Let me know what else I can do to narrow this problem down.
>  
> Thank you for the help :)
> James
> 
> 
> > Subject: RE: [Openais] Basic cluster not starting
> > From: sdake at redhat.com
> > To: jamesanderson1 at hotmail.com
> > CC: openais at lists.linux-foundation.org; linux-cluster at redhat.com
> > Date: Mon, 9 Jul 2007 13:11:39 -0700
> > 
> > Can you explain "crashes whole cluster"? Could you send the cman_tool nodes
> > output after the fence but before the node restarts? (i.e., fence it, then
> > unplug its power cord or use the power GUI :)
> > 
> > Thanks
> > -steve
> > 
> > 
> > On Mon, 2007-07-09 at 12:47 -0400, james anderson wrote:
> > > Steve/Patrick,
> > > 
> > > Thanks for the replies :)
> > > 
> > > I found the following FC6 x86_64 updates and applied them to all 3
> > > nodes:
> > > rpm -ivh xen-libs-3.0.3-9.fc6.x86_64.rpm
> > > rpm -ivh --nodeps libvirt-0.2.3-1.fc6.x86_64.rpm
> > > rpm -ivh bridge-utils-1.1-2.x86_64.rpm
> > > rpm -ivh libvirt-python-0.2.3-1.fc6.x86_64.rpm
> > > rpm -ivh python-virtinst-0.95.0-1.fc6.noarch.rpm
> > > rpm -ivh xen-3.0.3-9.fc6.x86_64.rpm
> > > rpm -Uvh cman-2.0.60-1.fc6.x86_64.rpm
> > > 
> > > After installing these I triple-checked that the cluster.conf files
> > > are identical. I then rebooted them all and restarted the cman
> > > service. The good news is that the basic cluster now works! The bad
> > > news: fencing a node crashes the whole cluster, and conga also has
> > > some serious problems. I will post those in separate emails.
> > > 
> > > Just wanted to tie up this thread for anyone else encountering the
> > > same problem. If anyone has had the same experience, please post so
> > > my findings can be confirmed.
> > > 
> > > Cheers,
> > > James
> > > 
> > > 
> > > > Subject: Re: [Openais] Basic cluster not starting
> > > > From: sdake at redhat.com
> > > > To: jamesanderson1 at hotmail.com
> > > > CC: openais at lists.linux-foundation.org; linux-cluster at redhat.com
> > > > Date: Sat, 7 Jul 2007 18:06:07 -0700
> > > > 
> > > > James,
> > > > 
> > > > Let me speak with Patrick Caulfield on this topic Monday.
> > > > 
> > > > I have not seen this before in any of our testing, but it is possible
> > > > someone else using RHCS has. I've also copied the linux-cluster list.
> > > > 
> > > > The problem appears to be, however, with something relating to ccs or
> > > > the startup order. The openais code doesn't know anything about the
> > > > ccsd node ids or parsing of the XML configuration file. That work is
> > > > done by ccsd and cman.
> > > > 
> > > > Did you try the cman init script?
> > > > 
> > > > Regards
> > > > -steve
> > > > 
> > > > On Thu, 2007-07-05 at 14:21 -0400, james anderson wrote:
> > > > > I am attempting to install GFS on FC6 64-bit using RPMs.
> > > > > Below you will find my config and the steps taken to get a GFS
> > > > > cluster running.
> > > > > I am unclear if the problem is with OpenAIS or RHCS.
> > > > > 
> > > > > 
> > > > > FC6 64bit RPMs
> > > > > --------------
> > > > > rpm -ivh openais-0.80.1-3.x86_64.rpm
> > > > > rpm -ivh perl-Net-Telnet-3.03-5.noarch.rpm
> > > > > rpm -ivh cman-2.0.18-2.fc6.x86_64.rpm
> > > > > System config cluster
> > > > > rpm -ivh system-config-cluster-1.0.29-1.0.noarch.rpm
> > > > > Luci
> > > > > rpm -ivh python-imaging-1.1.6-3.fc6.x86_64.rpm
> > > > > rpm -ivh zope-2.9.7-2.fc6.x86_64.rpm
> > > > > rpm -ivh plone-2.5.3-1.fc6.x86_64.rpm
> > > > > rpm -ivh luci-0.9.3-2.fc6.x86_64.rpm
> > > > > Ricci
> > > > > rpm -ivh --nodeps oddjob-libs-0.27-8.x86_64.rpm
> > > > > rpm -ivh oddjob-0.27-8.x86_64.rpm
> > > > > rpm -ivh modcluster-0.9.3-2.fc6.x86_64.rpm
> > > > > rpm -ivh ricci-0.9.3-2.fc6.x86_64.rpm
> > > > > 
> > > > > 
> > > > > /etc/cluster/cluster.conf
> > > > > -------------------------
> > > > > <?xml version="1.0"?>
> > > > > <cluster alias="alpha_cluster" config_version="8"
> > > > > name="alpha_cluster">
> > > > > <fence_daemon post_fail_delay="0" post_join_delay="3"/>
> > > > > <clusternodes>
> > > > > <clusternode name="node1" nodeid="1" votes="1">
> > > > > <multicast addr="239.192.196.121" interface="eth1"/>
> > > > > <fence>
> > > > > <method name="1">
> > > > > <device name="nps1" port="1" switch="1"/>
> > > > > </method>
> > > > > </fence>
> > > > > </clusternode>
> > > > > <clusternode name="node2" nodeid="2" votes="1">
> > > > > <multicast addr="239.192.196.121" interface="eth0"/>
> > > > > <fence>
> > > > > <method name="1">
> > > > > <device name="nps1" port="2" switch="1"/>
> > > > > </method>
> > > > > </fence>
> > > > > </clusternode>
> > > > > <clusternode name="node3" nodeid="3" votes="1">
> > > > > <multicast addr="239.192.196.121" interface="eth2"/>
> > > > > <fence>
> > > > > <method name="1">
> > > > > <device name="nps1" port="3" switch="1"/>
> > > > > </method>
> > > > > </fence>
> > > > > </clusternode>
> > > > > </clusternodes>
> > > > > <cman>
> > > > > <multicast addr="239.192.196.121"/>
> > > > > </cman>
> > > > > <fencedevices>
> > > > > <fencedevice agent="fence_apc" ipaddr="10.1.1.123" login="root"
> > > > > name="***" passwd="***"/>
> > > > > </fencedevices>
> > > > > <rm>
> > > > > <failoverdomains/>
> > > > > <resources/>
> > > > > </rm>
> > > > > </cluster>
> > > > > 
> > > > > 
> > > > > Commands
> > > > > --------
> > > > > # modprobe lock_dlm
> > > > > # modprobe dlm
> > > > > # mount -t configfs none /sys/kernel/config
> > > > > # ccsd
> > > > > # cman_tool join
> > > > > 
> > > > > 
> > > > > /var/log/messages
> > > > > -----------------
> > > > > 1 Jul 2 14:50:16 node1 ccsd[22457]: Starting ccsd 2.0.18:
> > > > > 2 Jul 2 14:50:16 node1 ccsd[22457]: Built: Oct 1 2006 17:18:46
> > > > > 3 Jul 2 14:50:16 node1 ccsd[22457]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
> > > > > 4 Jul 2 14:50:45 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 30 seconds.
> > > > > 5 Jul 2 14:51:15 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 60 seconds.
> > > > > 6 Jul 2 14:51:39 node1 ccsd[22457]: cluster.conf (cluster name = alpha_cluster, version = 6) found.
> > > > > 7 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] AIS Executive Service RELEASE 'subrev 1204 version 0.80.1'
> > > > > 8 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
> > > > > 9 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
> > > > > 10 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] No nodeid specified in cluster.conf
> > > > > 11 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Error reading CCS info, cannot start
> > > > > 12 Jul 2 14:51:41 node1 openais[22542]: [MAIN ]
> > > > > 13 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] AIS Executive exiting (-9).
> > > > > 14 Jul 2 14:51:45 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 90 seconds.
> > > > > 15 Jul 2 14:52:15 node1 ccsd[22457]: Unable to connect to cluster infrastructure after 120 seconds.
> > > > > 16 Jul 2 14:52:44 node1 ccsd[22457]: Stopping ccsd, SIGTERM received.
> > > > > 
> > > > > Lines 1-6 are from running the "ccsd" command above.
> > > > > Lines 7-13 are from running the "cman_tool join" command above.
> > > > > 
> > > > > I also received the following error message:
> > > > > cman not started: CCS does not have a nodeid for this node, run
> > > > > 'ccs_tool addnodeids' to fix
> > > > > cman_tool: aisexec daemon didn't start
> > > > > 
> > > > > Yes, I did try running ccs_tool addnodeids. It did not help. Notice
> > > > > that in the cluster.conf the nodeids were already in place. Any
> > > > > pointers to narrowing down my problem are appreciated.
> > > > > 
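
One quick way to double-check the "nodeids were already in place" observation on each node is to read the file back and list what is actually there. The sketch below is only an illustration: it parses a trimmed copy of the cluster.conf quoted above with Python's standard XML parser and reports the config_version and per-node nodeid values. The function name is hypothetical, and on a real node you would read /etc/cluster/cluster.conf instead of the embedded sample.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted cluster.conf (fencing and rm sections omitted).
SAMPLE_CONF = """<?xml version="1.0"?>
<cluster alias="alpha_cluster" config_version="8" name="alpha_cluster">
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1"/>
    <clusternode name="node2" nodeid="2" votes="1"/>
    <clusternode name="node3" nodeid="3" votes="1"/>
  </clusternodes>
</cluster>
"""


def summarize_cluster_conf(xml_text):
    """Return the config version, nodeids by name, and nodes lacking a nodeid."""
    root = ET.fromstring(xml_text)
    nodes = root.findall("./clusternodes/clusternode")
    return {
        "config_version": root.get("config_version"),
        "nodeids": {n.get("name"): n.get("nodeid") for n in nodes},
        "missing_nodeid": [n.get("name") for n in nodes if not n.get("nodeid")],
    }


info = summarize_cluster_conf(SAMPLE_CONF)
print("config_version:", info["config_version"])
print("nodeids:", info["nodeids"])
print("nodes missing a nodeid:", info["missing_nodeid"])
```

Comparing the reported config_version across nodes may also be worthwhile: log line 6 above says ccsd found version = 6, while the file as posted carries config_version="8". Whether that difference matters here is not established by anything in this thread.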
> > > > > Thanks,
> > > > > James
> > > > > 
> > > > > 
> > > > > 
> > > > >
> > >
> ______________________________________________________________________
> > > > > See what you’re getting into…before you go there. Check it
> out!
> > > > > _______________________________________________
> > > > > Openais mailing list
> > > > > Openais at lists.linux-foundation.org
> > > > > https://lists.linux-foundation.org/mailman/listinfo/openais
> > > > 
> > > 
> > > 
> > > 
> > >
> ______________________________________________________________________
> > > Missed the show? Watch videos of the Live Earth Concert on MSN.
> See
> > > them now!
> > 
> 
> 
> 



