[Linux-cluster] Cluster starts, but a node won't rejoin after reboot

Finnur Örn Guðmundsson - TM Software fog at t.is
Thu May 22 17:12:38 UTC 2008


Hi,

 

I'm seeing the exact same issue on a RHEL 5.2 system and have an open support case with Red Hat. Once it is resolved, I can post the details.

 

Thanks,

Finnur

From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeremy Lyon
Sent: 22 May 2008 17:04
To: linux clustering
Subject: [Linux-cluster] Cluster starts, but a node won't rejoin after reboot

 

Hi,

I'm running Cluster 2 on RHEL 5.2 (I saw this behavior on 5.1 and updated just yesterday to see whether that fixed it, but no luck), and I'm seeing issues when I reboot a node. I tried increasing post_join_delay to 60 and the totem token to 25000, but neither change has helped.
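
For anyone comparing settings, here is a minimal Python sketch for confirming which values are actually present in cluster.conf. It assumes the usual layout, where post_join_delay is an attribute of <fence_daemon> and token is an attribute of <totem>, both directly under <cluster>. The sample XML (cluster name, config_version) and the function name read_timeouts are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real file normally lives at /etc/cluster/cluster.conf.
SAMPLE_CONF = """<cluster name="example" config_version="2">
  <fence_daemon post_join_delay="60"/>
  <totem token="25000"/>
</cluster>
"""

def read_timeouts(xml_text):
    """Return (post_join_delay, totem token) as ints, or None where unset."""
    root = ET.fromstring(xml_text)

    def int_attr(tag, attr):
        # Look for a direct child of <cluster> and read one attribute from it.
        elem = root.find(tag)
        if elem is None or elem.get(attr) is None:
            return None
        return int(elem.get(attr))

    return int_attr("fence_daemon", "post_join_delay"), int_attr("totem", "token")

if __name__ == "__main__":
    delay, token = read_timeouts(SAMPLE_CONF)
    print("post_join_delay (seconds):", delay)
    print("totem token (milliseconds):", token)
```

A value of None means the element or attribute is missing from the file, in which case the daemon falls back to its built-in default.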

During boot, when the cman init script runs, I see openais messages like these on the currently running node for anywhere from 15 to 30 seconds:

May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 0.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Creating commit token because I am the rep.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Saving state aru 89 high seq received 89
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Storing new sequence id for ring 560
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering COMMIT state.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering RECOVERY state.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] position [0] member 151.117.65.61:
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] previous ring seq 1372 rep 151.117.65.61
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] aru 89 high delivered 89 received flag 1
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Did not need to originate any messages in recovery.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Sending initial ORF token
May 22 11:52:20 lxomp83k openais[3602]: [CLM  ] CLM CONFIGURATION CHANGE
May 22 11:52:20 lxomp83k openais[3602]: [CLM  ] New Configuration:
May 22 11:52:20 lxomp83k openais[3602]: [CLM  ]         r(0) ip(151.117.65.61)
May 22 11:52:20 lxomp83k openais[3602]: [CLM  ] Members Left:
May 22 11:52:20 lxomp83k openais[3602]: [CLM  ] Members Joined:
May 22 11:52:20 lxomp83k openais[3602]: [CLM  ] CLM CONFIGURATION CHANGE
May 22 11:52:20 lxomp83k openais[3602]: [CLM  ] New Configuration:
May 22 11:52:20 lxomp83k openais[3602]: [CLM  ]         r(0) ip(151.117.65.61)
May 22 11:52:20 lxomp83k openais[3602]: [CLM  ] Members Left:
May 22 11:52:20 lxomp83k openais[3602]: [CLM  ] Members Joined:
May 22 11:52:20 lxomp83k openais[3602]: [SYNC ] This node is within the primary component and will provide service.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state.
May 22 11:52:20 lxomp83k openais[3602]: [CLM  ] got nodejoin message 151.117.65.61
May 22 11:52:20 lxomp83k openais[3602]: [CPG  ] got joinlist message from node 1
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 9.

That sequence repeats until I finally see this:

May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Creating commit token because I am the rep.
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Saving state aru 89 high seq received 89
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Storing new sequence id for ring 568
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] entering COMMIT state.
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] entering RECOVERY state.
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] position [0] member 151.117.65.61:
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] previous ring seq 1380 rep 151.117.65.61
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] aru 89 high delivered 89 received flag 1
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] position [1] member 151.117.65.62:
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] previous ring seq 1368 rep 151.117.65.62
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] aru c high delivered c received flag 1
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Did not need to originate any messages in recovery.
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Sending initial ORF token
May 22 11:52:26 lxomp83k openais[3602]: [CLM  ] CLM CONFIGURATION CHANGE
May 22 11:52:26 lxomp83k openais[3602]: [CLM  ] New Configuration:
May 22 11:52:26 lxomp83k openais[3602]: [CLM  ]         r(0) ip(151.117.65.61)
May 22 11:52:26 lxomp83k openais[3602]: [CLM  ] Members Left:
May 22 11:52:26 lxomp83k openais[3602]: [CLM  ] Members Joined:
May 22 11:52:26 lxomp83k openais[3602]: [CLM  ] CLM CONFIGURATION CHANGE
May 22 11:52:26 lxomp83k openais[3602]: [CLM  ] New Configuration:
May 22 11:52:26 lxomp83k openais[3602]: [CLM  ]         r(0) ip(151.117.65.61)
May 22 11:52:26 lxomp83k openais[3602]: [CLM  ]         r(0) ip(151.117.65.62)
May 22 11:52:26 lxomp83k openais[3602]: [CLM  ] Members Left:
May 22 11:52:26 lxomp83k openais[3602]: [CLM  ] Members Joined:
May 22 11:52:27 lxomp83k openais[3602]: [CLM  ]         r(0) ip(151.117.65.62)
May 22 11:52:27 lxomp83k openais[3602]: [SYNC ] This node is within the primary component and will provide service.
May 22 11:52:27 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state.
May 22 11:52:27 lxomp83k openais[3602]: [MAIN ] Killing node lxomp84k because it has rejoined the cluster with existing state
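
To put a number on how long the membership keeps re-forming before the join, a rough Python sketch can count the GATHER entries and pick out the kill message. It matches only the message texts shown in the excerpts above. The function name summarize and the shortened sample input are illustrative, not anything shipped with openais.

```python
import re

# A few lines copied from the log excerpts above, used as sample input.
SAMPLE_LOG = """\
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 0.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state.
May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 9.
May 22 11:52:27 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state.
May 22 11:52:27 lxomp83k openais[3602]: [MAIN ] Killing node lxomp84k because it has rejoined the cluster with existing state
"""

GATHER_RE = re.compile(r"\[TOTEM\] entering GATHER state")
KILL_RE = re.compile(r"\[MAIN\s*\] Killing node (\S+) because it has rejoined")

def summarize(log_text):
    """Count GATHER entries and collect the nodes named in kill messages."""
    gathers = 0
    killed = []
    for line in log_text.splitlines():
        if GATHER_RE.search(line):
            gathers += 1
        match = KILL_RE.search(line)
        if match:
            killed.append(match.group(1))
    return gathers, killed

if __name__ == "__main__":
    gathers, killed = summarize(SAMPLE_LOG)
    print("GATHER entries:", gathers)
    print("nodes killed on rejoin:", killed)
```

On a real system the input would be the contents of the syslog file (typically /var/log/messages on RHEL, which is an assumption here) rather than the embedded sample.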


At this point, once the second node has come up, I can log in and run "service cman stop" followed by "service cman start". On that restart, the node joins the cluster immediately with no issues.


[root@lxomp84k ~]# uname -a
Linux lxomp84k 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@lxomp84k ~]# rpm -q cman
cman-2.0.84-2.el5


Any suggestions?

TIA,
Jeremy
