[Linux-cluster] Nodes leaving and re-joining intermittently

Dukhan, Meir Mdukhan at nds.com
Sun Dec 11 07:16:49 UTC 2011


Are your nodes time synced and how?

We ran into problems with nodes being fenced because of an NTP problem.

The solution (AFAIR, from the Red Hat knowledge base) was to start ntpd _before_ cman.
I'm not sure, but there may be an update to openais or ntpd regarding this issue.
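
As a rough illustration (not taken from the KB article), on a SysV-init system the
start order can be checked by comparing the priorities of the S-links in the runlevel
directory. The sketch below assumes the usual "S<two-digit priority><service>" link
naming. The helper names start_priorities and starts_before are hypothetical, and the
priorities in the usage note are example values only, not a statement of what any
particular release ships.

```python
import re

# Matches SysV start links such as "S58ntpd": "S", two-digit priority, service name.
_START_LINK = re.compile(r"^S(\d{2})(.+)$")


def start_priorities(entries):
    """Map service name -> start priority for the S-links in a directory listing."""
    priorities = {}
    for entry in entries:
        match = _START_LINK.match(entry)
        if match:
            priorities[match.group(2)] = int(match.group(1))
    return priorities


def starts_before(entries, first, second):
    """Return True if `first` starts before `second`, False if not,
    and None if either service has no start link in `entries`."""
    prio = start_priorities(entries)
    if first not in prio or second not in prio:
        return None
    if prio[first] != prio[second]:
        return prio[first] < prio[second]
    # Assumption: links with equal priority are run in lexical order of their names.
    return first < second
```

For example, starts_before(os.listdir("/etc/rc3.d"), "ntpd", "cman") could be run on
each node (after import os). With example entries such as "S21cman" and "S58ntpd" it
returns False, meaning ntpd would start after cman and the order needs changing.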

For those of you who have a Red Hat account, see the Red Hat KB article:

        Does cman need to have the time of nodes in sync?
        https://access.redhat.com/kb/docs/DOC-42471

Hope this helps,

Regards,
-- Meir R. Dukhan

|-----Original Message-----
|From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-
|bounces at redhat.com] On Behalf Of Digimer
|Sent: Sunday, December 11, 2011 0:23 AM
|To: Matthew Painter
|Cc: linux clustering
|Subject: Re: [Linux-cluster] Nodes leaving and re-joining intermittently
|
|On 12/10/2011 05:00 PM, Matthew Painter wrote:
|> The switch was our first thought, but that has been swapped, and while
|> we are not having nodes fenced anymore (we were daily), this anomaly
|> remains.
|>
|> I will ask for those logs and conf on Monday.
|>
|> I think it might be worth reinstalling corosync on this box anyway?
|> Can't be healthy if it is exiting uncleanly. I have had reports of the
|> rgmanager dying on this box. (pid file but not running) Could that be
|> related?
|>
|> Thanks :)
|
|It's impossible to say without knowing your configuration. Please share the
|cluster.conf (only obfuscate passwords, please) along with the log files.
|The more detail, the better. Versions, distros, network config, etc.
|
|Uninstalling corosync is not likely to help. RGManager is something fairly
|high up in the stack, so it's not likely the cause either.
|
|Did you configure the timeouts to be very high, by chance? I'm finding it
|difficult to fathom how the node can withdraw without being fenced, short
|of cleanly stopping the cluster stack. I suspect there is something
|important not being said, which the configuration information, versions and
|logs will hopefully expose.
|
|--
|Digimer
|E-Mail:              digimer at alteeve.com
|Freenode handle:     digimer
|Papers and Projects: http://alteeve.com
|Node Assassin:       http://nodeassassin.org
|"omg my singularity battery is dead again.
|stupid hawking radiation." - epitron
