[Linux-cluster] Nodes leaving and re-joining intermittently

Sun Dec 11 11:12:51 UTC 2011

Thank you for your input :)

The nodes are synced using NTP, although I am unsure about the respective
run levels.

I will look into this, thank you.
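
As a starting point for checking the start order, here is a minimal sketch
in Python. It assumes a SysV-style init layout, where each runlevel
directory holds start links named "S<priority><service>" that are run in
lexical order. The function names, the sample link names, and the choice of
/etc/rc3.d are illustrative assumptions, not taken from the KB article.

```python
import os
import re

# SysV start links look like "S<two-digit priority><service>", e.g. "S58ntpd".
# Kill links ("K...") are ignored because they only matter at shutdown.
_START_LINK = re.compile(r"^S\d{2}(.+)$")


def start_links(link_names):
    """Map service name -> full start-link name for one runlevel directory."""
    links = {}
    for name in link_names:
        match = _START_LINK.match(name)
        if match:
            links[match.group(1)] = name
    return links


def starts_before(link_names, first, second):
    """True if `first` starts before `second`; None if either has no start link.

    Start links are executed in lexical order of their names, so comparing
    the link names as strings reproduces the boot order.
    """
    links = start_links(link_names)
    if first not in links or second not in links:
        return None
    return links[first] < links[second]


if __name__ == "__main__":
    rc_dir = "/etc/rc3.d"  # assumption: the nodes boot into runlevel 3
    if os.path.isdir(rc_dir):
        print("ntpd before cman:",
              starts_before(os.listdir(rc_dir), "ntpd", "cman"))
    else:
        print(rc_dir, "not found; this host may not use SysV init links")
```

A result of False would mean cman is started first in that runlevel, which
is the ordering the advice quoted below warns against.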

On Sun, Dec 11, 2011 at 7:16 AM, Dukhan, Meir <Mdukhan at nds.com> wrote:

>
> Are your nodes time synced and how?
>
> We ran into problems of nodes being fenced because of an NTP problem.
>
> The solution (AFAIR, from the Red Hat knowledge base) was to start ntpd
> _before_ cman.
> I'm not sure, but there could be an update of openais or ntpd re this issue.
>
> For those of you who have a Red Hat account, see the Red Hat KB article:
>
>        Does cman need to have the time of nodes in sync?
>        https://access.redhat.com/kb/docs/DOC-42471
>
> Hope this helps,
>
> Regards,
> -- Meir R. Dukhan
>
> |-----Original Message-----
> |From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-
> |bounces at redhat.com] On Behalf Of Digimer
> |Sent: Sunday, December 11, 2011 0:23 AM
> |To: Matthew Painter
> |Cc: linux clustering
> |Subject: Re: [Linux-cluster] Nodes leaving and re-joining intermittently
> |
> |On 12/10/2011 05:00 PM, Matthew Painter wrote:
> |> The switch was our first thought, but that has been swapped, and while
> |> we are not having nodes fenced anymore (we were daily), this anomaly
> |> remains.
> |>
> |> I will ask for those logs and conf on Monday.
> |>
> |> I think it might be worth reinstalling corosync on this box anyway?
> |> Can't be healthy if it is exiting uncleanly. I have had reports of the
> |> rgmanager dying on this box. (pid file but not running) Could that be
> |> related?
> |>
> |> Thanks :)
> |
> |It's impossible to say without knowing your configuration. Please
> |share the cluster.conf (only obfuscate passwords, please) along with
> |the log files. The more detail, the better. Versions, distros, network
> |config, etc.
> |
> |Uninstalling corosync is not likely to help. RGManager is something fairly
> |high up in the stack, so it's not likely the cause either.
> |
> |Did you configure the timeouts to be very high, by chance? I'm finding it
> |difficult to fathom how the node can withdraw without being fenced, short
> |of cleanly stopping the cluster stack. I suspect there is something
> |important not being said, which the configuration information, versions
> |and logs will hopefully expose.
> |
> |--
> |Digimer
> |E-Mail:              digimer at alteeve.com
> |Freenode handle:     digimer
> |Papers and Projects: http://alteeve.com
> |Node Assassin:       http://nodeassassin.org
> |"omg my singularity battery is dead again.
> |stupid hawking radiation." - epitron
> |
>