[Linux-cluster] NTP sync causes CMAN shutdown

Alvaro Jose Fernandez alvaro.fernandez at sivsa.com
Wed Oct 12 15:52:11 UTC 2011


Jean,

I ran into the same issue and opened a case with support. The best options for running ntpd alongside RHCS are:

- First, always start cman, rgmanager, etc. (that is, all the RHCS daemons) after ntpd has started. In RHEL5, at least, the default is the other way around.

You can do that by disabling automatic startup of all the RHCS daemons (chkconfig off) and then starting them explicitly from your rc.local init script as the last step of the boot sequence, i.e. after the network, the basic system services and, most importantly, after ntpd has made its initial clock adjustment via its "ntpdate" call.

Be aware that if you do this, you must stop those daemons explicitly (manually) whenever you need to shut down the cluster or the nodes, because with this hack the cman, rgmanager, etc. init scripts won't run during the kill/shutdown sequence.

- Start ntpd in "slew" mode (the -x startup flag, set in ntpd's options in /etc/sysconfig/ntpd on RHEL). In slew mode ntpd adjusts the time gradually over a long time span, slowly enough to ensure that CMAN's internal timings don't get messed up.
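The startup/shutdown ordering from the first point could look roughly like the following Python 3 helper that rc.local calls at the very end of boot. This is a hypothetical sketch, not what support gave me: the service list and order (cman, clvmd, gfs, rgmanager) is an assumption to adjust per node, the `start_cluster`/`stop_cluster` names are made up, and the `service ntpd status` check is an extra safeguard. A stock RHEL5 box ships Python 2.4, so in practice the same logic would usually be a few lines of shell.

```python
import subprocess
from typing import Callable, List, Sequence

# Assumed RHEL5-style cluster services in start order; adjust per node
# (e.g. add qdiskd, drop gfs or clvmd if the node does not use them).
START_ORDER = ["cman", "clvmd", "gfs", "rgmanager"]

Runner = Callable[[str, str], int]


def run_service(name: str, action: str) -> int:
    """Run `service <name> <action>` and return the init script's exit code."""
    return subprocess.run(["service", name, action]).returncode


def start_cluster(run: Runner = run_service,
                  order: Sequence[str] = START_ORDER) -> List[str]:
    """Start the cluster daemons in order, only once ntpd is already running.

    Raises RuntimeError at the first failure rather than starting later
    daemons on top of a broken one. Returns the services started.
    """
    if run("ntpd", "status") != 0:
        raise RuntimeError("ntpd is not running; not starting the cluster stack")
    started = []
    for name in order:
        if run(name, "start") != 0:
            raise RuntimeError(f"failed to start {name}")
        started.append(name)
    return started


def stop_cluster(run: Runner = run_service,
                 order: Sequence[str] = START_ORDER) -> List[str]:
    """Stop the daemons in reverse start order (rgmanager first, cman last).

    With the init scripts disabled this must be run by hand before shutting
    a node down. Keeps going on errors; returns the services that failed.
    """
    failed = []
    for name in reversed(order):
        if run(name, "stop") != 0:
            failed.append(name)
    return failed
```

rc.local would end with a call to start_cluster(); before halting or rebooting a node you run stop_cluster() yourself, as warned above. The runner is injectable so the ordering can be checked without touching real services.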

Using that hack was OK for me; I've had no more node evictions or unexpected problems since.
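To see how the slew-mode point relates to a jump like the 7200 s one in the original report, here is a rough Python model of the thresholds documented in the ntpd manual page (ntp 4.2 era): offsets up to the step threshold (128 ms by default, 600 s with -x) are slewed at no more than 0.5 ms/s, i.e. 2000 s of slewing per second of offset; larger offsets are stepped; offsets beyond the 1000 s panic threshold make ntpd exit unless -g is given. The function name and the simplifications (no stepout timer, -g treated as always allowing the step) are my own assumptions.

```python
from typing import Tuple

# Figures from the ntpd manual page (ntp 4.2 era); treat as assumptions
# for other versions.
STEP_THRESHOLD = 0.128        # seconds; default step threshold
STEP_THRESHOLD_X = 600.0      # seconds; step threshold when started with -x
PANIC_THRESHOLD = 1000.0      # seconds; ntpd exits above this unless -g
MAX_SLEW_RATE = 0.0005        # 0.5 ms per second (kernel slew limit)


def ntpd_correction(offset_s: float, slew_mode: bool,
                    allow_panic: bool = False) -> Tuple[str, float]:
    """Simplified model of how ntpd handles a clock offset.

    slew_mode corresponds to -x, allow_panic to -g. Returns
    (action, seconds_to_converge) where action is "slew", "step" or "exit";
    only slewing takes time, a step is immediate.
    """
    offset = abs(offset_s)
    if offset > PANIC_THRESHOLD and not allow_panic:
        return "exit", 0.0
    threshold = STEP_THRESHOLD_X if slew_mode else STEP_THRESHOLD
    if offset <= threshold:
        # Each second of offset needs 1 / 0.0005 = 2000 s of slewing.
        return "slew", offset / MAX_SLEW_RATE
    return "step", 0.0
```

In this model a 600 s offset under -x is slewed over about 1.2 million seconds (roughly 14 days), while ntpd_correction(7200, slew_mode=True) yields "exit", or "step" with allow_panic=True. So -x on its own cannot absorb a two-hour error; the clock has to be right already (the boot-time ntpdate) before cman starts, which is why the first point matters.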

There is a FAQ and best-practices document on Red Hat Network about ntpd and RHCS, updated a few months ago as I recall. Just search for it on the Red Hat Network website (sorry, I don't have the link to the document at the moment).

regards,


Álvaro Fernández 
 Systems Department

-------
Hi,

In a previous email I asked what was wrong with my two-node cluster.conf. I think I found the problem, and I have a few questions.

The problem was that both nodes boot and join, and then cman shuts down with:
Oct 12 15:55:30 s64lmwbig3c openais[7672]: [MAIN ] Killing node s64lmwbig3b because it has rejoined the cluster with existing state
Oct 12 15:55:30 s64lmwbig3c openais[7672]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart

A few seconds before, ntpd synced and the clock jumped forward by 7200 s (2 hours; my time zone is GMT+2).

My questions are:
Which time do you set in your BIOS (GMT or your local time zone)?
Do you use ntpd? All the documentation says to use it.
What are the best practices for ntp and RHCS?

Jean-Daniel BONNETOT



