[Linux-cluster] NTP sync cause CNAM shutdown

Alvaro Jose Fernandez alvaro.fernandez at sivsa.com
Thu Oct 13 16:58:42 UTC 2011


Hi Jean,

The DOC is https://access.redhat.com/kb/docs/DOC-42471 .

But, as Steven Drake said in a previous email, if you *can* upgrade to RHEL6, that would surely be the best option (I just cannot upgrade my customer, he will die on RHEL 5.x).

In RHEL6, the cluster daemons are different and use another API, not the openais one.

Best regards.

Álvaro Fernández
 Systems Department

________________________________

SIVSA, Soluciones Informáticas S.A.
Arenal nº 18 · 3ª Planta · 36201 · Vigo
Phone: (+34)  986 092 100
Fax: (+34)  986 092 219
e-mail: alvaro.fernandez at sivsa.com
www.sivsa.com
Spain
 


-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On behalf of BONNETOT Jean-Daniel (EXT THALES)
Sent: Thursday, 13 October 2011 18:15
To: linux clustering
Subject: Re: [Linux-cluster] NTP sync cause CNAM shutdown

Thanks for your answer, it helped me find my way ;) I saw the "-x" option for ntpd, but it's not the only thing to apply.

First, I had to solve my timezone problem. 
-> Hwclock set to GMT in the BIOS (UTC if you prefer)
-> "timezone --utc Europe/Paris" in kickstart, or set ZONE="Europe/Paris" and UTC=true in /etc/sysconfig/clock
These two settings make the kernel boot with the right time: the kernel gets the time from hwclock and knows that it has to apply my timezone on top of it.
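For reference, a minimal sketch of what the resulting /etc/sysconfig/clock could look like. Only ZONE and UTC come from the description above; the comments and layout are illustrative.

```shell
# /etc/sysconfig/clock (sketch; values taken from the settings described above)
ZONE="Europe/Paris"   # local timezone applied on top of the hardware clock
UTC=true              # the hardware clock (BIOS) is kept in UTC/GMT
# Kickstart equivalent, as a single line in the kickstart file:
#   timezone --utc Europe/Paris
```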

Then, I added the "-x" option in /etc/sysconfig/ntpd to tell ntpd not to make big steps.
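A sketch of the corresponding line in /etc/sysconfig/ntpd. The flags after "-x" are an assumption (what a stock RHEL 5 file usually carries); keep whatever your own file already has and only add "-x".

```shell
# /etc/sysconfig/ntpd (sketch)
# "-x" asks ntpd to slew the clock instead of stepping it.
# Assumption: the remaining flags are the distribution defaults, not something from this thread.
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
```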

As a result, boot time before:
Oct 13 12:02:20 s64lmwbig3b ntpd[7996]: ntpd 4.2.2p1@1.1570-o Thu Nov 26 11:34:34 UTC 2009 (1)
Oct 13 12:02:20 s64lmwbig3b ntpd[7997]: precision = 1.000 usec
Oct 13 12:02:20 s64lmwbig3b ntpd[7997]: Listening on interface wildcard, 0.0.0.0#123 Disabled
...
Oct 13 12:02:20 s64lmwbig3b ntpd[7997]: Listening on interface bond0, 10.151.231.215#123 Enabled
<== 2H TIME JUMP
Oct 13 14:02:31 s64lmwbig3b openais[7701]: [TOTEM] The token was lost in the OPERATIONAL state.
Oct 13 14:02:31 s64lmwbig3b openais[7701]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Oct 13 14:02:31 s64lmwbig3b openais[7701]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Oct 13 14:02:31 s64lmwbig3b openais[7701]: [TOTEM] entering GATHER state from 2.
=> CMAN crashed

Boot time now:
Oct 13 16:10:08 s64lmwbig3b clvmd: Cluster LVM daemon started - connected to CMAN
...
Oct 13 16:10:27 s64lmwbig3b ntpdate[7971]: step time server 10.151.156.87 offset 1.306150 sec
<== 1S TIME JUMP
Oct 13 16:10:29 s64lmwbig3b ntpd[7975]: ntpd 4.2.2p1@1.1570-o Thu Nov 26 11:34:34 UTC 2009 (1)
Oct 13 16:10:29 s64lmwbig3b ntpd[7976]: precision = 1.000 usec
...
Oct 13 16:10:40 s64lmwbig3b modclusterd: startup succeeded
=> CMAN up and running

I looked for the FAQ you mentioned but found nothing; could you post it when you have time? ;)

Jean-Daniel BONNETOT

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On behalf of Alvaro Jose Fernandez
Sent: Wednesday, 12 October 2011 17:52
To: linux clustering
Subject: Re: [Linux-cluster] NTP sync cause CNAM shutdown

Jean,

I too suffered the same issue, opened a case with support, etc. The best options when running ntpd with RHCS are:

-First, start the cman, rgmanager, etc. (I mean, all the RHCS daemons) always after ntpd startup. In RHEL5 at least the default is the other way around. 

You can do that by disabling all RHCS daemons (via chkconfig off) from automatic startup and then starting them explicitly from your rc.local init script, as the last action of the init sequence (i.e., after the network, basic system services, and most importantly after ntpd has initially adjusted the clock via its "ntpdate" call).

Be aware that if you do the above, you must explicitly (manually) stop them if you need to shut down the cluster or the nodes, because with this hack the init scripts of cman, rgmanager, etc. won't run during the "kill"/shutdown sequence.

-Start ntpd in "slew" mode (the -x startup flag) via its configuration file. Running it in slew mode makes ntpd adjust the time gradually over a long time span, enough to ensure that CMAN's internal timings won't get disturbed.
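The first point (RHCS daemons started only after ntpd, and stopped by hand) could be sketched as the shell helpers below. The service list and its order are assumptions for illustration; use the daemons your cluster actually runs. DRY_RUN defaults to 1 here so the functions only print the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: RHCS_SERVICES is an assumed list and order, not taken from this thread.
RHCS_SERVICES="cman clvmd rgmanager"

# run: print the command when DRY_RUN=1 (the default here), otherwise execute it.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One-time setup: take the daemons out of automatic startup.
disable_rhcs_autostart() {
    for svc in $RHCS_SERVICES; do
        run chkconfig "$svc" off
    done
}

# To be called at the end of rc.local, i.e. after ntpd has set the clock.
start_rhcs() {
    for svc in $RHCS_SERVICES; do
        run service "$svc" start || return 1
    done
}

# Manual shutdown: stop the daemons in reverse order before halting the node.
stop_rhcs() {
    rev=""
    for svc in $RHCS_SERVICES; do
        rev="$svc $rev"
    done
    for svc in $rev; do
        run service "$svc" stop
    done
}
```

With the default DRY_RUN=1, calling start_rhcs just prints the three "service ... start" commands, which is a cheap way to check the order before wiring it into rc.local.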

Using that hack was Ok for me, no more node evictions or unexpected problems since.
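To give an idea of what "a long time span" means in slew mode, here is a rough back-of-the-envelope sketch. It assumes ntpd's usual maximum slew rate of 500 ppm (0.5 ms corrected per second); the function name is made up for illustration. Note also that, per the ntpd documentation, -x raises the step threshold to 600 s rather than forbidding steps, so a 2 hour offset is better avoided at the source (timezone/hwclock) than left to slewing.

```shell
#!/bin/sh
# Assumed maximum slew rate: 500 ppm = 500 microseconds corrected per second.
SLEW_US_PER_SEC=500

# slew_seconds OFFSET_MS: whole seconds needed to slew away OFFSET_MS milliseconds.
slew_seconds() {
    offset_us=$(( $1 * 1000 ))
    echo $(( offset_us / SLEW_US_PER_SEC ))
}

slew_seconds 1306      # a 1.306 s offset: prints 2612 (about 44 minutes)
slew_seconds 7200000   # a 2 h offset: prints 14400000 (about 167 days)
```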

There is a FAQ and best practices document on Red Hat Network for NTPD and RHCS, updated a few months ago as I recall. Just search for it on the Red Hat Network website (sorry, I don't have the link for the DOC at the moment).

regards,


Álvaro Fernández
 Systems Department

-------
Hi,

I posted a previous email asking what was wrong in my two-node cluster.conf. I think I found it and have some questions.

The problem was that the two nodes booted and joined, then cman shut down with:
Oct 12 15:55:30 s64lmwbig3c openais[7672]: [MAIN ] Killing node s64lmwbig3b because it has rejoined the cluster with existing state
Oct 12 15:55:30 s64lmwbig3c openais[7672]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart

A few seconds before, ntpd synced and jumped forward by 7200 sec (2 hours, my timezone is GMT+2).

My questions are:
Which time do you set in your BIOS (GMT or your local time zone)?
Do you use ntpd? All the documentation says to use it.
What are the best practices for ntp and RHCS?

Jean-Daniel BONNETOT

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

