[Linux-cluster] NTP time steps cause cluster reconfiguration

Fri Jul 16 20:36:32 UTC 2010

On 07/16/2010 06:18 AM, Martin Waite wrote:
> Hi,
>
> During testing, I noticed that a time step made by ntpd caused the
> cluster to drop into GATHER state:
>
> Jun 16 12:13:16 cp1edidbm001 ntpd[30917]: time reset -16.332117 s
> Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering GATHER state from 12.
> Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Creating commit token because I am the rep.
> Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Saving state aru 9e high seq received 9e
> Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Storing new sequence id for ring 328
> Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering COMMIT state.
> Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering RECOVERY state.
> ...
>
> This is easily repeatable through setting the clock forwards by 20
> seconds using /bin/date. This probably causes comms timeouts to expire
> prematurely, and almost every time causes the cluster to reconfigure -
> luckily without affecting running services.
>
> Stepping the clock backwards also causes a similar disruption, but there
> is a long lag between changing the time and the cluster reconfiguring:
> perhaps this extends a timeout or sleep on the affected node, causing
> genuine timeouts on the other nodes.
>
> All I am looking for is some reassurance that clock changes are not
> going to crash the cluster. Is anyone able to confirm this, please?
>
> regards,
>
> Martin
>
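The mechanism Martin suspects (a forward step making timeouts expire early, a backward step making a node wait too long) can be illustrated with a minimal Python sketch. It assumes, without confirmation from this thread, that a timeout deadline is computed from the wall clock; `TimeoutTracker`, `FakeClock` and `simulate` are hypothetical names, not openais or corosync code, and the 10 s timeout is illustrative only. Simulated clocks are used so the example runs instantly and deterministically.

```python
from typing import Callable


class TimeoutTracker:
    """Hypothetical timeout check (not openais/corosync code).

    `clock` returns seconds. A real program would pass either time.time
    (wall clock, moved by ntpd steps or /bin/date) or time.monotonic
    (unaffected when the wall clock is set).
    """

    def __init__(self, clock: Callable[[], float], timeout: float) -> None:
        self.clock = clock
        self.deadline = clock() + timeout

    def expired(self) -> bool:
        return self.clock() >= self.deadline


class FakeClock:
    """Simulated clock whose value the test advances by hand."""

    def __init__(self, start: float) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now


def simulate(step: float, elapsed: float, timeout: float = 10.0) -> dict:
    wall = FakeClock(1_000_000.0)  # wall-clock seconds
    mono = FakeClock(500.0)        # monotonic seconds (arbitrary origin)
    wall_tracker = TimeoutTracker(wall, timeout)
    mono_tracker = TimeoutTracker(mono, timeout)

    # Real time passes on both clocks; only the wall clock sees the step.
    wall.now += elapsed + step
    mono.now += elapsed
    return {"wall": wall_tracker.expired(), "monotonic": mono_tracker.expired()}


if __name__ == "__main__":
    # 1 s really elapses, clock stepped forward 20 s: wall-clock timeout fires early.
    print(simulate(step=20.0, elapsed=1.0))
    # 11 s really elapses, clock stepped back 16 s: wall-clock timeout is missed.
    print(simulate(step=-16.0, elapsed=11.0))
```

The first call prints `{'wall': True, 'monotonic': False}` and the second prints `{'wall': False, 'monotonic': True}`, matching the two symptoms described above: a premature expiry after a forward step, and a delayed one after a backward step.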

Martin,

NTP integration is not available in openais. That feature has been
introduced in corosync.

Regards
-steve

>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster