NTPD and FC3

Mon Nov 15 13:26:08 UTC 2004

Robert wrote:
...

> +time.uswo.net   198.82.1.201     3 u   22   64   77   60.548  -1778.9  316.669
> *0x50a13f43.boan 192.36.134.25    2 u   26   64   77  140.391  -1947.8  394.214
> +blade.avnf.com  212.82.32.15     2 u   25   64   77   55.640  -1768.1  297.106
> LOCAL(0)        LOCAL(0)        10 l   26   64   77    0.000    0.000    0.002
> ntpq>
> 
> 
> [root@clem ~]# uname -a
> Linux clem 2.6.9-1.667 #1 Tue Nov 2 14:41:31 EST 2004 i586 i586 i386 GNU/Linux
> [root@clem ~]# ntpq
> ntpq> pe
>     remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> +zoiedog.com     131.107.1.10     2 u  391 1024  377  114.447  -24.679   1.572
> *171.Red-80-36-1 130.206.3.166    2 u  425 1024  377  211.861  -33.906   1.004
> +ns1.pulsation.f 194.2.0.28       3 u  399 1024  377  145.030  -36.732   3.050
> LOCAL(0)        LOCAL(0)        10 l   48   64  377    0.000    0.000   0.002
> ntpq>

The first puzzling thing is the numbers under "reach".  This is an
8-bit shift register, displayed in octal, showing whether or not a
response was received to each of the last 8 polls (the most recent
poll is the low-order bit).  Now if you left for coffee, and then
came back, we have
    clem:  377 = 11111111  all of the last 8 polls were answered
    mavis: 077 = 00111111  only the most recent 6 polls received
                           an answer (or only 6 polls were issued)
With its shorter polling interval (64 s, against 1024 s on clem),
mavis should have attempted at least as many polls as clem, unless
something is drastically wrong (interrupts stuck, CPU maxed out, etc.)
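
If you would rather not do the octal-to-binary conversion in your
head, here is a small sketch in Python.  The function name
decode_reach is just something I made up for illustration; it is not
part of ntpq or any NTP package.

```python
def decode_reach(reach_octal):
    """Decode the ntpq "reach" column (octal text) into its 8 poll bits.

    Returns (bits, answered): bits is an 8-character string with the
    oldest poll on the left and the most recent on the right, and
    answered is the number of polls that got a response.
    decode_reach is a hypothetical helper, not part of any NTP tool.
    """
    value = int(str(reach_octal), 8)      # ntpq prints the register in octal
    if not 0 <= value <= 0o377:           # the register is only 8 bits wide
        raise ValueError("reach must be between 0 and 377 octal")
    bits = format(value, "08b")           # zero-padded 8-bit binary string
    return bits, bits.count("1")


if __name__ == "__main__":
    for host, reach in (("clem", "377"), ("mavis", "77")):
        bits, answered = decode_reach(reach)
        print(f"{host}: {reach} = {bits}  {answered} of last 8 polls answered")
```

Feeding it the two values from the listings above gives 8 answered
polls for clem and 6 for mavis, matching the breakdown by hand.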

It would be instructive on mavis to run ntpq and issue the "assoc"
command.  This shows the servers in the same order as the "peer"
command.  Then use "pstat associd", replacing "associd" with the
peculiar number under "assocID" in the output of assoc.

Most instructive would be the last three lines, which include the
delay and offset for the most recent 8 polls for the particular
server being queried.

My first guesses would be:
-- connectivity problem to the servers causing either widely varying
    delay, or an asymmetric delay (consistently different delay of
    query and response)
-- badly implemented clock, e.g. on an interrupt that is getting
    disabled sufficiently long that interrupts are missed, or a
    processor that is being "speed adjusted" based on load, or some
    such.
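
To put a number on the asymmetric delay case: NTP computes offset and
delay from four timestamps (request sent, request received, reply
sent, reply received) and assumes the two directions take equally
long.  The sketch below uses that standard calculation; the simulate
function and all the numbers in it are made up for illustration and
are not taken from mavis or clem.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP calculation from the four packet timestamps.

    t1 = request sent (client clock), t2 = request received (server clock),
    t3 = reply sent (server clock),   t4 = reply received (client clock).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(true_offset, out_delay, back_delay, server_hold=0.0):
    """Hypothetical one-exchange model; all times in milliseconds.

    true_offset is (server clock - client clock); out_delay and
    back_delay are the one-way network delays in each direction.
    """
    t1 = 0.0                                  # client clock at send
    t2 = t1 + out_delay + true_offset         # server clock at arrival
    t3 = t2 + server_hold                     # server clock at reply
    t4 = t3 - true_offset + back_delay        # client clock at arrival
    return ntp_offset_delay(t1, t2, t3, t4)


if __name__ == "__main__":
    # Clocks actually agree (true_offset = 0) in both cases.
    print("symmetric  30/30 ms :", simulate(0.0, 30.0, 30.0))
    print("asymmetric 100/20 ms:", simulate(0.0, 100.0, 20.0))
```

With equal delays the computed offset is zero; with 100 ms out and
20 ms back it comes out at 40 ms, i.e. half the difference between
the two directions, even though the clocks agree.  So a path that is
consistently lopsided shows up as a steady offset error rather than
as jitter.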