NTPD and FC3

Mon Nov 15 14:40:31 UTC 2004

John DeDourek wrote:
> 
> 
> Robert wrote:
> ....
> 
>> +time.uswo.net   198.82.1.201     3 u   22   64   77   60.548  -1778.9  316.669
>> *0x50a13f43.boan 192.36.134.25    2 u   26   64   77  140.391  -1947.8  394.214
>> +blade.avnf.com  212.82.32.15     2 u   25   64   77   55.640  -1768.1  297.106
>> LOCAL(0)        LOCAL(0)        10 l   26   64   77    0.000    0.000   0.002
>> ntpq>
>>
>>
>> [root@clem ~]# uname -a
>> Linux clem 2.6.9-1.667 #1 Tue Nov 2 14:41:31 EST 2004 i586 i586 i386 GNU/Linux
>> [root@clem ~]# ntpq
>> ntpq> pe
>>     remote           refid      st t when poll reach   delay   offset  jitter
>> ==============================================================================
>> +zoiedog.com     131.107.1.10     2 u  391 1024  377  114.447  -24.679   1.572
>> *171.Red-80-36-1 130.206.3.166    2 u  425 1024  377  211.861  -33.906   1.004
>> +ns1.pulsation.f 194.2.0.28       3 u  399 1024  377  145.030  -36.732   3.050
>> LOCAL(0)        LOCAL(0)        10 l   48   64  377    0.000    0.000   0.002
>> ntpq>
> 
> 
> First puzzling thing is the numbers under "reach".  This represents
> an 8 bit field (using base 8 notation) showing whether or not a
> response was received to the last 8 polls.  Now if you left for
> coffee, and then came back, we have
>    clem: 377 = 11111111  all of last polls responded to
>    mavis: 077 = 00111111 only most recent 6 polls received
>                           an answer (or only 6 polls were issued)
> With the shorter polling interval on mavis, it should have attempted
> at least as many polls as clem, unless something is drastically
> wrong (interrupts stuck, CPU maxed out, etc.)
> 
> It would be instructive on mavis to run ntpq and issue the "assoc"
> command.  This shows servers in the same order as the "peer" command.
> Then use "pstat associd", replacing "associd" with the number shown
> under "assID" in the output of assoc.
> 
> Most instructive would be the last three lines, which include the
> delay and offset for the most recent 8 polls for the particular
> server being queried.
> 
> My first guesses would be:
> -- connectivity problem to the servers causing either widely varying
>    delay, or an asymmetric delay (consistently different delay of
>    query and response)
> -- badly implemented clock, e.g. on an interrupt that is getting
>    disabled sufficiently long that interrupts are missed, or a
>    processor that is being "speed adjusted" based on load, or some
>    such.
> 

Thanks for your reply!

The numbers in the "reach" column can be explained.  On mavis, ntpd 
apparently throws up its hands: it shows a jitter of 4000 for all servers 
for one 64-second cycle, then starts over with reach=1...3...7...17, etc.
I changed the conditions this morning, but that did not get rid of the bug.
On mavis, I removed the ntp-4.2.0.a.20040617-4 that came with FC3 and
installed ntp-4.1.2-5.i386.rpm from my FC1 CDs. Then I copied the ntp.conf
from my FC1 backup to the running system. The problem still exists.
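The octal reading of the reach column in the quoted reply, and the restart sequence 1, 3, 7, 17, ..., can both be reproduced with a short Python sketch. It treats reach as an 8-bit shift register in which each poll shifts the register left one bit and a reply sets the low bit. That is a simplified model, and `reach_bits` and `reach_after_restart` are hypothetical helpers, not part of ntpq.

```python
def reach_bits(reach_octal: str) -> str:
    """Render an ntpq 'reach' value (printed in octal) as 8 bits, oldest poll first."""
    value = int(reach_octal, 8)
    if not 0 <= value <= 0o377:
        raise ValueError("reach must fit in 8 bits (0 to 377 octal)")
    return format(value, "08b")


def reach_after_restart(answered_polls: int) -> list[str]:
    """Octal reach values after each of `answered_polls` consecutive replies, starting from 0."""
    reach = 0
    history = []
    for _ in range(answered_polls):
        # Simplified model: each poll shifts the register left one bit and a
        # reply sets the low bit; masking keeps only the last 8 polls.
        reach = ((reach << 1) | 1) & 0o377
        history.append(format(reach, "o"))
    return history


if __name__ == "__main__":
    for host, reach in [("clem", "377"), ("mavis", "77")]:
        bits = reach_bits(reach)
        print(f"{host}: reach {reach} = {bits} ({bits.count('1')} of last 8 polls answered)")
    print("after restart:", " ".join(reach_after_restart(8)))
```

Under this model, with 64-second polls a register restarted from zero needs eight answered polls, a little over eight minutes, to climb to 377, and a reach of 177 corresponds to seven answered polls since the restart.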

This pstat was just taken:

ntpq> pstat 32409
status=9484 reach, conf, sel_candidat, 8 events, event_reach,
srcadr=now.cis.okstate.edu, srcport=123, dstadr=192.168.1.8,
dstport=123, leap=00, stratum=1, precision=-18, rootdelay=0.000,
rootdispersion=0.427, refid=PSC, reach=177, unreach=0, hmode=3, pmode=4,
hpoll=6, ppoll=6, flash=00 ok, keyid=0, offset=-633.919, delay=22.937,
dispersion=63.426, jitter=142.579,
reftime=c54337ef.1e45a1ca  Mon, Nov 15 2004  8:13:03.118,
org=c54337fc.97024f65  Mon, Nov 15 2004  8:13:16.589,
rec=c54337fd.3c3a647f  Mon, Nov 15 2004  8:13:17.235,
xmt=c54337fd.364d6e47  Mon, Nov 15 2004  8:13:17.212,
filtdelay=    22.94   22.98   22.11   22.50   22.85   22.81   23.81    0.00,
filtoffset= -633.92 -491.34 -390.64 -306.91 -195.18  -65.21  -16.16    0.00,
filtdisp=      0.01    1.00    1.96    2.92    3.88    4.86    5.82 16000.0
ntpq>
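The filter registers in this pstat can be turned into a rough drift estimate. The sketch below assumes (the output does not say so explicitly) that the eight slots are listed newest first and spaced one poll interval apart, 2^6 = 64 s from hpoll=6. The filtdisp values rising by about 0.96 ms per slot are consistent with that, since that is what a 15 ppm dispersion growth rate adds over 64 s. It fits a least-squares line to offset versus time; `drift_ppm` is a hypothetical helper.

```python
POLL_SECONDS = 2 ** 6  # hpoll=6 in the pstat output -> 64 s between polls
MAX_FREQ_PPM = 500.0   # ntpd cannot correct a frequency error larger than this
EMPTY_DISP = 16000.0   # dispersion ntpq shows for a filter slot with no sample yet

# Copied from the filtoffset and filtdisp lines above (milliseconds, assumed newest first).
filtoffset_ms = [-633.92, -491.34, -390.64, -306.91, -195.18, -65.21, -16.16, 0.00]
filtdisp_ms = [0.01, 1.00, 1.96, 2.92, 3.88, 4.86, 5.82, 16000.0]


def drift_ppm(offsets_ms, dispersions_ms, poll_s):
    """Least-squares slope of offset versus time, in parts per million."""
    # Skip empty slots, then reverse so the samples run oldest to newest.
    samples = [o for o, d in zip(offsets_ms, dispersions_ms) if d < EMPTY_DISP]
    samples.reverse()
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two filter samples")
    times = [i * poll_s for i in range(n)]
    t_mean = sum(times) / n
    o_mean = sum(samples) / n
    num = sum((t - t_mean) * (o - o_mean) for t, o in zip(times, samples))
    den = sum((t - t_mean) ** 2 for t in times)
    # Slope is in ms of offset per s of elapsed time; 1 ms/s equals 1000 ppm.
    return num / den * 1000.0


drift = drift_ppm(filtoffset_ms, filtdisp_ms, POLL_SECONDS)
print(f"apparent drift: {drift:.0f} ppm")
if abs(drift) > MAX_FREQ_PPM:
    print(f"exceeds ntpd's {MAX_FREQ_PPM:.0f} ppm correction limit")
```

On these numbers the fit comes out at roughly -1600 ppm: the offset grows by about 1.6 ms every second, well beyond the 500 ppm frequency error ntpd can correct. That points at the local clock rather than the servers, although the figure also includes whatever frequency correction ntpd was already applying.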

This is what the silly thing has done since I installed the older 
version of ntp this morning, which pretty much tells me that the second 
of your first guesses is homing in on the problem and that I should quit 
beating up on ntpd and start looking for another kernel.

[root@mavis ~]# grep ntpd /var/log/messages | tail -24
Nov 15 06:25:28 mavis ntpd[28082]: ntpd exiting on signal 15
Nov 15 06:25:29 mavis ntpd: ntpd shutdown succeeded
Nov 15 06:28:04 mavis ntpd: ntpd shutdown failed
Nov 15 06:31:40 mavis ntpd[18971]: ntpd 4.1.2 at 1.892 Wed Oct 29 06:06:59 EST 2003 (1)
Nov 15 06:31:41 mavis ntpd: ntpd startup succeeded
Nov 15 06:31:41 mavis ntpd[18971]: precision = 9 usec
Nov 15 06:31:41 mavis ntpd[18971]: kernel time discipline status 0040
Nov 15 06:31:41 mavis ntpd[18971]: frequency initialized 0.000 from /var/lib/ntp/drift
Nov 15 06:35:55 mavis ntpd[18971]: time reset -4.487942 s
Nov 15 06:35:55 mavis ntpd[18971]: kernel time discipline status change 41
Nov 15 06:35:55 mavis ntpd[18971]: synchronisation lost
Nov 15 06:51:06 mavis ntpd[18971]: time reset -2.007400 s
Nov 15 06:51:06 mavis ntpd[18971]: kernel time discipline status change 1
Nov 15 06:51:06 mavis ntpd[18971]: synchronisation lost
Nov 15 07:06:22 mavis ntpd[18971]: time reset -1.899484 s
Nov 15 07:06:22 mavis ntpd[18971]: synchronisation lost
Nov 15 07:21:23 mavis ntpd[18971]: time reset -1.683434 s
Nov 15 07:21:23 mavis ntpd[18971]: synchronisation lost
Nov 15 07:36:27 mavis ntpd[18971]: time reset -1.750547 s
Nov 15 07:36:27 mavis ntpd[18971]: synchronisation lost
Nov 15 07:51:36 mavis ntpd[18971]: time reset -1.632804 s
Nov 15 07:51:36 mavis ntpd[18971]: synchronisation lost
Nov 15 08:06:46 mavis ntpd[18971]: time reset -2.003586 s
Nov 15 08:06:46 mavis ntpd[18971]: synchronisation lost
[root@mavis ~]#
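The log gives an independent check on the drift. Assuming each step roughly cancels the error built up since the previous one (the offset starts near zero right after a step), dividing each step by the time since the previous step gives an average drift rate. A sketch in Python; `step_rates_ppm` is a hypothetical helper, and the year 2004 is taken from the post date because syslog timestamps omit it.

```python
from datetime import datetime

# The "time reset" lines from the log above.
RESET_LINES = """\
Nov 15 06:35:55 mavis ntpd[18971]: time reset -4.487942 s
Nov 15 06:51:06 mavis ntpd[18971]: time reset -2.007400 s
Nov 15 07:06:22 mavis ntpd[18971]: time reset -1.899484 s
Nov 15 07:21:23 mavis ntpd[18971]: time reset -1.683434 s
Nov 15 07:36:27 mavis ntpd[18971]: time reset -1.750547 s
Nov 15 07:51:36 mavis ntpd[18971]: time reset -1.632804 s
Nov 15 08:06:46 mavis ntpd[18971]: time reset -2.003586 s
""".splitlines()


def step_rates_ppm(lines, year=2004):
    """Each step divided by the time since the previous step, in ppm."""
    resets = []
    for line in lines:
        if "time reset" not in line:
            continue
        # Syslog timestamps have no year, so prepend one before parsing.
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        step_s = float(line.split("time reset")[1].split()[0])
        resets.append((stamp, step_s))
    rates = []
    # The first step only serves as the starting point: its size includes
    # whatever error the clock had at startup, so it is not turned into a rate.
    for (prev, _), (cur, step_s) in zip(resets, resets[1:]):
        elapsed_s = (cur - prev).total_seconds()
        rates.append(step_s / elapsed_s * 1e6)  # seconds per second -> ppm
    return rates


for rate in step_rates_ppm(RESET_LINES):
    print(f"{rate:.0f} ppm")
```

For these lines every rate falls between about -1800 and -2200 ppm, far outside the 500 ppm ntpd can correct; the negative steps mean the clock is repeatedly set back, i.e. it runs fast. The steady spacing of about 15 minutes is consistent with ntpd's default 900 s stepout interval, the time a large offset has to persist before ntpd steps the clock.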