HZ value changed from 250 to 1000 in the latest updated kernel

Mon Nov 20 21:11:58 UTC 2006

I have dealt with this issue before.  I think the effect is the
result of rounding errors in the kernel's timing code.  In any
case, the effective frequency of the clock changes slightly when
HZ is changed.  When ntpd starts, it uses the effective clock
frequency "remembered" from the previous shutdown in the drift file.
(Strictly speaking, the drift file records the frequency error
relative to the "nominal" frequency.)  So when changing to a kernel
with a different HZ, I had the best results when I deleted the drift
file and let ntpd create a new one.  Otherwise, it appears that ntpd
takes a while to convince itself that the frequency of the clock has
made a step change.  (Of course, if you just let it run for a while,
it should eventually settle on the new frequency; so if you are still
having these problems after a while, then it is something different.)
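
To illustrate how rounding alone can shift the effective frequency,
here is a minimal Python sketch.  It assumes the tick comes from a PC
interval timer (i8254) with a 1,193,182 Hz input clock and an integer
divisor; the function names are mine, and the kernel may well
compensate for part of this error, so treat it as an illustration and
not as a description of the actual kernel code.

```python
# Assumption: the tick interrupt comes from a PC interval timer (i8254)
# with a 1,193,182 Hz input clock and an integer divisor.
PIT_HZ = 1193182


def divisor_for(hz, clock_hz=PIT_HZ):
    """Nearest integer divisor for the requested tick rate."""
    return (clock_hz + hz // 2) // hz


def error_ppm(hz, clock_hz=PIT_HZ):
    """Offset of the achievable tick rate from the requested one, in PPM."""
    actual_hz = clock_hz / divisor_for(hz, clock_hz)
    return (actual_hz / hz - 1.0) * 1e6


if __name__ == "__main__":
    for hz in (250, 1000):
        print(f"HZ={hz}: divisor={divisor_for(hz)}, "
              f"error={error_ppm(hz):+.1f} PPM")
```

Under these assumptions the uncorrected error is roughly -57 PPM at
250 and +153 PPM at 1000, so a drift file written under one setting
would be a poor starting point for the other.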

If you want to convince yourself of the issue, delete (or rename)
the drift file and run the 250 Hz kernel for a while; then record
the contents of the drift file.  Repeat with the 1000 Hz kernel.
I, for one, would be interested in the result, since it has been
a long time since I ran these tests, and the kernel clock
code has changed significantly since then.
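
A small Python sketch for comparing the two recorded values.  It
assumes the drift file holds the frequency offset in PPM as its first
field; the helper names and the sample values are hypothetical, and
the location of the drift file depends on your ntpd configuration.

```python
def parse_drift_ppm(text):
    """Return the frequency offset (PPM) from drift file contents."""
    return float(text.split()[0])


def read_drift_ppm(path):
    """Read a drift file; the path depends on the local ntpd setup."""
    with open(path) as f:
        return parse_drift_ppm(f.read())


def ppm_to_seconds_per_day(ppm):
    """Clock error accumulated per day at a constant offset of ppm."""
    return ppm * 1e-6 * 86400


if __name__ == "__main__":
    # Hypothetical contents recorded under each kernel.
    ppm_250 = parse_drift_ppm("12.345\n")
    ppm_1000 = parse_drift_ppm("-40.100\n")
    step = ppm_1000 - ppm_250
    print(f"step: {step:+.3f} PPM, "
          f"{ppm_to_seconds_per_day(step):+.3f} s/day")
```

A large difference between the two values would support the idea
that the effective clock frequency shifts with HZ.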

BTW, the most significant problem with a 1000 Hz kernel, the last
time I ran one several years ago, showed up on slow machines.
Changing the frequency of the kernel clock timer doesn't make code
in drivers execute any faster; that is a function of the CPU speed.
On a slow CPU, therefore, drivers lock out interrupts for a larger
number of clock intervals.  Since a pending clock interrupt is a
simple toggle, if two clock interrupts occur while a driver has
interrupts locked out, only one will actually be handled.  Higher
clock rates increase the probability that this will happen.  In the
"old days" this showed up as "falling behind" by one clock interrupt
interval whenever it happened.  I seem to recall that some code has
since been added to the clock interrupt routine to try to detect lost
interrupts and add multiple clock intervals on a single interrupt.
If it is a "slow processor" (or a dynamically throttled processor
running in slow mode) that is showing the clock problems, you may
be "debugging" the lost interrupt code.
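
As a rough illustration of the lost-interrupt arithmetic, here is a
Python sketch of a simplified model: a single pending flag, no
lost-tick compensation, and a driver that locks out interrupts for a
fixed time.  The lockout durations are made-up numbers, not
measurements.

```python
def tick_interval_us(hz):
    """Length of one clock tick in microseconds."""
    return 1_000_000 // hz


def worst_case_lost_ticks(lockout_us, hz):
    """Ticks lost during one lockout when only one tick can be pending.

    At worst, lockout_us // tick + 1 ticks fire during the lockout and
    only one of them is handled afterwards.
    """
    return lockout_us // tick_interval_us(hz)


def time_lost_us(lockout_us, hz):
    """Time the clock falls behind per lockout, without compensation."""
    return worst_case_lost_ticks(lockout_us, hz) * tick_interval_us(hz)


if __name__ == "__main__":
    for lockout_us in (1500, 5000):  # hypothetical lockout durations
        for hz in (250, 1000):
            print(f"lockout={lockout_us} us, HZ={hz}: "
                  f"lost={worst_case_lost_ticks(lockout_us, hz)} ticks, "
                  f"behind={time_lost_us(lockout_us, hz)} us")
```

In this model a lockout shorter than one tick interval never loses a
tick, which is why the same driver can be harmless at 250 Hz and lose
time at 1000 Hz.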

John DeDourek, University of New Brunswick

Dave Jones wrote:

> On Mon, Nov 20, 2006 at 04:31:42PM +0100, Ralf Corsepius wrote:
>  > On Sun, 2006-11-19 at 21:03 -0600, Callum Lerwick wrote:
>  > > On Tue, 2006-11-14 at 18:44 +0300, Dmitry Butskoy wrote:
>  > > > The latest updated kernel has another HZ value (1000 instead of 250), 
>  > > > according to:
>  > > > 
>  > > > > * Thu Nov 9 2006 Dave Jones
>  > > > > - Change HZ to 1000 for increased accuracy.
>  > > > > (Except in Xen, where it stays at 250 for now).
>  > > 
>  > > Woohoo, Rosegarden (which I maintain in Extras) doesn't bitch anymore!
>  > 
>  > Could it be this change also had an impact on ntpd?
>  > 
>  > At least, since this change I am facing severe problems with my ntp
>  > setup (ntp clients are drifting away and have problems to sync).
>  > 
>  > Or is this just a random coincidence?
> 
> A coincidence I hope.  I'm not sure how increased timing resolution could
> cause the drifting effects you've observed.  I've also not noticed
> any other similar reports (yet?).
> 
> 		Dave
>