Something Wicked (with my hdwr clock) This Way Comes
Dean S. Messing
deanm at sharplabs.com
Fri Dec 28 23:23:24 UTC 2007
First, apologies for the long post!
On 12/27/07, Dean S. Messing <deanm at sharplabs.com> wrote:
> I'd be interested to see if anyone else on the list running
>
> F7_x86_64 and using the 2.6.23.8-34.fc7 kernel
Thanks you Aaron Konstam and Jon Stanley for confirming that `hwclock
--show' is working properly on your systems.
Somehow, my Real Time Clock got mangled. I'm not sure how. I had been
running some measurements with `adjtimex' and the output was going
totally bonkers. That's when I noticed my `hwclock' problem.
Upon shutting down, bringing up the BIOS, I saw that the RTC was
nearly 10 hours off of UTC.
So I reset it, rebooted, and now `hwclock --show' works correctly. I
don't understand why I (and you Jon) get the "/dev/rtc does not have
interrupt functions" but it _is_ getting the clock tick after all.
Still, there is some real strangeness going on which I explain
below.
Let me preface this by saying that I am not a n00bie at NTP, adjusting
kernel parameters related to time-keeping via `adjtimex', &c. I've
worked with this stuff for several years.
Jon, Aaron, I'd like to ask you both to run another test. (Thanks in
advance.) As root, what does
adjtimex --utc --compare=20 --interval=10 > /tmp/adj_file.txt
give you? (It will take about 3.5 minutes for this to run)
For the F7 machine on which I'm seeing the weird behaviour,
I get:
[root at medulla ~]# adjtimex --utc --compare=20 --interval=10
--- current --- -- suggested --
cmos time system-cmos error_ppm tick freq tick freq
1198880050 0.798082
1198880059 0.900194 10211.2 10000 10426576
1198880069 0.002171 -89802.3 10000 10426576 10899 4022989
1198880078 0.104114 10194.3 10000 10426576 9899 4246426
1198880087 0.206065 10195.1 10000 10426576 9899 4194864
1198880096 0.308030 10196.5 10000 10426576 9899 4102676
1198880105 0.409981 10195.1 10000 10426576 9899 4193301
1198880114 0.511942 10196.1 10000 10426576 9899 4129239
1198880123 0.612893 10095.1 10000 10426576 9900 4192826
1198880132 0.714856 10196.3 10000 10426576 9899 4116739
1198880141 0.816836 10198.0 10000 10426576 9899 4002676
1198880150 0.918770 10193.4 10000 10426576 9899 4305801
1198880160 0.020729 -89804.1 10000 10426576 10899 4141739
1198880169 0.122682 10195.3 10000 10426576 9899 4180801
1198880178 0.224641 10195.9 10000 10426576 9899 4141739
1198880187 0.326589 10194.8 10000 10426576 9899 4213614
1198880196 0.428548 10195.9 10000 10426576 9899 4141739
1198880205 0.530505 10195.7 10000 10426576 9899 4155801
1198880214 0.632459 10195.4 10000 10426576 9899 4174551
1198880223 0.734426 10196.7 10000 10426576 9899 4088614
The second column is in seconds and is the delta between the system
clock and the RTC. On a well tuned machine with an accurate RTC and a
system clock adjusted to run in sync with UTC (say, via NTP) this
number should be small (less than 1 second) and nearly constant.
The third column is the change in Parts per Million of the number in
Column 2 from one line to the next. It tells you how much the RTC is
deviating relative to the system clock. (100 ppm is 8.64 seconds per
day.)
As you can see, the system time appears to be advancing by almost
exactly 0.1 seconds every 10 seconds relative to the RTC (cmos clock).
Then, just as they are about to get out of phase by 1 second, something
causes either the system clock to "jump back" by ~1 second or the RTC
to jump forward.
It is certainly _not_ the system time doing the jumping. In fact:
[root at medulla ~]# ntpdate -q montpelier.ilan.caltech.edu
server 192.12.19.20, stratum 1, offset -0.001485, delay 0.05589
28 Dec 14:23:03 ntpdate[29445]: adjust time server 192.12.19.20 offset -0.001485 sec
which says that my system is about 1 ms ahead of
"montpelier.ilan.caltech.edu" which is a highly accurate Statum 1 time
server.
This delta is nearly constant so my system clock is not jumping
around.
That leaves the RTC doing the jumping. But having an RTC that is
runing nearly 10000 ppm slower than my system clock and which "jumps
ahead" every 10 seconds or so seems absurd.
Furthermore, when I halt the system, boot into the bios, and watch the
RTC time evolve I see no jumps and it is certainly not running 1/10
second slower than wall time.
That leaves two other possibilities: a bug in adjtimex or a bug in the
kernel. That's where I am right now.
Couple of other items:
If I increase the --interval, above, to 20 seconds, I get (in part)
[root at medulla ~]# adjtimex --utc --compare=20 --interval=20
--- current --- -- suggested --
cmos time system-cmos error_ppm tick freq tick freq
1198882092 0.313780
1198882111 0.415686 5095.3 10000 10426576
1198882130 0.516580 5044.7 10000 10426576 9951 942820
1198882149 0.617472 5044.6 10000 10426576 9951 950633
1198882168 0.718367 5044.8 10000 10426576 9951 939695
1198882187 0.820270 5095.1 10000 10426576 9950 4190951
1198882206 0.921165 5044.7 10000 10426576 9951 940476
1198882226 0.022069 -44954.8 10000 10426576 10451 910789
1198882245 0.122960 5044.6 10000 10426576 9951 952976
1198882264 0.224865 5095.2 10000 10426576 9950 4184701
The really odd thing is that the change in "system-cmos" from
line to line is still about 0.1 seconds! If the RTC were really
running slowly, it should be 0.2 seconds. In fact, no matter what
I make the interval, the 0.1 seconds remains the same.
So I vote for this being a bug in adjtimex (or the kernel).
Just as a reference here's the output of the same commond on my FC6 laptop
(whose sytem clock is also well sychronised with UTC):
[root at axon ~]# adjtimex --utc --compare=20 --interval=20
--- current --- -- suggested --
cmos time system-cmos error_ppm tick freq tick freq
1198882771 0.000563
1198882791 0.001026 23.1 10000 -580512
1198882811 0.000995 -1.5 10000 -580512 9999 6072345
1198882831 0.001093 4.9 10000 -580512 9999 5652814
1198882851 0.000830 -13.2 10000 -580512 10000 282026
1198882871 0.001060 11.5 10000 -580512 9999 5220782
1198882891 0.000850 -10.5 10000 -580512 10000 104683
1198882911 0.001099 12.4 10000 -580512 9999 5158282
1198882931 0.001014 -4.3 10000 -580512 9999 6252033
1198882951 0.001111 4.8 10000 -580512 9999 5656720
As you can see, the drift between the RTC and the system is quite low,
less than plus or minus 25ppm, and there's no wierd "jumping" going
on. (Well there _is_ the "11 minute kernel correction" but that's
another bedtime story.)
I have no idea what's happening on my F7 system to cause the strange
behaviour. Does anyone else?
Dean
More information about the fedora-list
mailing list