hwclock can cause system lockup

Todd Denniston Todd.Denniston at ssa.crane.navy.mil
Thu Oct 16 23:50:24 UTC 2008


Mikkel L. Ellertson wrote, On 10/16/2008 05:23 PM:
>> On Thu, Oct 16, 2008 at 8:19 AM, Chris Mocock
>> <chris1.noreply at googlemail.com> wrote:
>> I've got an hourly cron job on some machines that runs "/sbin/hwclock --utc
>> --hctosys" to sync the hardware clock to the system clock. These machines
>> were recently upgraded from an old custom RH7.3 installation to a custom
>> spin of Fedora 9 and started to occasionally lock up - every day or three.
>> I noticed that they would always lock up at 1 minute past the hour so
>> tracked it down to this cron job.
>>
> What are you trying to do with this cron job? You are updating the
> system clock from the hardware clock, and not the other way around,
> as you say you are trying to do. The system does synchronize the
> hardware clock to the system clock on shutdown. 

Not if you are sane enough to disable that in the halt script.
(Search this list or fedora-test for my posts about ntp to see why I say this.)

> Also, /etc/adjtime
> contains the information needed to correct for hardware clock drift
> that happens while the system is shut down. The offset information
> is re-computed every time you set the hardware clock.

If you don't have an ntp source, the quartz-based hardware clock, corrected 
via /etc/adjtime (with `hwclock --adjust` after appropriate disciplining), can 
be SIGNIFICANTLY better than system time. So if you pull the hardware clock 
into the system clock at a reasonable interval, the system will have a better 
time to use.
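
As a rough sketch of the arithmetic behind that correction (not hwclock's 
actual code): per hwclock(8), the first line of /etc/adjtime holds the drift 
rate in seconds per day followed by the epoch time of the last adjustment. 
The function name adjtime_correction is made up for illustration, and the 
sketch ignores the third field on that line.

```shell
# Estimate the drift correction implied by the first line of an adjtime file.
# Arguments: path to the adjtime file, current time in seconds since the epoch.
# NOTE: hypothetical helper; it ignores the third field of the first line.
adjtime_correction() {
    awk -v now="$2" 'NR == 1 {
        drift = $1    # systematic drift rate, seconds per day
        last  = $2    # epoch time of the most recent adjustment
        printf "%.3f\n", drift * (now - last) / 86400
    }' "$1"
}
```

Calling it as `adjtime_correction /etc/adjtime "$(date +%s)"` prints roughly 
how many seconds of drift have accumulated since the last adjustment.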

It could even make a box good enough, not great but reasonable, to serve as 
the local ntp time source that keeps the rest of the computers in a 
disconnected lab in sync. [Been there, done that. On an RH7.3 machine of all 
things :]

> 
>> I decided to test it on my standard Fedora 9 installation by running a
>> simple script that runs the hwclock command every 3 seconds. Sure enough,
>> the system ran for just over an hour before it locked up. By locked up I
>> mean no activity, caps lock key doesn't work, can't ping, can't ssh in, but
>> power is still on.
>>
> I never looked into how the hardware clock is accessed. I wonder
> if they are using the BIOS to access the clock, and the BIOS code
> has a problem. From what I have read, it is also not a good idea to
> change the system clock too rapidly, especially if you are adjusting
> it backwards - but I do not know if that would cause lockups.
> 

IIRC stepping the clock backwards could LOOK like a lockup, but it should only 
last (in the case of running the command every 3 seconds) 2 to 4 seconds.

I _think_ that by default hwclock uses /dev/rtc, which is a kernel abstraction 
of the real clock... something in that abstraction may be breaking down. 
There may even be an OOPS or Panic, but if the machine is in runlevel 5 (X 
running) at the time, you will not see it on screen.

I would suggest two things:
1) see if raising the call rate to 0.5 Hz or 1 Hz (every 2 seconds or every 
second) instead of roughly 0.3 Hz triggers the lockup sooner
2) boot into runlevel 3 and run the script again to see if it produces the 
error within a few hours, hopefully this time with an OOPS or Panic message.
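
A minimal sketch of such a test loop, assuming nothing about the original 
script beyond "run hwclock every N seconds". The function name stress_loop and 
the log path in the usage note are made up. Each attempt is logged and 
sync'ed to disk before the command runs, so the last entry should survive a 
hard lockup.

```shell
# Run a command repeatedly, logging each attempt before it starts.
# Arguments: interval in seconds, number of iterations, log file, command...
# NOTE: hypothetical helper, not the script from the original report.
stress_loop() {
    sl_interval=$1; sl_count=$2; sl_log=$3; shift 3
    sl_i=0
    while [ "$sl_i" -lt "$sl_count" ]; do
        sl_i=$((sl_i + 1))
        echo "$(date '+%F %T') run $sl_i: $*" >> "$sl_log"
        sync    # flush the log so this entry survives a hard lockup
        sl_rc=0
        "$@" || sl_rc=$?
        [ "$sl_rc" -eq 0 ] || echo "$(date '+%F %T') run $sl_i: exit status $sl_rc" >> "$sl_log"
        sleep "$sl_interval"
    done
}
```

For example, as root: `stress_loop 1 100000 /var/tmp/hwclock-stress.log 
/sbin/hwclock --utc --hctosys` runs the command once a second; change the 
first argument to 2 or 3 to compare rates.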

If either of these again ends up at 'just over an hour before it locked up', 
it might be some interaction with another cron job... did you disable the 
hourly cron job first? If not, I would run your 3 second script alongside a 
2 minute cron job and see if it may be a '2 accesses at the same time' problem.
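
One way to check that theory from the other direction is to make every caller 
take the same lock and see whether the lockups stop. This is only a sketch: 
it assumes flock(1) from util-linux is installed, the function name locked_run 
and the lock file path are made up, and it only serializes callers that go 
through the wrapper.

```shell
# Run a command while holding an exclusive lock on a lock file.
# Arguments: lock file, command...
# NOTE: hypothetical helper; assumes flock(1) from util-linux is available.
locked_run() {
    lr_lock=$1; shift
    # Wait up to 10 seconds for the lock, then give up with a non-zero status.
    flock -w 10 "$lr_lock" "$@"
}
```

Both the cron job and the test script would then call something like 
`locked_run /var/lock/hwclock.lock /sbin/hwclock --utc --hctosys`. If the 
lockups disappear with the lock in place, overlapping accesses become the 
prime suspect.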

race conditions in time, oh what fun.
-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter



