System lockup with SMP Kernel.

Bob Jones bt4rfj at earthlink.net
Tue Feb 10 06:13:46 UTC 2004


On Wed, 04 Feb 2004 08:32:46 +0000, WipeOut 
<wipe_out at users.sourceforge.net> wrote:

> Paul Furness wrote:
>
>> Interestingly, I had the system hang on me 5 times today. I then stopped
>> cron and anacron, and just to make sure I individually ran all the jobs
>> that anacron might have tried to run (/etc/cron.*/*) without any
>> trouble.
>>
>> cron and anacron are still stopped, and I haven't hung since. I'll try
>> leaving it on overnight tonight and see what happens.
>>
>> I did recompile the kernel but haven't tried booting it yet, because the
>> system hasn't hung for a while.
>>
>> Oh, incidentally (in case this helps) I have Athlons, not Pentiums, so
>> this doesn't look to be CPU specific.
>>
>> Watch this space for further updates... :)
>>
>> P.
>>
> I have run a P4 HT system all night on the SMP kernel and it didn't 
> crash. The system was set up as a minimum install and then Apache, PHP 
> and MySQL were added.
>
> So it looks like the crashing SMP problem is caused by something that is 
> installed when a workstation or desktop install is done, and that is 
> possibly run or triggered by Cron..
>
> The only difference I can see between the two cron.daily directories is 
> that the workstation install has a "tetex.cron" script in cron.daily, but 
> I don't think that would be the cause of the problem.
>
> The other thing that could be freaking it out is when prelink runs, 
> maybe when prelink is doing whatever it does it is hanging a workstation 
> but not a minimum installed system..
>


I'm willing to bet that at least some of your hangs - the ones triggered 
by cron on SMP machines at night - are caused by the symbolically linked 
file 00-logwatch, or rather by how the shells are invoked on SMP machines 
for that package's (logwatch-4.3.2-2.1) cron job. I didn't look to see 
which "group" of packages it comes with, probably Developer or System 
Tools. It wouldn't be in a minimal install.

The quick fix is to just "rm 00-logwatch" in /etc/cron.daily, or remove 
the whole package, unless you are using logwatch to track special log 
reports, etc. You can always initialize the logwatch files manually by 
executing "/etc/log.d/scripts/logwatch.pl". The alternative is to remove 
00-logwatch and replace it with the following script, giving the script 
the same name. The "00-" prefix is there just to make sure logwatch is 
initialized first, before anything else in cron.daily is executed.

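#!/bin/bash
# Run the logwatch script directly instead of through the symbolic link.
/etc/log.d/scripts/logwatch.pl
# Optional marker so the cron log shows that this wrapper ran.
echo "logwatch done" >> /var/log/cron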

You can leave off the last line if desired. I just put it in to be able to 
see in the cron log that the script was executed - not necessarily that 
logwatch.pl itself ran.
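
For anyone who wants to apply the same swap to other linked jobs, here is a 
minimal sketch of the replacement step. The function name 
replace_link_with_wrapper is made up for illustration, and it assumes bash 
plus GNU readlink (for the -f option). It has not been tested on an SMP box 
with this problem.

```shell
#!/bin/bash
# Hypothetical helper (not part of logwatch or cron): replace a symlinked
# cron job with a two-line wrapper script of the same name that calls the
# link's target by its full path.
replace_link_with_wrapper() {
    local job="$1"
    local target

    # Only act on symbolic links; leave regular scripts alone.
    [ -L "$job" ] || return 1

    # Resolve the link to an absolute path (GNU readlink -f assumed).
    target=$(readlink -f "$job") || return 1

    # Remove the link and write the wrapper in its place.
    rm -f "$job"
    printf '#!/bin/bash\n%q\n' "$target" > "$job"
    chmod 755 "$job"
}

# Usage (as root; path taken from the message above):
#   replace_link_with_wrapper /etc/cron.daily/00-logwatch
```

Keep in mind that a package upgrade may put the original link back, so this 
may need to be rerun afterwards.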

Anacron, a #!/bin/sh script, doesn't have any problem running the symbolic 
link pointing to a Perl script running in a bash shell. Cron, a 
#!/bin/bash script, apparently does. I think the chain of scripts/links 
gets tangled up in which processor/memory address to use. I'll leave that 
up to the "wizards" of SMP. In any event, since I made that change, I 
haven't had any hangs/lockups whatsoever. One caveat to all of this is 
that I'm running FC1 - testing, fully updated, with 
kernel-2.4.22-1.2166.nptlsmp and did not test on 2149. 2166 is really 
solid: big transfers over NFS run with no problems.

Just for the record, you don't have to wait for cron to run every night. 
Edit your crontab file with vi (save a copy of the original first) and set 
the minutes and hours to about three minutes ahead of the computer's 
current time, and the job will execute then. cron checks the crontab file 
every minute for jobs that are due. This was also how I found out that 
cron doesn't check the "timestamps" of any of the cron jobs 
(/var/spool/anacron) before running. It will run even though anacron just 
ran 5 minutes before - or it could attempt to run at the same time. 
anacron runs 65 minutes after booting.
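
To save some clock arithmetic when setting that test time, something like 
the following sketch prints the minute and hour fields for a time a few 
minutes ahead. The function name cron_time_ahead is made up for 
illustration, and it assumes GNU date (the "-d @SECONDS" form). The second 
argument is only there so the result can be checked against a fixed time.

```shell
#!/bin/bash
# Hypothetical helper: print the crontab minute and hour fields ("MM HH")
# for a time N minutes from now (default 3). Assumes GNU date.
cron_time_ahead() {
    local minutes="${1:-3}"
    # Optional second argument: a starting time in epoch seconds.
    local now="${2:-$(date +%s)}"

    # date handles the wrap past the hour and past midnight.
    date -d "@$((now + minutes * 60))" '+%M %H'
}

# Usage: put the two printed fields in place of the minute and hour
# fields on the crontab line you want to trigger.
#   cron_time_ahead 3
```

The output uses the machine's local time zone, which is also what cron goes 
by.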

All things considered, always look for problems with symbolic links in 
cron jobs. They're a holdover from Unix days when it was necessary to have 
full path names. It's so easy to write a two-line script that symbolic 
links ought to be banned in these cases - cron jobs. Unfortunately, if 
that package is upgraded it will probably put the link back in.
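
A quick way to find candidates is to list every symbolic link in the cron 
directories. This is only a sketch. The function name list_cron_symlinks is 
made up for illustration, and it looks only at entries directly inside each 
directory given.

```shell
#!/bin/bash
# Hypothetical helper: list symbolic links directly inside the given
# directories, one "link -> target" pair per line.
list_cron_symlinks() {
    local dir link
    for dir in "$@"; do
        # Skip arguments that are not directories (e.g. plain files).
        [ -d "$dir" ] || continue
        for link in "$dir"/*; do
            if [ -L "$link" ]; then
                printf '%s -> %s\n' "$link" "$(readlink "$link")"
            fi
        done
    done
    return 0
}

# Usage:
#   list_cron_symlinks /etc/cron.*
```

Running it again after a package upgrade would show whether a link has been 
put back.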

You can also add a line to some of the individual scripts, such as

	echo "Cron Daily 0anacron finished" >> /var/log/cron

just to see if it executed.

If you installed everything you may have a whole slew of these problems.

HTH

Bob Jones





