[K12OSN] Random system crashes: Linux gurus, what would you do?

Eric Harrison eharrison at mail.mesd.k12.or.us
Wed Dec 21 05:21:53 UTC 2005


On Tue, 20 Dec 2005, Carl Keil wrote:

> I'm sorry if this is considered spamming the list.  (3 emails in rapid 
> succession)  I just have been slammed with weird problems lately.  I promise 
> I've been googling and trying to solve them on my own, but I'm at my wit's 
> end. 
> My k12ltsp 4.2.1 server has been frozen every night when I wake up in the 
> morning.  The message, when I can read the screen, usually says:
>
> kernel panic - not syncing:  fs/dcache.c:413:spin_lock(fs/dcache.c:8837820) 
> already locked by fs/dcache.c/158(not tainted)
> [<c0118f47>]error_code+0x4f/0x54
> journal_get_write_access
> do_page_fault
> ll_rw_block
> etc. etc. etc.
>
> I copied more down, but I'm not sure it helps to troubleshoot this.  Googling 
> on parts of this error message quickly got me into threads about the overall 
> instability of the linux kernel lately, rants about Linus' kernel updating 
> philosophy, and some very technical bug reports that I couldn't begin to 
> understand, much less apply to solving my problem.  I run fsck as I reboot 
> each morning and sometimes it has to fix some inodes and other times it 
> doesn't. 
> I should say that I tried rolling back to the previous kernel on boot (from 
> the boot menu) and it still crashed.  The crashes happen at random times in 
> the early morning.  The earliest seems to be about 2AM and the latest was at 
> around 9AM.  I can tell when the server crashes because of logs of an every 5 
> minute cron job that moodle runs.  I run my backuppc, webalizer and logwatch 
> processes between 3-5AM daily.  The hardware is a "custom" PIII 500 box that 
> is my retired workstation, running 640 megs of RAM from various sources, a 5 
> year old boot drive, etc.  I would say that the hardware is very suspect, but 
> it doesn't, in my mind, account for the server crashing every single night and 
> never during the day, when it gets more use.

One thing that runs late at night is the set of jobs in /etc/cron.daily/.
Some of these jobs can chew up a lot of memory.
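
To check whether the freezes line up with the cron.daily run, you could
compare the run times of your 5-minute moodle cron job against the start
time listed in /etc/crontab. Below is a rough Python sketch, not a
finished tool: the function name, the one-minute slack, and the sample
timestamps are all made up for illustration, and you would have to pull
the real run times out of your own log first.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(minutes=5),
              slack=timedelta(minutes=1)):
    """Return (last_seen, next_seen) pairs where at least one run was missed."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare each run with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > expected + slack:
            gaps.append((earlier, later))
    return gaps


if __name__ == "__main__":
    # Hypothetical run times; replace with timestamps parsed from your log.
    runs = [
        datetime(2005, 12, 20, 3, 50),
        datetime(2005, 12, 20, 3, 55),
        datetime(2005, 12, 20, 4, 0),
        datetime(2005, 12, 20, 7, 30),  # first run after the morning reboot
        datetime(2005, 12, 20, 7, 35),
    ]
    for last_seen, next_seen in find_gaps(runs):
        print("no runs between", last_seen, "and", next_seen)
```

The first timestamp of each gap is the last time the box was known to be
alive, so if those keep landing just after cron.daily starts, the nightly
jobs are a good suspect.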

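You might want to run memtest on this box to see if you have a bad
stick of RAM. A memory-hungry job touches RAM that lighter daytime use
may never reach, so a bad stick could explain crashes that only show up
at night.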

> I should also say that I don't 
> use this computer for serving thin clients, it runs LAMP with a few moodle 
> instances, drupal, wordpress and about 10 different web domains.  I believe 
> that my computer was broken into via webmin right before all this started 
> happening (there was an unauthorized login as root from a church in town), but 
> I couldn't find any signs of damage, other than my computer crashed the next 
> day.

If you think your computer may have been broken into, it is best to do
a fresh reinstall. If a rootkit was installed, it would hide the signs
of damage and could very well cause stability problems.

> I have also been having what I call dictionary attacks almost every 
> night, repeated login attempts via various ports and user names via ssh, all 
> from the same IP.  But the only validated logins via ssh can all be accounted 
> for as coming from me or a trusted user.

If you keep getting attacked from a specific IP address, it would be a good
idea to firewall off that IP.
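
To see which address is worth blocking, something like the Python sketch
below could tally failed ssh logins per source IP. It assumes the usual
OpenSSH "Failed password for ... from <ip> port ..." log lines and only
handles IPv4; the function name and the sample lines are hypothetical,
and the log location (often /var/log/secure on Red Hat style systems) is
an assumption you should check on your own box.

```python
import re
from collections import Counter

# Matches OpenSSH lines such as:
#   "... sshd[1234]: Failed password for invalid user bob from 192.0.2.10 port 4242 ssh2"
FAILED_LOGIN = re.compile(
    r"sshd\[\d+\]: Failed password for .* "
    r"from (\d{1,3}(?:\.\d{1,3}){3}) port \d+"
)


def count_failed_logins(lines):
    """Return a Counter mapping source IP -> number of failed ssh logins."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; in practice iterate over the real log file.
    sample = [
        "Dec 20 03:12:01 host sshd[1234]: Failed password for invalid user bob from 192.0.2.10 port 4242 ssh2",
        "Dec 20 03:12:05 host sshd[1235]: Failed password for root from 192.0.2.10 port 4243 ssh2",
        "Dec 20 03:13:00 host sshd[1236]: Accepted password for carl from 198.51.100.7 port 5000 ssh2",
    ]
    for ip, failures in count_failed_logins(sample).most_common(5):
        print(ip, failures)
```

Once you know the address, a rule along the lines of
"iptables -I INPUT -s 192.0.2.10 -j DROP" (the address here is only a
placeholder) would drop its traffic.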

-Eric



