Re: [K12OSN] Random system crashes: Linux gurus, what would you do?

On Tue, 20 Dec 2005, Carl Keil wrote:

I'm sorry if this is considered spamming the list (3 emails in rapid succession). I have just been slammed with weird problems lately. I promise I've been googling and trying to solve them on my own, but I'm at my wit's end. My k12ltsp 4.2.1 server has been frozen every morning when I wake up. The message, when I can read the screen, usually says:

kernel panic - not syncing: fs/dcache.c:413:spin_lock(fs/dcache.c:8837820) already locked by fs/dcache.c/158(not tainted)
etc. etc. etc.

I copied more down, but I'm not sure it helps to troubleshoot this. Googling parts of this error message quickly got me into threads about the overall instability of the Linux kernel lately, rants about Linus' kernel updating philosophy, and some very technical bug reports that I couldn't begin to understand, much less apply to solving my problem. I run fsck as I reboot each morning; sometimes it has to fix some inodes and other times it doesn't. I should say that I tried rolling back to the previous kernel (from the boot menu) and it still crashed. The crashes happen at random times in the early morning. The earliest seems to be about 2AM and the latest was at around 9AM. I can tell when the server crashes from the logs of a cron job that moodle runs every 5 minutes. I run my backuppc, webalizer and logwatch processes between 3 and 5AM daily. The hardware is a "custom" PIII 500 box that is my retired workstation, running 640 MB of RAM from various sources, a 5-year-old boot drive, etc. I would say that the hardware is very suspect, but that doesn't, in my mind, account for the server crashing every single night and never during the day, when it gets more use.

One thing that happens late at night is the run of jobs in /etc/cron.daily/.
Some of these jobs can chew up a lot of memory.
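
To check whether the crashes line up with the nightly jobs, one option is
to look for gaps in the log of that 5 minute moodle cron job: the last
entry before a gap is roughly when the box froze. Below is a minimal
Python sketch of the idea. The function name, the 2 minute slack, and the
sample timestamps are assumptions for illustration only; how you pull the
timestamps out of your own log depends on its format.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(minutes=5),
              slack=timedelta(minutes=2)):
    """Return (last_seen, next_seen) pairs where at least one run was missed.

    timestamps: datetime objects, one per logged run of the periodic job.
    expected:   the normal interval between runs (5 minutes here).
    slack:      tolerance for a job that starts a little late (assumed).
    """
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > expected + slack:
            gaps.append((prev, cur))
    return gaps


if __name__ == "__main__":
    # Made-up sample data: the job stops after 02:00 and resumes at 07:30.
    sample = [
        datetime(2005, 12, 20, 1, 50),
        datetime(2005, 12, 20, 1, 55),
        datetime(2005, 12, 20, 2, 0),
        datetime(2005, 12, 20, 7, 30),
        datetime(2005, 12, 20, 7, 35),
    ]
    for last_seen, next_seen in find_gaps(sample):
        print("no runs between", last_seen, "and", next_seen)
```

If the last entry before each gap keeps falling inside the window when
cron.daily or the backup jobs run, that points at one of those jobs (or
at the load they put on marginal hardware) rather than at a random fault.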

You might want to run memtest on this box to see if you have a bad
stick of RAM.

I should also say that I don't use this computer for serving thin clients; it runs LAMP with a few moodle instances, drupal, wordpress and about 10 different web domains. I believe that my computer was broken into via webmin right before all this started happening (there was an unauthorized login as root from a church in town), but I couldn't find any signs of damage, other than that my computer crashed the next day.

If you think your computer may have been broken into, it is best to do
a fresh reinstall. If a rootkit was installed, it would hide the signs
of damage and could very well cause stability problems.

I have also been having what I call dictionary attacks almost every night: repeated login attempts via ssh, using various ports and user names, all from the same IP. But the only validated logins via ssh can all be accounted for as coming from me or a trusted user.

If you keep getting attacked from a specific IP address, it would be a good
idea to firewall off that IP.
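
As a rough illustration of that, here is a small Python sketch that
counts failed ssh password attempts per source address and prints an
iptables DROP rule for each address over a threshold. It only prints the
rules; it does not run them. The function names, the threshold, and the
sample log lines are assumptions for illustration, and the regular
expression assumes sshd's usual "Failed password for ... from <IP>" log
wording, so check it against your own log before trusting the counts.

```python
import re
from collections import Counter

# Assumed sshd log wording: "Failed password for <user> from <IP> port ..."
FAILED = re.compile(r"Failed password for .* from (\d{1,3}(?:\.\d{1,3}){3})")


def count_failures(lines):
    """Count failed ssh password attempts per source IP address."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def drop_rules(counts, threshold=5):
    """Build one iptables DROP rule per IP with at least `threshold` failures."""
    return ["iptables -I INPUT -s %s -j DROP" % ip
            for ip, n in sorted(counts.items()) if n >= threshold]


if __name__ == "__main__":
    # Made-up sample lines using documentation-only addresses.
    sample = [
        "sshd[101]: Failed password for root from 192.0.2.10 port 4242 ssh2",
        "sshd[102]: Failed password for invalid user admin from 192.0.2.10 port 4243 ssh2",
        "sshd[103]: Accepted password for carl from 198.51.100.7 port 5022 ssh2",
    ]
    for rule in drop_rules(count_failures(sample), threshold=2):
        print(rule)
```

Review the printed rules by hand before running any of them as root, so
that you do not lock out yourself or a trusted user who mistyped a
password a few times.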


