[K12OSN] Random system crashes: Linux gurus, what would you do?
Carl Keil
carl at snarlnet.com
Wed Dec 21 04:51:11 UTC 2005
I'm sorry if this is considered spamming the list (3 emails in rapid
succession). I have just been slammed with weird problems lately. I
promise I've been googling and trying to solve them on my own, but I'm
at my wit's end.
My K12LTSP 4.2.1 server has been frozen every morning when I wake up.
The message, when I can read the screen, usually says:
kernel panic - not syncing:
fs/dcache.c:413:spin_lock(fs/dcache.c:8837820) already locked by
fs/dcache.c/158(not tainted)
[<c0118f47>]error_code+0x4f/0x54
journal_get_write_access
do_page_fault
ll_rw_block
etc. etc. etc.
I copied more down, but I'm not sure it helps to troubleshoot this.
Googling parts of this error message quickly got me into threads
about the overall instability of the Linux kernel lately, rants about
Linus' kernel updating philosophy, and some very technical bug reports
that I couldn't begin to understand, much less apply to solving my
problem. I run fsck as I reboot each morning and sometimes it has to
fix some inodes and other times it doesn't.
I should say that I tried rolling back to the previous kernel (from the
boot menu) and it still crashed. The crashes happen at random times in
the early morning. The earliest seems to have been about 2AM and the
latest around 9AM. I can tell when the server crashes from the log of a
cron job that Moodle runs every 5 minutes. I run my BackupPC, Webalizer
and Logwatch jobs between 3AM and 5AM daily. The hardware is a
"custom" PIII 500 box that is my retired workstation, running 640 MB of
RAM from various sources, a 5-year-old boot drive, etc. I would say
that the hardware is very suspect, but that doesn't, in my mind, account
for the server crashing every single night and never during the day,
when it gets more use.
I should also say that I don't use this computer for serving thin
clients; it runs LAMP with a few Moodle instances, Drupal, WordPress
and about 10 different web domains.
I believe that my computer was broken into via Webmin right before all
this started happening (there was an unauthorized login as root from a
church in town), but I couldn't find any signs of damage, other than
that my computer crashed the next day. I have also been having what I
call dictionary attacks almost every night: repeated ssh login attempts
via various ports and with various user names, all from the same IP.
But the only successful ssh logins can all be accounted for as coming
from me or a trusted user.
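To put numbers on the repeated ssh login attempts, a minimal sketch of a tally per source IP might look like the following. It assumes the usual OpenSSH "Failed password for ..." log line format (older versions write "illegal user" where newer ones write "invalid user"). The function name is hypothetical, and where the sshd messages end up (often /var/log/secure on Red Hat style systems) is also an assumption.

```python
import re
from collections import Counter

# Assumed OpenSSH failure line, for example:
#   "Failed password for invalid user bob from 192.0.2.10 port 4321 ssh2"
FAILED = re.compile(
    r"Failed password for (?:invalid user |illegal user )?(\S+) from (\S+)"
)

def tally_failed_logins(lines):
    """Return (attempt count per source IP, user names tried per source IP)."""
    attempts = Counter()
    users = {}
    for line in lines:
        match = FAILED.search(line)
        if match:
            user, ip = match.groups()
            attempts[ip] += 1
            users.setdefault(ip, set()).add(user)
    return attempts, users
```

Feeding it the lines of the sshd log (for instance via open(path) on whatever file holds them) and printing attempts.most_common(10) would show whether one IP really dominates, and how many different user names it tried.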
So, how do I sort this out? Like I said, I've googled. I've tried
different kernels. I've tried warming up the very cold room that the
server is in. I've tried restricting services as much as I can. But
I'm wondering what real Linux pros would do to get to the bottom of
this. The server crashes every single night; there has to be a way to
figure out what the cause is.
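As a first step toward narrowing things down, the crash time can be estimated from the every-5-minutes cron log mentioned above by looking for gaps between consecutive runs. This is only a sketch: the function names, the 2 minute slack, and the idea of comparing against the 3AM to 5AM job window are assumptions, and extracting the timestamps from the actual log is left out because its format isn't shown here.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, interval=timedelta(minutes=5),
              slack=timedelta(minutes=2)):
    """Return (last_seen, next_seen) pairs where periodic runs are missing.

    timestamps: sorted datetimes, one per run of the periodic job.
    A gap is any pair of consecutive runs more than interval + slack apart;
    the machine presumably went down shortly after last_seen.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > interval + slack:
            gaps.append((prev, curr))
    return gaps

def in_window(moment, start_hour=3, end_hour=5):
    """True if moment falls in the nightly job window [start_hour, end_hour)."""
    return start_hour <= moment.hour < end_hour
```

Running find_gaps over several nights of timestamps and checking in_window on each last_seen value would show whether the crashes cluster around the nightly backup and log-processing jobs or are spread across the whole 2AM to 9AM range.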
Thank you very much for any help you can give me,
ck