[K12OSN] Random system crashes: Linux gurus, what would you do?

Carl Keil carl at snarlnet.com
Wed Dec 21 04:51:11 UTC 2005


I'm sorry if this is considered spamming the list (three emails in rapid 
succession).  I have just been slammed with weird problems lately.  I 
promise I've been googling and trying to solve them on my own, but I'm 
at my wit's end. 

My K12LTSP 4.2.1 server has been frozen every morning when I wake 
up.  The message, when I can read the screen, usually says:

kernel panic - not syncing:  
fs/dcache.c:413:spin_lock(fs/dcache.c:8837820) already locked by 
fs/dcache.c/158(not tainted)
[<c0118f47>]error_code+0x4f/0x54
journal_get_write_access
do_page_fault
ll_rw_block
etc. etc. etc.

I copied more down, but I'm not sure it helps to troubleshoot this.  
Googling on parts of this error message quickly got me into threads 
about the overall instability of the Linux kernel lately, rants about 
Linus' kernel updating philosophy, and some very technical bug reports 
that I couldn't begin to understand, much less apply to solving my 
problem.  I run fsck as I reboot each morning and sometimes it has to 
fix some inodes and other times it doesn't. 

I should say that I tried rolling back to the previous kernel on boot 
(from the boot menu) and it still crashed.  The crashes happen at random 
times in the early morning.  The earliest seems to be about 2AM and the 
latest was at around 9AM.  I can tell when the server crashes from the 
logs of a cron job that Moodle runs every 5 minutes.  I run my BackupPC, 
Webalizer and Logwatch jobs between 3 and 5AM daily.  The hardware is a 
"custom" PIII 500 box that is my retired workstation, with 640 MB 
of RAM from various sources, a 5-year-old boot drive, etc.  I would say 
that the hardware is very suspect, but it doesn't, in my mind, account 
for the server crashing every single night and never during the day, 
when it gets more use.  I should also say that I don't use this computer 
for serving thin clients; it runs LAMP with a few Moodle instances, 
Drupal, WordPress and about 10 different web domains.  I believe that my 
computer was broken into via Webmin right before all this started 
happening (there was an unauthorized login as root from a church in 
town), but I couldn't find any signs of damage, other than that my computer 
crashed the next day.  I have also been seeing what I call dictionary 
attacks almost every night: repeated ssh login attempts with various 
ports and user names, all from the same IP.  But the only validated 
logins via ssh can all be accounted for as coming from me or a trusted 
user. 
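
To put numbers on the ssh attempts described above, something like the 
following could tally failed logins per source address.  This is only a 
minimal sketch: the line format in the regular expression is an 
assumption about how sshd logs a failed password, the sample lines and 
addresses are made up for illustration, and failed_logins_by_ip is a 
hypothetical helper name.

```python
import re
from collections import Counter

# Assumed shape of an sshd "failed password" log line. Older OpenSSH
# releases say "illegal user" where newer ones say "invalid user".
FAILED = re.compile(
    r"Failed password for (?:(?:invalid|illegal) user )?(\S+) from (\S+)"
)


def failed_logins_by_ip(lines):
    """Count failed ssh password attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Made-up sample lines, not taken from the server in question.
sample = [
    "Dec 20 02:11:01 server sshd[2211]: Failed password for invalid user admin from 203.0.113.7 port 4021 ssh2",
    "Dec 20 02:11:04 server sshd[2213]: Failed password for root from 203.0.113.7 port 4033 ssh2",
    "Dec 20 08:30:12 server sshd[3100]: Accepted password for ck from 192.0.2.10 port 5000 ssh2",
]

if __name__ == "__main__":
    # Print the noisiest source addresses first.
    for ip, count in failed_logins_by_ip(sample).most_common():
        print(ip, count)
```

In practice the lines would come from the system's authentication log 
rather than a hard-coded list; which file that is depends on the 
distribution.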

So, how do I sort this out?  Like I said, I've googled.  I've tried 
different kernels.  I've tried warming up the very cold room that the 
server is in.  I've tried restricting services as much as I can.  But 
I'm wondering what real Linux pros would do to get to the bottom of 
this.  The server crashes every single night; there has to be a way to 
figure out what the cause is.
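
One way to narrow this down is to pin each crash to a time and check 
whether it falls inside the 3-5AM window when the backup and log jobs 
run.  The sketch below assumes the timestamps of the every-5-minute cron 
job have already been parsed into datetime objects; the timestamps shown 
are made up, and find_gaps and in_job_window are hypothetical helper 
names.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(minutes=5),
              slack=timedelta(minutes=1)):
    """Return (last_seen, next_seen) pairs where at least one run was missed."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        # A gap noticeably longer than the cron interval means the job
        # did not run, for example because the machine was frozen.
        if later - earlier > expected + slack:
            gaps.append((earlier, later))
    return gaps


def in_job_window(moment, start_hour=3, end_hour=5):
    """True if the moment falls in the nightly job window (3-5AM by default)."""
    return start_hour <= moment.hour < end_hour


# Made-up run times for illustration only.
runs = [
    datetime(2005, 12, 20, 3, 30),
    datetime(2005, 12, 20, 3, 35),
    datetime(2005, 12, 20, 3, 40),
    datetime(2005, 12, 20, 7, 15),  # first run after the morning reboot
    datetime(2005, 12, 20, 7, 20),
]

if __name__ == "__main__":
    for last_seen, next_seen in find_gaps(runs):
        print("last run before gap:", last_seen,
              "| inside job window:", in_job_window(last_seen))
```

If the last run before each gap keeps landing inside the job window, 
that points at one of the nightly jobs (or the disk and memory load they 
cause) rather than at something random.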

Thank you very much for any help you can give me,

ck



