[Linux-cluster] Freeze with cluster-2.03.11

Wendy Cheng s.wendy.cheng at gmail.com
Sat Mar 28 16:07:18 UTC 2009


Kadlecsik Jozsef wrote:
>> I don't see strong evidence of a deadlock (though it could be one) in the
>> thread backtraces. However, assuming the cluster worked before, you could
>> have overloaded the e1000 driver in this case. There are suspicious page
>> faults, but memory is very "ok". So one possibility is that GFS generated
>> too many sync requests and flooded the e1000. As a result, the cluster
>> heartbeat missed its interval.
>>     
>
> It's a possibility. But it assumes also that the node freezes >because< 
> it was fenced off. So far nothing indicates that.
>   

Re-read your console log. There are many footprints of spin_lock in it, 
which is worrisome. Next time you have a hang, hit "sysrq-w" a couple of 
times in addition to sysrq-t. That should give traces of the threads that 
are actively running on the CPUs at that time. Also check your kernel 
changelog to see whether GFS has picked up any new patch touching spin 
locks that was not in the previous release.
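
If the node still accepts commands while the hang is building up, the same 
dumps can be requested from a root shell or script instead of the keyboard. 
What follows is only a rough sketch under a few assumptions: the helper name 
dump_sysrq and its parameters are made up for illustration, it relies on the 
standard /proc/sysrq-trigger file, and which letter produces which dump 
depends on the kernel version, so check the SysRq help output on your kernel 
first. On a node that is already hard-frozen no script will run, and the 
keyboard or serial console is the only option.

```python
import os
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"  # standard Linux procfs path


def dump_sysrq(key, count=2, interval=1.0, trigger_path=SYSRQ_TRIGGER):
    """Write one SysRq command key to the trigger file `count` times.

    Hypothetical helper, not part of any cluster tool. Returns the number
    of writes performed. The dump itself goes to the kernel log
    (console / dmesg), not to this process.
    """
    if len(key) != 1:
        raise ValueError("SysRq command must be a single character")
    if not os.access(trigger_path, os.W_OK):
        raise PermissionError(f"cannot write {trigger_path}; run as root")
    done = 0
    for i in range(count):
        # Each write of a single character requests one dump.
        with open(trigger_path, "w") as trigger:
            trigger.write(key)
        done += 1
        if i < count - 1:
            time.sleep(interval)  # space the samples out a little
    return done


# Example (as root): a couple of sysrq-w samples, then one sysrq-t.
# dump_sysrq("w", count=2)
# dump_sysrq("t", count=1)
```

Taking more than one sample a second or so apart makes it easier to tell a 
thread that is really stuck on a spin lock from one that just happened to be 
holding it at that instant.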

BTW, I do have opinions on other parts of your postings but don't have 
time to express them now. Maybe I'll say something when I finish my 
current chores :) ... Need to rush out now. Good luck with your debugging!

-- Wendy



