[Linux-cluster] Freeze with cluster-2.03.11
s.wendy.cheng at gmail.com
Sat Mar 28 16:07:18 UTC 2009
Kadlecsik Jozsef wrote:
>> I don't see strong evidence of a deadlock (though it could be one) in the
>> thread backtraces. However, assuming the cluster worked before, you could
>> have overloaded the e1000 driver in this case. There are suspicious page
>> faults but memory is very "ok". So one possibility is that GFS generated too
>> many sync requests that flooded the e1000. As a result, the cluster
>> heartbeat missed its interval.
> It's a possibility. But it assumes also that the node freezes >because<
> it was fenced off. So far nothing indicates that.
Re-read your console log. There are many footprints of spin_lock -
that's worrisome. Hit "sysrq-w" a couple of times the next time you have
a hang, in addition to sysrq-t. This should give traces of the threads that
are actively on CPUs at that time. Also check your kernel change log (to
see whether GFS has any new patch that touches spin lock that doesn't in
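
If the console keyboard is unusable during a hang, the same sysrq requests can be sent by writing the key to /proc/sysrq-trigger as root. The following is only a minimal sketch of that idea: the helper names (send_sysrq, capture_hang_traces), the repeat count and the delay are assumptions made for illustration, not part of any cluster tool. Check the sysrq documentation shipped with your kernel for what each key does, since the meaning of individual keys has varied between kernel versions.

```python
import time
from pathlib import Path

# Standard procfs interface for sysrq requests; writing to it requires root.
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def send_sysrq(key: str, trigger: Path = SYSRQ_TRIGGER) -> None:
    """Write a single sysrq key character to the trigger file."""
    if len(key) != 1:
        raise ValueError("a sysrq key is exactly one character")
    trigger.write_text(key)


def capture_hang_traces(keys=("w", "t"), repeats=2, delay=1.0,
                        trigger: Path = SYSRQ_TRIGGER) -> list:
    """Send each key in `keys`, `repeats` times, pausing between rounds.

    Several samples taken a short time apart show whether the threads
    are stuck in the same place or are still making progress.
    Returns the keys in the order they were sent.
    """
    sent = []
    for _ in range(repeats):
        for key in keys:
            send_sysrq(key, trigger)
            sent.append(key)
        time.sleep(delay)
    return sent
```

Calling capture_hang_traces() as root on the hung node sends "w" and "t" twice each; the resulting traces go to the kernel log and the console, not to the script.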

BTW, I do have opinions on other parts of your postings but don't have
time to express them now. Maybe I'll say something when I finish my
current chores :) ... Need to rush out now. Good luck with your debugging!