[Linux-cluster] Re: occasional cluster crashes

Fabrizio Lippolis fabrizio.lippolis at aurigainformatica.it
Fri Nov 17 16:10:43 UTC 2006


Hi Lon,

Lon Hohberger wrote:

> Do they crash (panic), or do they just become totally unresponsive?

One server suddenly becomes unresponsive, as if frozen. The second
server then starts to miss heartbeats from the first. At the moment I
have manual fencing configured, so the service is not relocated (more
on this below). If I remember correctly, restarting the locked machine
is not enough; I have to reboot the working one too.

> Have you tried getting a stack trace from the console using sysrq? (echo
> 1 > /proc/sys/kernel/sysrq;  then hit alt-sysrq-t from the console).

No, I haven't; I will try that too.

> One thing that's peculiar is that - if they are locking up, they have to
> be locking up at about the same time -- otherwise, one would fence the
> other, and life would go on.

As I wrote, only one gets locked up. The fencing configuration is
another problem for me, and one I am aware of. I haven't fully
understood how it works; it looks like I need an external device that
manages power. In that case, which device, and consequently which
fencing method, is most suitable? I am rather confused about this topic.

Fabrizio



