[Linux-cluster] Things that i don't understand about cluster suite

Greg Forte gforte at leopard.us.udel.edu
Thu Sep 21 12:11:53 UTC 2006

> Lon, I use original's postfix script and returns this if postfix is up: 
> "master (pid 957) is running..." when postfix isn't up, script returns: 
> "master is stopped". Do I need to change this message to "0" for status 
> check works ok??

That's just a message that's printed.  The return status is the value 
given in a statement of the form 'exit X' (or 'return X' inside a shell 
function), or the status of the last command run if no such statement is 
reached.  All executables return a status value to the shell, where 0 is 
taken to mean "OK", and non-zero means "something bad happened".
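
To make the difference between the printed message and the return 
status concrete, here is a minimal sketch.  The function name 
check_status and its "up"/"down" argument are made up for illustration 
and are not taken from the real postfix init script; the value 3 for 
"stopped" follows the LSB convention for the status action of init 
scripts.

```shell
#!/bin/sh
# Hypothetical stand-in for an init script's "status" action.
# The echo is only the human-readable message; the return value
# is what the caller actually checks.
check_status() {
    if [ "$1" = "up" ]; then
        echo "master (pid 957) is running..."
        return 0    # 0 = OK, service is running
    else
        echo "master is stopped"
        return 3    # non-zero = not running (LSB uses 3 for "stopped")
    fi
}

# The caller ignores the text and looks only at the exit status.
if check_status up >/dev/null; then
    echo "status check passed"
fi

if ! check_status down >/dev/null; then
    echo "status check failed"
fi
```

With a real init script you can see the same thing by running its 
status action and then looking at $? in the shell, so the message text 
doesn't need to change.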

The postfix script appears to return the correct values in each case.
My guess would be that it's a cluster configuration problem, but I didn't 
see anything about postfix in the conf that you pasted ...

> Then, I can't startup only one node, when both are stopped, right??

No, you definitely can do this, if the cluster is configured correctly. 
The problem may be in your fencing method - the first thing the booted 
node will do when cman starts is to try to contact the other node.  When 
it times out, it'll try to fence the other node and won't continue until 
it does. If the fence process fails, it'll hang there, which I'm 
guessing is what you're seeing.  So the problem is most likely that 
fencing is failing, either due to misconfiguration or because the other 
node is powered off and so its iLO agent isn't responding.  Since iLO is 
supposed to be able to power up a switched-off server, my guess is 
there's a problem with your fencing configuration - did you fix it so 
that you have a separate fencedevice entry for each node?
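
In case it helps, this is roughly what I mean by a separate fencedevice 
entry for each node.  It's only a sketch of the relevant part of 
cluster.conf: the node names, device names, hostnames, login and 
password are all placeholders I made up, so substitute your own values 
and check the attributes against what your fence_ilo agent expects.

```xml
<!-- Sketch only: all names, hostnames and credentials are placeholders. -->
<clusternodes>
  <clusternode name="node1" votes="1">
    <fence>
      <method name="1">
        <!-- node1 is fenced through its own iLO device -->
        <device name="ilo-node1"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="node2" votes="1">
    <fence>
      <method name="1">
        <!-- node2 is fenced through its own iLO device -->
        <device name="ilo-node2"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <!-- One fencedevice per node, each pointing at that node's iLO -->
  <fencedevice agent="fence_ilo" name="ilo-node1"
               hostname="ilo-node1.example.com" login="admin" passwd="secret"/>
  <fencedevice agent="fence_ilo" name="ilo-node2"
               hostname="ilo-node2.example.com" login="admin" passwd="secret"/>
</fencedevices>
```

The point is that each clusternode refers to a device that reaches its 
own iLO, not a single shared entry.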


