[Linux-cluster] problem with rejoining a node

Javi Polo javipolo at datagrama.net
Thu Aug 11 10:03:35 UTC 2005


On Aug/11/2005, David Teigland wrote:

> > I've made a script that, prior to starting any of the cluster
> > infrastructure, enables his SAN port.
> I'm not sure if this is related to the rest.

I did it because the SAN port never turned on, and I thought that could be
part of the problem, but I see now that it is not.

> > gfstest1:~# cman_tool services
> > Service          Name                              GID LID State     Code
> > Fence Domain:    "default"                           0   2 join      S-1,80,3
> > []
> it's waiting to join the fence domain, the others won't let him yet...

> > Service          Name                              GID LID State     Code
> > Fence Domain:    "default"                           1   2 recover 2 -
> > [2 3]
> These two appear to be trying to fence gfstest1, but the fencing operation
> hasn't completed.  They won't let anyone join the domain until they
> finish.  You could check /var/log/messages on 2&3 for any fencing messages
> or errors.

I ran fence_tool with -D on those nodes and found the problem. I don't know
why, but sometimes the switch sets the port status to "FAULTY" instead of
"OFFLINE", so fence_IBMswitch failed and the node was never completely
fenced.
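
A minimal sketch of the kind of check that would tolerate both statuses,
assuming a "FAULTY" port really passes no I/O (that would need confirming
against the switch documentation). The function name and the set of states
are hypothetical; only the strings "OFFLINE" and "FAULTY" come from what the
switch reported here, and this is not the actual fence_IBMswitch code.

```python
# Port statuses treated as "fenced". Only these two strings appear in the
# thread; treating FAULTY as equivalent to OFFLINE is an assumption.
DISABLED_STATES = {"OFFLINE", "FAULTY"}


def port_is_fenced(status: str) -> bool:
    """Return True if the reported port status means the port is disabled."""
    # Normalize whitespace and case so "offline\n" and "OFFLINE" compare equal.
    return status.strip().upper() in DISABLED_STATES
```

With a check like this, a fence agent would report success whether the
switch answered "OFFLINE" or "FAULTY" after the port was disabled.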

Now it seems to be working fine! :)))
Thanks a lot.

I have another question now: when the system rejoins the fence domain,
does the fence_XXX script run to re-enable the switch port, or should I do
that by other means (e.g. enabling it at boot)?

I thought about making a boot script that runs cman_tool services, checks
whether the host is in the fence domain and, if so, enables the SAN port and
then rescans for SCSI devices. But I don't know whether that's "the right
way" to do it, or at least an acceptable one.
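
A rough sketch of the membership check such a boot script could perform,
written against the cman_tool services output quoted above. The function
names are hypothetical, and it assumes that a fully joined fence domain
reports the state "run"; only "join" and "recover" actually appear in this
thread. Enabling the SAN port and rescanning SCSI devices are site-specific
and left out.

```python
import re

# Matches lines such as:
#   Fence Domain:    "default"      0   2 join      S-1,80,3
FENCE_LINE = re.compile(
    r'^Fence Domain:\s+"(?P<name>[^"]+)"\s+(?P<gid>\d+)\s+(?P<lid>\d+)\s+(?P<state>[a-z]+)'
)


def fence_domain_state(services_output: str, name: str = "default"):
    """Return the State column for the named fence domain, or None if absent."""
    for line in services_output.splitlines():
        match = FENCE_LINE.match(line.strip())
        if match and match.group("name") == name:
            return match.group("state")
    return None


def is_fence_member(services_output: str, name: str = "default") -> bool:
    """Return True once the node has finished joining the fence domain."""
    # Assumption: "run" is the state shown for a fully joined domain.
    return fence_domain_state(services_output, name) == "run"
```

The text passed in would be whatever cman_tool services prints on the node,
for example captured with subprocess.run(["cman_tool", "services"],
capture_output=True, text=True).stdout, and the script would enable the port
only when is_fence_member() returns True.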

-- 
Javier Polo @ Datagrama
902 136 126



