[Linux-cluster] problem with rejoining a node
Javi Polo
javipolo at datagrama.net
Thu Aug 11 10:03:35 UTC 2005
On Aug/11/2005, David Teigland wrote:
> > I've made a script that, prior to starting any of the cluster
> > infrastructure, enables his SAN port.
> I'm not sure if this is related to the rest.
I did it because the SAN port never turned on, and I thought it could be
part of the problem, but I see it is not ...
> > gfstest1:~# cman_tool services
> > Service Name GID LID State Code
> > Fence Domain: "default" 0 2 join S-1,80,3
> > []
> it's waiting to join the fence domain, the others won't let him yet...
> > Service Name GID LID State Code
> > Fence Domain: "default" 1 2 recover 2 -
> > [2 3]
> These two appear to be trying to fence gfstest1, but the fencing operation
> hasn't completed. They won't let anyone join the domain until they
> finish. You could check /var/log/messages on 2&3 for any fencing messages
> or errors.
I tried fence_tool with -D on those and found the problem ...
I don't know why, but sometimes the switch sets the port status to "FAULTY"
instead of "OFFLINE", so fence_IBMswitch failed and the node
wasn't completely fenced ...
Now it seems to be working fine! :)))
thx a lot
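
A minimal sketch of the kind of status check that avoids this failure,
assuming a port reported as "FAULTY" carries no traffic and can therefore be
treated like "OFFLINE". The helper name and the set of states are hypothetical;
this is not the actual fence_IBMswitch code.

```python
# Hypothetical helper, not taken from fence_IBMswitch.
# Port states that are treated as "node is cut off from the SAN".
# "OFFLINE" and "FAULTY" are the two values mentioned in this thread;
# treating "FAULTY" as fenced is an assumption.
DISABLED_STATES = {"OFFLINE", "FAULTY"}


def port_is_fenced(reported_status: str) -> bool:
    """Return True if the status reported by the switch means the port is down."""
    return reported_status.strip().upper() in DISABLED_STATES
```

With a check like this, the agent would report success whether the switch
answers "OFFLINE" or "FAULTY" after the port is disabled.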
Now I have another question ... when the system rejoins the fence domain,
does the fence_XXX script run to enable the switch port, or should I do
it by other means (i.e. enabling it on boot or similar)?
I thought about making a boot script that runs cman_tool services, checks if
the host is in the fence domain, and if so, enables the SAN port and then
rescans for SCSI devices ... but I don't know if that's "the right way" to do
it, or at least a polite one.
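
The check such a boot script would need could look roughly like this. It is
only a sketch: the function names are hypothetical, the line format is taken
from the cman_tool services output quoted above, and the assumption that a
fully joined domain reports the state "run" should be verified on a working
node. Enabling the SAN port and rescanning SCSI devices are left out because
they depend on the switch and the HBA driver.

```python
import re
import subprocess

# Matches a line such as:  Fence Domain: "default" 0 2 join S-1,80,3
# Groups: domain name, GID, LID, state.
FENCE_LINE = re.compile(r'^Fence Domain:\s+"([^"]*)"\s+(\d+)\s+(\d+)\s+(\S+)')

# Assumption: a node that has finished joining shows this state.
JOINED_STATE = "run"


def fence_domain_state(services_output):
    """Return the State column of the first Fence Domain line, or None."""
    for line in services_output.splitlines():
        match = FENCE_LINE.match(line.strip())
        if match:
            return match.group(4)
    return None


def local_node_has_joined():
    """Run cman_tool services and report whether the fence domain is joined."""
    result = subprocess.run(
        ["cman_tool", "services"],
        capture_output=True, text=True, check=True,
    )
    return fence_domain_state(result.stdout) == JOINED_STATE
```

A boot script would call local_node_has_joined() and only then go on to
enable the port and rescan.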
--
Javier Polo @ Datagrama
902 136 126