[Linux-cluster] problem with rejoining a node
Javi Polo
javipolo at datagrama.net
Mon Aug 8 14:12:38 UTC 2005
Hi there (again :P)
I'm still fighting with all this; sorry to bother you so much (I hope
that some day, when I understand it all better, I'll write an article on
how to set this up).
Well, I already have the cluster up and the GFS filesystem mounted on 3
machines, and if one of them goes down, it is correctly fenced. The FC
port is also disabled, so I suppose that at this point everything is OK.
The problem is with recovery. My understanding is that when a node
rejoins, it is automatically unfenced, and it can then rejoin the fence
domain and mount the filesystem again.
I've blocked all input and output traffic on the node I want to test
with iptables.
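For anyone who wants to reproduce the test, blocking all traffic with
iptables could look roughly like the sketch below. The exact rules used
on gfstest1 are not shown in this post, so the blanket DROP rules, the
function names and the IPTABLES variable are assumptions, not the
original commands.

```shell
#!/bin/sh
# Sketch only: simulate a node failure by dropping all traffic.
# Assumption: IPTABLES names the iptables binary; set IPTABLES=echo
# beforehand to print the arguments instead of changing the firewall.
IPTABLES="${IPTABLES:-iptables}"

# Insert blanket DROP rules at the top of the INPUT and OUTPUT chains.
block_all_traffic() {
    $IPTABLES -I INPUT -j DROP
    $IPTABLES -I OUTPUT -j DROP
}

# Delete the same two rules again to let traffic through.
restore_traffic() {
    $IPTABLES -D INPUT -j DROP
    $IPTABLES -D OUTPUT -j DROP
}
```

Called as root, block_all_traffic cuts the node off (loopback and any
SSH session included, so it is best run from the console), and
restore_traffic re-enables traffic afterwards.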
The node gets fenced ok:
Aug 8 16:00:48 gfstest2 fenced[2594]: fencing node "gfstest1"
Aug 8 16:00:56 gfstest2 fenced[2594]: fence "gfstest1" success
Now I can access the GFS filesystem safely from my other 2 nodes, as the
FC port for gfstest1 is now disabled, but if I re-enable traffic on the
node, it does not rejoin the cluster. Shouldn't this happen automatically?
Anyway, I cannot rejoin/leave/whatever the cluster from gfstest1:
gfstest1:~# cman_tool services
Service Name GID LID State Code
Fence Domain: "default" 1 2 run -
[1 2 3]
DLM Lock Space: "primer_fs" 2 3 run -
[1 2 3]
GFS Mount Group: "primer_fs" 3 4 run -
[1 2 3]
gfstest1:~# cman_tool join
cman_tool: Node is already active
gfstest1:~# cman_tool leave
cman_tool: Can't leave cluster while there are 5 active subsystems
I also cannot umount /dev/sdc1, as I have no access to the SAN (and DLM
should prevent me from doing so anyway). So I end up with a totally
screwed-up system that I can only fix by hard-rebooting (if I do a
clean reboot, the system "hangs" while "umounting filesystems").
On top of that, when the system boots up, the SAN is still inaccessible,
as the fencing script does not run to re-enable the port ...
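To compare what each node believes about group membership, the
`cman_tool services` output shown earlier can be condensed to one line
per service. This is only a sketch: the function name
summarize_services is made up, and the parsing assumes the layout seen
in this post (a quoted service name followed by GID, LID and State,
then a bracketed member list on the next line).

```shell
#!/bin/sh
# Sketch only: condense `cman_tool services` output read from stdin.
# Assumes each service line is followed by a "[...]" member-list line.
summarize_services() {
    awk '
        /^ *\[/ {                        # member list, e.g. [1 2 3]
            gsub(/^ *\[|\] *$/, "")      # strip the surrounding brackets
            print service " members=" $0
            next
        }
        /: *"/ {                         # service line with a quoted name
            split($0, parts, "\"")       # parts[1]=type, [2]=name, [3]=rest
            type = parts[1]
            sub(/: *$/, "", type)        # drop the trailing colon
            split(parts[3], rest, " ")   # rest = GID LID State Code
            service = type " \"" parts[2] "\" state=" rest[3]
        }
    '
}
```

Running `cman_tool services | summarize_services` on each node and
comparing the results should show whether the nodes disagree about who
is a member of each group.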
I'm loooooost diving into Google queries ... and it's certainly hard to
find accurate info about all this :/
Could someone shed some light?
(I probably don't understand well how the fencing system works, but I
also haven't found anywhere that explains it :/)
thx in advance :)
--
Javier Polo @ Datagrama
902 136 126