[Linux-cluster] problem with rejoining a node

Javi Polo javipolo at datagrama.net
Mon Aug 8 14:12:38 UTC 2005


Hi there (again :P)

I'm still fighting with all this; sorry to bother you so much (I hope that some
day, when I understand it all better, I'll write an article on how to set this up).

Well, I already have the cluster up and the GFS filesystem mounted on 3
machines, and if one of them goes down, it's correctly fenced. Its FC
port is also disabled, so I suppose everything is OK at this point.

The problem is recovery. My understanding is that when a node rejoins
the cluster, it is automatically unfenced, and it can then rejoin the
fence domain and mount the filesystem again.

To test this, I used iptables to block all incoming and outgoing traffic
on the node.

The node gets fenced ok:
Aug  8 16:00:48 gfstest2 fenced[2594]: fencing node "gfstest1"
Aug  8 16:00:56 gfstest2 fenced[2594]: fence "gfstest1" success

Now I can access the GFS filesystem safely from my other 2 nodes, as the
FC port for gfstest1 is now disabled. But when I re-enable traffic to the
node, it does not rejoin the cluster. Shouldn't this happen automatically?

Anyway, I cannot rejoin, leave, or do anything else with the cluster from gfstest1:
gfstest1:~# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2 3]

DLM Lock Space:  "primer_fs"                         2   3 run       -
[1 2 3]

GFS Mount Group: "primer_fs"                         3   4 run       -
[1 2 3]

gfstest1:~# cman_tool join
cman_tool: Node is already active
gfstest1:~# cman_tool leave
cman_tool: Can't leave cluster while there are 5 active subsystems

Also, I cannot umount /dev/sdc1, as I have no access to the SAN
(and in any case the DLM should prevent it from doing so). So I end up with
a totally screwed-up system that I can only fix by hard-rebooting (if I do a
clean reboot, the system "hangs" while "umounting filesystems").

Also, when the system boots up, the SAN is still inaccessible, as the
fencing script does not run to re-enable the port...

I'm loooooost diving into Google queries... and it's certainly hard to
find accurate info about all this :/

Could someone shed some light?
(Probably I don't understand well how the fencing system works, but I also
haven't found anywhere where it's explained :/)

thx in advance :)
-- 
Javier Polo @ Datagrama
902 136 126