[Linux-cluster] strange cluster behavior

Wed Mar 3 18:23:44 UTC 2010

On Wednesday 03 March 2010 at 14:23 +0100, brem belguebli wrote:
> Hi Xavier,

Hi Brem, Xavier,

> 2010/3/3 Xavier Montagutelli <xavier.montagutelli at unilim.fr>:
> > On Wednesday 03 March 2010 03:11:50 brem belguebli wrote:
> >> Hi,
> >>
> >> I experienced a strange cluster behavior that I couldn't explain.
> >>
> >> I have a 4-node RHEL 5.4 cluster (node1, node2, node3 and node4).
> >>
> >> Node1 and node2 are connected to an Ethernet switch (sw1), node3 and
> >> node4 are connected to another switch (sw2). The 4 nodes are on the same
> >> VLAN.
> >>
> >> sw1 and sw2 are connected through a couple of core switches, and the
> >> nodes' VLAN is properly propagated across the network that I just described.
> >>
> >> Latency between node1 and node4 (on 2 different switches) doesn't exceed
> >> 0.3 ms.
> >>
> >> The cluster is normally configured with an iSCSI quorum device located on
> >> another switch.
> >>
> >> I wanted to check how it would behave when the quorum disk is not active
> >> (removed from cluster.conf) and a member node becomes isolated (link
> >> up but not on the right VLAN).
> >>
> >> Node3 is the one I played with.
> >>
> >> The fence_device for this node is intentionally misconfigured so that I
> >> can follow what happens on this node's console.
> >>
> >> When changing the VLAN membership of node3, results are as expected: the
> >> 3 remaining nodes see it go offline after the totem timer expires, and
> >> node1 (lowest node ID) starts trying to fence node3 (without success, as
> >> the fence device is intentionally misconfigured).
> >>
> >> Node3 sees itself as the only member of the cluster, which is inquorate.
> >> This is consistent, as it became a single-node partition.
> >>
> >> When putting node3's VLAN configuration back to the right value, things go bad.
> >
> > (My two cents)
> >
> > You just put it back in the correct VLAN, without restarting the host?
> 
> Yep, this is what I wanted to test.
>
> >
> > I did this kind of test (under RH 5.3), and things always get bad if a node
> > that is supposed to be fenced is not really fenced and comes back. Perhaps
> > this is an intended behaviour to prevent "split brain" cases (even at the
> > cost of the whole cluster going down)? Or perhaps it depends on how your
> > misconfigured fence device behaves (does it give an exit status? What exit
> > status does it send?).

+1

> When node3 comes back with the same membership state as before,
> node1 (and node2 and node4) kill node3 (instruct cman to exit) because
> this previous state is the same as the new one.
> 
> The problem is that, in the log, node1 and node2 at the very same time
> lose quorum (clurgmgrd[10469]: <emerg> #1: Quorum Dissolved) and
> go offline. This is what I cannot explain.
> 
> There is no split brain involved here, as I expected node1 (and
> why not all the other nodes) to instruct node3's cman to exit, after which
> things could continue to run (maybe without relocating node3's services, as
> it couldn't get fenced).
> 
> Concerning the fencing, it may return a non-zero value, as I can see in
> node1's logs that it is looping trying to fence node3.
> >
> >>
> >> Node1, 2 and 4 instruct node3's cman to kill itself, as it reappeared
> >> with an already existing status. Why not.
> >>
> >> Node1 and node2 then say that the quorum is dissolved and see themselves
> >> offline (????), node3 offline and node4 online.
> >>
> >> Node4 sees itself online but cluster inquorate as we also lost node1 and
> >> node2.
> >>
> >> I thought about potential multicast problems, but it behaves the same
> >> way when cman is configured to broadcast.
> >>
> >> The same test run with qdisk enabled behaves normally: when node3
> >> gets back on the network it gets automatically rebooted (thanks to qdisk),
> >> and the cluster remains stable.
> 
> The fact that it works when qdisk is enabled may be a "side effect":
> I use an iSCSI LUN accessed through the LAN interface, and with qdisk
> acting as a "heartbeat vector", node3 not being able to write to the LUN
> may make things more stable.
> 
> I should try with a SAN LUN used as qdisk and see how it behaves.

It would help to see the architecture details, configuration and
logs.
Did you open a ticket with our support to investigate this behaviour with
our experts?
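
For reference, the vote arithmetic behind the quorate/inquorate states
described in the thread can be sketched as follows. This is only a minimal
sketch, assuming each of the 4 nodes carries one vote, no quorum disk votes
are configured, and the quorum threshold is expected_votes / 2 + 1 (integer
division). The function names are hypothetical and not part of any cluster
tool, and the sketch does not explain why node1 and node2 dropped out.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum (assumed rule: expected_votes // 2 + 1)."""
    return expected_votes // 2 + 1


def is_quorate(visible_votes: int, expected_votes: int) -> bool:
    """True when a partition sees enough votes to hold quorum."""
    return visible_votes >= quorum_threshold(expected_votes)


# Assumption: 4 nodes, one vote each, no qdisk votes.
EXPECTED_VOTES = 4

# Partitions described in the thread, with the votes each one can see.
partitions = {
    "node1+node2+node4 (node3 isolated)": 3,
    "node3 alone": 1,
    "node4 alone (node1 and node2 offline)": 1,
}

for name, votes in partitions.items():
    state = "quorate" if is_quorate(votes, EXPECTED_VOTES) else "inquorate"
    print(f"{name}: {votes}/{EXPECTED_VOTES} votes -> {state}")
```

Under these assumptions the three remaining nodes keep quorum while node3 is
isolated, and node4 can only become inquorate once node1 and node2 are gone
as well, which matches what was reported.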

Regards,

J.
-- 
Jérôme Fenal, RHCE                                     Tel.: +33 1 41 91 23 37
Solution Architect                                     Mob.: +33 6 88 06 51 15
Pre-sales Consultant                                   Fax.: +33 1 41 91 23 32
http://www.fr.redhat.com/                                    jfenal at redhat.com
Red Hat France SARL                                 Siret n° 421 199 464 00064
Le Linea, 1 rue du Général Leclerc                92047 Paris La Défense Cedex
Come to the Red Hat Tech Happy Hours:  http://www.redhat.fr/events/happy-hour/