[Linux-cluster] strange cluster behavior

Fri Mar 5 18:58:00 UTC 2010

Hello Jerome,

Hope you're well.

I didn't open a ticket, as things seem to work correctly with qdisk enabled.

There is a chance the guys from support will just tell me to re-enable qdisk ... ;-)

I was hoping Lon or Chrissie could point me to some ideas.

Brem

2010/3/3 Jerome Fenal <jfenal at redhat.com>:
> On Wednesday, 3 March 2010 at 14:23 +0100, brem belguebli wrote:
>> Hi Xavier,
>
> Hi Brem, Xavier,
>
>> 2010/3/3 Xavier Montagutelli <xavier.montagutelli at unilim.fr>:
>> > On Wednesday 03 March 2010 03:11:50 brem belguebli wrote:
>> >> Hi,
>> >>
>> >> I experienced a strange cluster behavior that I couldn't explain.
>> >>
>> >> I have a 4-node RHEL 5.4 cluster (node1, node2, node3 and node4).
>> >>
>> >> Node1 and node2 are connected to an Ethernet switch (sw1), node3 and
>> >> node4 are connected to another switch (sw2). The 4 nodes are on the same
>> >> VLAN.
>> >>
>> >> sw1 and sw2 are connected through a couple of core switches, and the
>> >> nodes' VLAN is properly propagated across the network I just described.
>> >>
>> >> Latency between node1 and node4 (on 2 different switches) doesn't exceed
>> >> 0.3 ms.
>> >>
>> >> The cluster is normally configured with an iSCSI quorum device located on
>> >> another switch.
>> >>
>> >> I wanted to check how it would behave, with the quorum disk not active
>> >> (removed from cluster.conf), if a member node became isolated (link
>> >> up but not on the right VLAN).
>> >>
>> >> Node3 is the one I played with.
>> >>
>> >> The fence_device for this node is intentionally misconfigured so that I
>> >> can follow what happens on this node's console.
>> >>
>> >> When I change the VLAN membership of node3, the results are as expected:
>> >> the 3 remaining nodes see it go offline after the totem timer expires,
>> >> and node1 (lowest node ID) starts trying to fence node3 (without success,
>> >> as the fence device is intentionally misconfigured).
>> >>
>> >> Node3 sees itself as the only member of the cluster, which is inquorate.
>> >> That is consistent, as it became a single-node partition.
>> >>
>> >> When I put node3's VLAN configuration back to the right value, things go bad.
>> >
>> > (My two cents)
>> >
>> > You just put it back in the right VLAN, without restarting the host?
>>
>> Yep, this is what I wanted to test.
>>
>> >
>> > I did this kind of test (under RHEL 5.3), and things always go bad if a
>> > node that is supposed to be fenced is not really fenced and comes back.
>> > Perhaps this is intended behaviour to prevent "split brain" cases (even at
>> > the cost of the whole cluster going down)? Or perhaps it depends on how
>> > your misconfigured fence device behaves (does it give an exit status? What
>> > exit status does it send?).
>
> +1
>
>> When node3 comes back with the same membership state as before,
>> node1 (and nodes 2 and 4) kill node3 (instruct cman to exit) because
>> this previous state is the same as the new one.
>>
>> The problem is that, in the log, node1 and node2 lose quorum at the
>> very same time (clurgmgrd[10469]: <emerg> #1: Quorum Dissolved) and
>> go offline. This is what I cannot explain.
>>
>> There is no split brain involved here, as I expected node1 (and
>> why not all the other nodes) to instruct node3's cman to exit, after
>> which things could continue to run (maybe without relocating node3's
>> services, as it couldn't be fenced).
>>
>> Concerning the fencing, it probably returns a non-zero value, as I can
>> see in node1's logs that it is looping, trying to fence node3.
>> >
>> >>
>> >> Node1, 2 and 4 instruct node3's cman to kill itself, as it reappeared
>> >> with an already existing state. Why not.
>> >>
>> >> Node1 and node2 then say that the quorum is dissolved and see themselves
>> >> offline (????), node3 offline and node4 online.
>> >>
>> >> Node4 sees itself online but cluster inquorate as we also lost node1 and
>> >> node2.
>> >>
>> >> I thought about potential multicast problems, but it behaves the same
>> >> way when cman is configured to broadcast.
>> >>
>> >> The same test run with qdisk enabled behaves normally: when node3
>> >> gets back on the network it is automatically rebooted (thanks to qdisk)
>> >> and the cluster remains stable.
>>
>> The fact that it works when qdisk is enabled may be a "side
>> effect": I use an iSCSI LUN accessed through the LAN interface, and
>> with qdisk acting as a "heartbeat vector", node3 not being able to
>> write to the LUN may make things more stable.
>>
>> I should give it a try with a SAN LUN used as qdisk and see how it behaves.
>
> It would help to see the architecture details, configuration and
> logs.
> Did you open a ticket with our support to investigate this behaviour
> with our experts?
>
> Regards,
>
> J.
> --
> Jérôme Fenal, RHCE                                     Tel.: +33 1 41 91 23 37
> Solution Architect                                     Mob.: +33 6 88 06 51 15
> Consultant Avant-ventes                                Fax.: +33 1 41 91 23 32
> http://www.fr.redhat.com/                                    jfenal at redhat.com
> Red Hat France SARL                                 Siret n° 421 199 464 00064
> Le Linea, 1 rue du Général Leclerc                92047 Paris La Défense Cedex
> Come to the Red Hat Tech Happy Hours:  http://www.redhat.fr/events/happy-hour/
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
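
The quorum states reported in this thread (three nodes quorate while node3
is isolated, node3 inquorate on its own, node4 inquorate once node1 and node2
drop out) match simple-majority vote counting. A minimal sketch, assuming one
vote per node, no qdisk votes, and the usual majority rule of
expected_votes / 2 + 1 with integer division; the function and variable names
are illustrative and are not cman code:

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(visible_votes: int, expected_votes: int) -> bool:
    """True if the votes visible in a partition reach the quorum threshold."""
    return visible_votes >= quorum_threshold(expected_votes)


# Assumption: 4 nodes with one vote each, qdisk removed from cluster.conf.
EXPECTED_VOTES = 4

# Partitions described in the thread, as lists of nodes that see each other.
partitions = {
    "node1+node2+node4 (node3 isolated)": ["node1", "node2", "node4"],
    "node3 alone": ["node3"],
    "node4 alone (node1 and node2 offline)": ["node4"],
}

for label, members in partitions.items():
    votes = len(members)
    state = "quorate" if is_quorate(votes, EXPECTED_VOTES) else "inquorate"
    print(f"{label}: {votes}/{EXPECTED_VOTES} votes, {state}")
```

This only reproduces the expected vote arithmetic. It does not explain why
node1 and node2 lost quorum when node3 rejoined, which is the open question
in the thread.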