[Linux-cluster] Cluster stability with missing qdisk

Fri Feb 17 10:24:37 UTC 2012

Hi,

> Please stay on-list or call Red Hat Support.

Whoops, my bad, it's back on-list again. (Replying without checking the To: field didn't help.)

> On 02/16/2012 04:50 AM, Jan Huijsmans wrote:
>>>> In our clusters we use a qdisk to determine which node has quorum in case of a split-brain situation.
>>
>>>> This is working great... until the qdisk itself is hit due to problems with the SAN. Is there a way to have a stable cluster,
>>>> with qdisks, where the absence of (1) qdisk won't kill the cluster altogether? At this moment, with the setup with 1 qdisk,
>>>> the cluster is totally dependent on the availability of the qdisk, while, IMHO, it should be expendable.
>>
>>> What kind of problems are you trying to avoid?
>>
>>> 1) I/O errors ->  disk died:
>>
>>> solution: set max_error_cycles to something nonzero (1? 2?), and qdiskd
>>> will then exit on the host where the problems are occurring when I/O
>>> errors are received
>>
>> We now have the interval for the qdisk set to 3 and tko to 50. So the status is
>> updated every 3 seconds and it's allowed to fail 50 times.
>>
>> Will max_error_cycles cause qdisk attempts to count as failed when the disk doesn't
>> respond in time? If so, what is its relation to the interval and tko?
>>
>> Is this an option that can be used with the clustering suite in the RHEL 5.6 software stack?
>>
>>> 2) Long I/O hangs (e.g. path fail-over)
>>
>>> solution: current 3.1.x / 3.2.x differentiates between I/O hangs and I/O
>>> errors, so hangs (e.g. due to path fail-over) no longer cause reboots.
>>
>> We have seen I/O hangs of over 350 seconds at the worst times. (It's now < 10 seconds.)
>> We see discarded frames on the SAN, so it's explainable. However, the system has
>> 4 paths, 2 on one fabric and 2 on the other. The default failure detection time is
>> 60 seconds in the Red Hat default set-up. (which wasn't changed)
>
> You can hang forever with the new upstream feature as long as the nodes
> can communicate.

This is useful. Is this available in the RHN channels for the RHEL 5.6 release, or
is an upgrade needed?
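
To put numbers on the settings quoted above: with interval 3 and tko 50, a node
may miss qdisk updates for 150 seconds before it is declared dead. The sketch
below only redoes that arithmetic with the figures from this thread; the function
name is made up for illustration, and it assumes the window is simply
interval times tko.

```python
def eviction_window_seconds(interval_s: int, tko: int) -> int:
    """Seconds of missed qdisk updates tolerated before a node is declared dead.

    Assumes the window is interval * tko, as described in this thread.
    """
    return interval_s * tko


# Figures quoted in this thread.
INTERVAL_S = 3             # qdisk status update interval
TKO = 50                   # allowed missed cycles
PATH_FAILURE_DETECT_S = 60  # default path failure detection time
WORST_HANG_S = 350         # worst I/O hang observed on the SAN

window = eviction_window_seconds(INTERVAL_S, TKO)
print(f"eviction window: {window} s")
print(f"path failure detection fits in window: {PATH_FAILURE_DETECT_S < window}")
print(f"worst observed hang fits in window: {WORST_HANG_S < window}")
```

So a single 60 second path failure detection fits inside the window, but the
350 second hangs do not, which matches the evictions seen at the worst times.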

>> Our setup has 3 locations, datacenters A and B and quorum location C.
>> The last location is used by the SAN (IBM SVC/V7000 units) to determine
>> which datacenter (A or B) has access to C, when there is no communication
>> possible between both datacenters.
>
> For starters, set master_wins to '1' and don't use heuristics.

I'll see when I can test this. There was 1 cluster I had to add heuristics to ensure
logging from the evicted node before it was reset. (It's very irritating when a node
is evicted without a logged cause)

>> I would like to migrate the qdisk to this location, so we have the same setup
>> as with the SAN. The main problem is the failure of the quorum location C.

> Sure.

>> When we move the qdisk there and it fails, the cluster will fail on the qdisk,
>> when it should be able to function properly, as both nodes are up and are
>> able to communicate with each other.
>
> Setting max_error_cycles to 1 will cause I/O errors to remove the quorum
> disk on the host.

> The new upstream feature will prevent a hang from causing evictions.

> There is no method to 'ignore' eviction notices.

I don't want to ignore them; I just don't want to receive them when the nodes can
reach each other and both could do the job they need to.
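
For reference, the suggestions so far (master_wins set to 1, no heuristics,
max_error_cycles set to 1) combined with the current interval and tko could be
written out as a quorumd element roughly like this. It is a sketch only: the
label "example_qdisk" and the helper name are hypothetical, and whether the
RHEL 5.6 packages accept all of these attributes is still the open question
above.

```python
import xml.etree.ElementTree as ET


def build_quorumd(interval_s: int, tko: int, label: str) -> ET.Element:
    """Build a <quorumd> element with the settings suggested in this thread.

    No <heuristic> children are added, since master_wins is meant to be
    used without heuristics.
    """
    attrs = {
        "interval": str(interval_s),
        "tko": str(tko),
        "label": label,           # hypothetical label, not from the thread
        "master_wins": "1",       # suggested: let the qdisk master win
        "max_error_cycles": "1",  # suggested: qdiskd exits on I/O errors
    }
    return ET.Element("quorumd", attrs)


elem = build_quorumd(3, 50, "example_qdisk")
print(ET.tostring(elem, encoding="unicode"))
```

The printed element would go inside the cluster element of cluster.conf; the
rest of the file is left out here.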

>> On the SAN setup this is solved with 3 'qdisks', with one on each location. (A, B
>> and C) When C fails, there are still 2 qdisks available, so the cluster keeps
>> functioning.

> Qdiskd doesn't work deterministically in replicated environments.

I was thinking, is it possible to use an MD device with 3 mirror copies as qdisk
device? This would give the same functionality with only 1 qdisk device.

>> The problem that I'm trying to solve is the complete failure of the qdisk taking
>> down a perfectly correct operating cluster. We have to guard against a split brain
>> situation, but at the moment the costs of the qdisks are too high. (all clusters
>> are now limited to 1 node to prevent failures due to the qdisk problems)

> You might not need a quorum disk at all.

> A quorum disk doesn't obviate the need for fencing to complete in
> environments where you have a stretched cluster.  E.g. even when you have
> sites A and C, when B dies, it will need to be fenced.  This will fail,
> because the site is not available.

That is what's bothering me about the design of the current cluster set-up.
However, when both nodes could reach the qdisk and not each other
via LAN, they evicted each other. (which was executed as soon as
the LAN was back up...)

> Why don't you take a look at these and file a ticket with Red Hat Support:

>   https://access.redhat.com/kb/docs/DOC-53348
>   https://access.redhat.com/kb/docs/DOC-58412

I'll take a look at them.

-- Jan