[Linux-cluster] Using ping as a heuristic not a great idea?

Dustin Henry Offutt dhoffutt at gmail.com
Thu May 20 13:42:26 UTC 2010


Same experience.

In addition, best practice is to put the quorum partition on its own
physical disk. Admittedly I did not follow that; I instead took a slice
of a SAN RAID set so as not to waste a whole precious disk, plus a second
one for mirroring, on what is essentially a 20MB partition. The result:
fibre traffic would upset qdiskd's vote.

Again, what I was doing was not best practice; if a qdisk partition
is needed, it's definitely worth the two physical disks.

In conclusion, no, I wouldn't use ping heuristics.
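The failure mode described in the original message (pings dropped during a large transfer, the heuristic fails, the node is killed) can be illustrated with a toy model. This is a minimal sketch, assuming for illustration only that a heuristic is declared failed after `tko` consecutive missed checks; the function `heuristic_failed_at` and the sample loss pattern are hypothetical and are not taken from qdiskd's source.

```python
from typing import Optional, Sequence


def heuristic_failed_at(results: Sequence[bool], tko: int) -> Optional[int]:
    """Return the index of the check that completes a run of `tko`
    consecutive misses (the point where this toy model declares the
    heuristic failed), or None if no such run occurs.

    Hypothetical model for illustration; not qdiskd's actual logic.
    """
    misses = 0
    for i, ok in enumerate(results):
        # A successful check resets the miss counter; a miss extends the run.
        misses = 0 if ok else misses + 1
        if misses >= tko:
            return i
    return None


# Hypothetical ping outcomes: a burst of drops during a large file transfer.
checks = [True, True, False, False, False, True, False, True]

print(heuristic_failed_at(checks, tko=1))  # 2
print(heuristic_failed_at(checks, tko=3))  # 4
print(heuristic_failed_at(checks, tko=4))  # None
```

In this model, raising `tko` only buys tolerance for loss bursts shorter than `tko` checks; a transfer that keeps the link saturated for longer still trips the heuristic, which is consistent with the tuning experience reported in this thread.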

Richard Rogerson wrote:
> I've experienced the same thing. I have a two-node cluster with
> DRBD+GFS2, and during very high network activity I've had a node get
> killed, which caused GFS2 to lock up. I'd be very interested to see what
> solution you come up with.
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Don Hoover
> Sent: Thursday, May 20, 2010 8:48 AM
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] Using ping as a heuristic not a great idea?
>
> I know it's in a ton of the example cluster configs out there, but we
> have had trouble whenever we try to use ping as a quorum heuristic.
>
> What we have seen is that whenever a large file transfer happens, Linux
> starts dropping the pings, the heuristic test starts to fail, and the
> cluster node gets killed.
>
> We have tried tweaking the interval, the count, and the TTL, adding -w,
> etc., and nothing really keeps the cluster from killing nodes when they
> see really heavy network traffic.
>
>
> I have personally had this heuristic work in small test environments,
> but once we put it into production on really busy workloads it is pretty
> much useless.
>
>
> It is a good idea in theory, because in a split-cluster situation it
> would help ensure that the box which had network connectivity wins over
> the one that did not. But... if it causes your cluster to die
> periodically, it's not worth it.
>
>
> Is this a known issue that just never gets mentioned in any of the
> cluster setup examples?
>
> Has anyone had a similar experience, or does anyone have ideas on how to
> make it work in a very busy cluster environment?
>
>
> Also, this makes me wonder: if I have a two-node cluster, with each node
> getting 1 vote, the quorum disk getting 1 vote, and the heuristic getting
> 1 vote, but set 'required' to only 2 votes, why would the heuristic
> cause a loss of quorum, since the node with the quorum disk alone would
> have the needed two votes?
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>   
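The vote arithmetic in the last question can be made explicit. Under a naive count, the surviving node has its own vote plus the quorum-disk vote, 1 + 1 = 2, which meets the required 2. The sketch below assumes a different, simplified model (an assumption, not something confirmed in this thread): the heuristic carries no vote of its own, and a failing heuristic makes the node give up the quorum-disk vote, leaving it with 1 < 2. The function `is_quorate` and its parameters are hypothetical.

```python
def is_quorate(node_votes: int, qdisk_votes: int, heuristic_ok: bool,
               required: int) -> bool:
    """Hypothetical, simplified model: the quorum-disk vote only counts
    while the node's heuristics pass; the heuristic adds no vote itself."""
    total = node_votes + (qdisk_votes if heuristic_ok else 0)
    return total >= required


# Scenario from the question: 1 vote per node, 1 for the quorum disk,
# 'required' set to 2, and only one node left after a split.
print(is_quorate(node_votes=1, qdisk_votes=1, heuristic_ok=True, required=2))   # True
print(is_quorate(node_votes=1, qdisk_votes=1, heuristic_ok=False, required=2))  # False
```

If the real behavior matches this model, counting the heuristic as a separate vote is what makes the arithmetic in the question look safe when it is not.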


