[Linux-cluster] Why did Red Hat replace the quorum partition/lock LUN with the new fencing mechanisms?

jOe smartjoe at gmail.com
Thu Jun 15 16:30:22 UTC 2006


On 6/15/06, Kevin Anderson <kanderso at redhat.com> wrote:
>
> On Thu, 2006-06-15 at 02:49 +0800, jOe wrote:
> > Hello all,
> >
> > Sorry if this is a stupid question.
> >
> > I deploy both HP MC/SG Linux edition and RHCS for our customers. I
> > just wondered why the latest RHCS replaced the quorum partition/lock
> > LUN with the new fencing mechanisms (power switch, iLO/DRAC, SAN
> > switch...)?
>
> First off, I don't think it is completely fair to compare quorum
> partitions to fencing.  They really serve different purposes.  A quorum
> partition gives you the ability to maintain the cluster through flaky
> network spikes.  It will keep you from prematurely removing nodes from
> the cluster.  Fencing is really used to protect the data integrity of
> your shared storage devices.  You really want to make sure that a node
> is gone before recovering its data.  Just because a node isn't updating
> the quorum partition doesn't mean it isn't still scrogging your file
> systems.  However, a combination of the two provides a pretty solid
> cluster in small configurations.  And a quorum disk has another nice
> feature that is useful.
>
> That said, a little history before I get to the punch line.  Two
> clustering technologies were merged for the RHCS 4.x releases, and
> the resulting software used the core cluster infrastructure that was
> part of the GFS product for both RHCS and RHGFS.  GFS didn't have a
> quorum partition as an option, primarily for scalability reasons.  The
> quorum disk works fine for a limited number of nodes, but the core
> cluster infrastructure needed to be able to scale to large numbers.  The
> fencing mechanisms provide the ability to ensure data integrity in that
> type of configuration.  So, the quorum disk wasn't carried into the new
> cluster infrastructure at that time.
>
> The good news is that we realized the deficiency and have added quorum
> disk support; it will be part of the RHCS 4.4 update release, which
> should be hitting the RHN beta sites within a few days.  This doesn't
> replace the need to have a solid fencing infrastructure in place.  When
> a node fails, you still need to ensure that it is gone and won't corrupt
> the filesystem.  The quorum disk will still have scalability issues and
> is really targeted at small clusters, i.e. <16 nodes.  This is primarily
> due to having multiple machines pounding on the same storage device.  It
> also provides an additional feature: the ability to represent a
> configurable number of votes.  If you set the quorum device to have the
> same number of votes as there are nodes in the cluster, you can maintain
> cluster sanity down to a single active compute node.  We can get
> rid of our funky special two-node configuration option.  You will then
> be able to grow a two-node cluster without having to reset it.
>
> Sorry, I rambled a bit...
>
> Thanks
> Kevin


Thank you very much, Kevin; your information is very useful to us, and I've
shared it with our engineering team.
Two questions still remain:
Q1: In a two-node cluster config, how does RHCS (v4) handle a heartbeat
failure? (Suppose the bonded heartbeat path still fails under some bad
circumstances.)
When using a quorum disk/lock LUN, the quorum device acts as a tie breaker
and resolves the split-brain if the heartbeat fails. Currently, does GFS do
this, or does some other part of RHCS?
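
To make sure I understand the tiebreaker part, below is a rough sketch of
how I imagine the new quorum disk would be wired up, based on the qdisk
documentation I've seen; the device name, label, and gateway IP are
placeholders I made up, so please correct me if the RHCS 4.4 syntax
differs:

    # initialize the quorum partition once, from one node
    # (/dev/sdb1 is a placeholder for a small shared LUN)
    mkqdisk -c /dev/sdb1 -l ha_qdisk

    <!-- cluster.conf fragment: each node must keep passing the ping
         heuristic to remain eligible, so when the heartbeat path dies
         the node that can still reach the gateway wins the tie -->
    <quorumd interval="1" tko="10" votes="1" label="ha_qdisk">
        <heuristic program="ping -c1 -w1 192.168.0.254" score="1" interval="2"/>
    </quorumd>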

Q2: Since, as you mentioned, quorum disk support is added in the RHCS v4.4
update release, is a two-node cluster config of "quorum disk + bonded
heartbeat + fencing (power switch or iLO/DRAC)" (no GFS) the recommended
config from Red Hat? Almost 80% of the cluster requests from our customers
are for two-node clusters (10% are RAC and the rest are HPC clusters). We
really want to provide our customers with a simple and solid cluster config
for their production environments. Most customers configure their HA
clusters as active/passive, so GFS is not necessary for them, and they
don't even want GFS present in their two-node cluster systems. A sketch of
the config I have in mind follows below.
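
To be concrete, this is roughly the two-node layout I have in mind (a
hand-written sketch, not a tested config; the node names, iLO addresses,
credentials, and quorumd parameters are all placeholders):

    <?xml version="1.0"?>
    <cluster name="ha2node" config_version="1">
        <!-- 2 nodes x 1 vote + 1 qdisk vote = 3 expected votes; quorum
             is 2, so one surviving node plus the qdisk stays quorate
             and the special two-node mode is no longer needed -->
        <cman expected_votes="3"/>
        <quorumd interval="1" tko="10" votes="1" label="ha_qdisk"/>
        <clusternodes>
            <clusternode name="node1" votes="1">
                <fence>
                    <method name="1">
                        <device name="ilo-node1"/>
                    </method>
                </fence>
            </clusternode>
            <clusternode name="node2" votes="1">
                <fence>
                    <method name="1">
                        <device name="ilo-node2"/>
                    </method>
                </fence>
            </clusternode>
        </clusternodes>
        <fencedevices>
            <fencedevice agent="fence_ilo" name="ilo-node1" hostname="10.0.0.11" login="admin" passwd="secret"/>
            <fencedevice agent="fence_ilo" name="ilo-node2" hostname="10.0.0.12" login="admin" passwd="secret"/>
        </fencedevices>
    </cluster>

Is that the sort of configuration you would recommend?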

I do think more and more customers will choose RHCS as their cluster
solution, and we'll push it once we completely understand RHCS's technical
benefits and advanced mechanisms.

Thanks a lot,

Jun