[Linux-cluster] Why did Red Hat replace the quorum partition/lock LUN with new fencing mechanisms?

Kevin Anderson kanderso at redhat.com
Thu Jun 15 02:47:06 UTC 2006


On Thu, 2006-06-15 at 02:49 +0800, jOe wrote:
> Hello all,
> 
> Sorry if this is a stupid question.
> 
> I deploy both HP MC/SG Linux edition and RHCS for our customers. I
> just wondered why the latest RHCS replaced the quorum partition/lock
> LUN with the new fencing mechanisms (power switch, iLO/DRAC, SAN
> switch...)?

First off, I don't think it is completely fair to compare quorum
partitions to fencing.  They really serve different purposes.  A quorum
partition gives you the ability to maintain the cluster through flaky
network blips; it keeps you from prematurely removing nodes from the
cluster.  Fencing is really there to protect the integrity of your
shared storage devices.  You want to make sure that a node is really
gone before recovering its data.  Just because a node isn't updating
the quorum partition doesn't mean it isn't still scrogging your file
systems.  However, a combination of the two provides a pretty solid
cluster in small configurations.  And a quorum disk has another useful
feature, which I'll get to below.
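
To make the fencing side concrete, here's a rough sketch of what that
looks like in cluster.conf.  The node name, device name, iLO hostname and
credentials below are made up, and the exact attribute names can vary
between releases, so treat it as illustrative rather than copy-and-paste:

  <clusternode name="node1" votes="1">
    <fence>
      <method name="1">
        <!-- when node1 drops out, fenced runs the fence_ilo agent
             against this device before any recovery of its data -->
        <device name="node1-ilo"/>
      </method>
    </fence>
  </clusternode>

  <fencedevices>
    <fencedevice name="node1-ilo" agent="fence_ilo"
                 hostname="node1-ilo.example.com"
                 login="admin" passwd="secret"/>
  </fencedevices>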

That said, a little history before I get to the punch line.  Two
clustering technologies were merged for the RHCS 4.x releases, and the
resulting software used the core cluster infrastructure from the GFS
product for both RHCS and RHGFS.  GFS didn't offer a quorum partition,
primarily for scalability reasons.  The quorum disk works fine for a
limited number of nodes, but the core cluster infrastructure needed to
scale to much larger node counts, and the fencing mechanisms provide
the data-integrity guarantees in that type of configuration.  So the
quorum disk wasn't carried into the new cluster infrastructure at that
time.

The good news is that we realized the deficiency and have added quorum
disk support; it will be part of the RHCS 4.4 update release, which
should be hitting the RHN beta sites within a few days.  This doesn't
replace the need to have a solid fencing infrastructure in place: when
a node fails, you still need to ensure that it is gone and won't
corrupt the filesystem.  The quorum disk will still have scalability
issues and is really targeted at small clusters, i.e. fewer than 16
nodes, primarily because it has multiple machines pounding on the same
storage device.  It also provides an additional feature: the ability to
contribute a configurable number of votes.  If you give the quorum
device the same number of votes as there are nodes in the cluster, you
can maintain quorum down to a single active compute node.  That lets us
get rid of our funky special two-node configuration option, so you will
be able to grow a two-node cluster without having to reset it.
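
To illustrate the votes idea, here's a rough sketch of the relevant
cluster.conf pieces for a three node cluster where the quorum device
carries three votes.  The label, heuristic and addresses are made up,
and the attribute names may differ slightly in the final 4.4 release,
so again treat it as a sketch:

  <!-- 3 nodes x 1 vote + 3 qdisk votes = 6 expected votes, and quorum
       is a simple majority (4).  A single surviving node (1 vote) plus
       the qdisk (3 votes) still makes quorum. -->
  <cman expected_votes="6"/>

  <quorumd interval="1" tko="10" votes="3" label="myqdisk">
    <!-- only a node that can still reach the default gateway gets to
         claim the qdisk votes -->
    <heuristic program="ping -c1 -w1 192.168.1.254" score="1" interval="2"/>
  </quorumd>

Set up that way, the special two_node flag in the <cman> section isn't
needed any more.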

Sorry I rambled a bit..

Thanks
Kevin