<div>On 6/15/06, Kevin Anderson &lt;<a href="mailto:kanderso@redhat.com">kanderso@redhat.com</a>&gt; wrote:<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> On Thu, 2006-06-15 at 02:49 +0800, jOe wrote:
&gt; Hello all,
&gt;
&gt; Sorry if this is a stupid question.
&gt;
&gt; I deploy both HP MC/SG linux edition and RHCS for our customers. I
&gt; just wondered why the latest RHCS remove quorum partition/lock lun
&gt; with the new fencing mechanisms(powerswitch,iLO/DRAC, SAN
&gt; switch....)?

First off, I don't think it is completely fair to compare quorum partitions to fencing. They really serve different purposes. A quorum partition gives you the ability to maintain the cluster through flaky network spikes. It will keep you from prematurely removing nodes from the cluster. Fencing is really used to provide data integrity of your shared storage devices. You really want to make sure that a node is gone before recovering its data. Just because a node isn't updating the quorum partition doesn't mean it isn't still scrogging your file systems. However, a combination of the two provides a pretty solid cluster in small configurations. And a quorum disk has another nice feature that is useful.

That said, a little history before I get to the punch line. Two clustering technologies were merged together for the RHCS 4.x releases, and the resulting software used the core cluster infrastructure that was part of the GFS product for both RHCS and RHGFS. GFS didn't have a quorum partition as an option, primarily for scalability reasons. The quorum disk works fine for a limited number of nodes, but the core cluster infrastructure needed to be able to scale to large numbers. The fencing mechanisms provide the ability to ensure data integrity in that type of configuration. So, the quorum disk wasn't carried into the new cluster infrastructure at that time.
The good news is that we realized the deficiency and have added quorum disk support. It will be part of the RHCS 4.4 update release, which should be hitting the RHN beta sites within a few days. This doesn't replace the need to have a solid fencing infrastructure in place. When a node fails, you still need to ensure that it is gone and won't corrupt the filesystem. Quorum disk will still have scalability issues and is really targeted at small clusters, i.e. &lt;16 nodes. This is primarily due to having multiple machines pounding on the same storage device.

It also provides an additional feature: the ability to represent a configurable number of votes. If you set the quorum device to have the same number of votes as nodes in the cluster, you can maintain cluster sanity down to a single active compute node in the cluster. We can get rid of our funky special two node configuration option. You will then be able to grow a two node cluster without having to reset.

Sorry I rambled a bit..

Thanks
Kevin

--
Linux-cluster mailing list
<a href="mailto:Linux-cluster@redhat.com">Linux-cluster@redhat.com</a>
<a href="https://www.redhat.com/mailman/listinfo/linux-cluster">https://www.redhat.com/mailman/listinfo/linux-cluster </a> </blockquote></div>

Thank you very much, Kevin. Your information is very useful to us, and I've shared it with our engineering team. Two questions remain:

Q1: In a two-node cluster configuration, how does RHCS (v4) handle a heartbeat failure (suppose even the bonded heartbeat path fails in some bad situation)? With a quorum disk/lock LUN, the quorum device acts as a tie-breaker and resolves the split-brain when the heartbeat fails. Does GFS currently do this, or some other part of RHCS?

Q2: You mentioned that quorum disk support is being added in the RHCS 4.4 update release. For a two-node cluster, is "quorum disk + bonded heartbeat + fencing (power switch or iLO/DRAC), without GFS" the recommended configuration from Red Hat?
Almost 80% of the cluster requests from our customers are for two-node clusters (10% are RAC and the rest are HPC clusters). We really want to provide our customers with a simple and solid cluster configuration for their production environments. Most customers configure their HA clusters as active/passive, so GFS is not necessary for them, and they don't even want GFS present on their two-node cluster systems. I do think more and more customers will choose RHCS as their cluster solution, and we'll push this once we fully understand RHCS's technical benefits and advanced mechanisms.

Thanks a lot,
Jun
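
To make the vote arithmetic from Kevin's reply concrete, here is a minimal Python sketch. It assumes a simple-majority rule (quorum needs more than half of all configured votes) and one vote per node. Neither assumption is stated in the thread, and the function names are hypothetical, not part of RHCS.

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes needed for quorum, assuming a simple-majority rule."""
    return total_votes // 2 + 1


def is_quorate(live_nodes: int, total_nodes: int, qdisk_votes: int = 0) -> bool:
    """Check quorum for a cluster whose quorum disk is reachable.

    Assumption: every node carries exactly one vote, and the quorum
    disk contributes `qdisk_votes` votes to the surviving partition.
    """
    total_votes = total_nodes + qdisk_votes
    present_votes = live_nodes + qdisk_votes
    return present_votes >= quorum_threshold(total_votes)


# Two-node cluster without a quorum disk: one survivor holds 1 of 2
# votes and needs 2, so it loses quorum.
print(is_quorate(live_nodes=1, total_nodes=2))                 # False

# Same cluster with a quorum disk worth as many votes as there are
# nodes: the survivor holds 1 + 2 = 3 of 4 votes and needs 3.
print(is_quorate(live_nodes=1, total_nodes=2, qdisk_votes=2))  # True

# The same holds after growing to four nodes: 1 + 4 = 5 of 8, needs 5.
print(is_quorate(live_nodes=1, total_nodes=4, qdisk_votes=4))  # True
```

Under these assumptions, a lone survivor in a plain two-node cluster cannot reach a majority, and giving the quorum disk as many votes as there are nodes lets a single node stay quorate at any cluster size.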