[Linux-cluster] Clustering Tutorial

Thu Oct 20 18:58:24 UTC 2005

A note of caution: there is a big difference between High Availability Clustering (HAC) and High
Performance Clustering (HPC).  AFAIK, Beowulf is an HPC technology.  RHCS (Red Hat Cluster Suite) and
GFS (Global File System) are HAC technologies.  Some of the underlying building blocks are used by
both communities, but they are used for fundamentally different purposes.

http://www.linux-ha.org is the home of another Linux-based HAC technology.  They have more
documentation on clustering and its concepts.  Red Hat does a good job on the HOW-TOs of getting a
cluster working but a terrible job of telling folks the WHY-TOs of clustering.

I'm currently working on a comparison of linux-ha and RHCS, so if you have questions regarding HAC
on Linux, fire away.  If you have a Beowulf cluster, je ne comprends pas, sorry.

--tims

--- Michael Will <mwill at penguincomputing.com> wrote:

> http://www.phy.duke.edu/resources/computing/brahma/Resources/beowulf_book.php
> is a good start.
> http://www.beowulf.org is another good place; it is also the home of the
> original Beowulf mailing list.
> 
> Generally I would recommend digging through recent mailing list postings,
> because there are often very informed answers to questions.
> 
> Lon just answered a fencing question a few days ago:
> 
> "STONITH, STOMITH, etc. are indeed implementations of I/O fencing.
> 
> Fencing is the act of forcefully preventing a node from being able to
> access resources after that node has been evicted from the cluster in an
> attempt to avoid corruption.
> 
> The canonical example of when it is needed is the live-hang scenario, as
> you described:
> 
> 1. node A hangs with I/Os pending to a shared file system
> 2. node B and node C decide that node A is dead and recover resources
> allocated on node A (including the shared file system)
> 3. node A resumes normal operation
> 4. node A completes I/Os to shared file system
> 
> At this point, the shared file system is probably corrupt.  If you're
> lucky, fsck will fix it -- if you're not, you'll need to restore from
> backup.  I/O fencing (STONITH, or whatever we want to call it) prevents
> the last step (step 4) from happening.
> 
> How fencing is done (power cycling via external switch, SCSI
> reservations, FC zoning, integrated methods like IPMI, iLO, manual
> intervention, etc.) is unimportant - so long as whatever method is used
> can guarantee that step 4 can not complete."
> 
> "GFS can use fabric-level fencing - that is, you can tell the iSCSI
> server to cut a node off, or ask the fiber-channel switch to disable a
> port.  This is in addition to "power-cycle" fencing."
> 
> 
> Michael
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
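
For anyone who wants to poke at the live-hang scenario Lon describes in the
quoted text, here is a rough sketch in Python.  It is a toy model only, not
how RHCS, GFS or linux-ha implement fencing; SharedStorage, live_hang and
the "fenced" set are names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Toy shared file system (hypothetical): tracks owner and fenced nodes."""
    owner: str
    fenced: set = field(default_factory=set)
    log: list = field(default_factory=list)
    corrupt: bool = False

    def write(self, node: str, data: str) -> bool:
        if node in self.fenced:
            # A fenced node's I/O is rejected, whatever the node believes.
            return False
        if node != self.owner:
            # A stale write from an evicted node lands on recovered storage.
            self.corrupt = True
        self.log.append((node, data))
        return True


def live_hang(use_fencing: bool) -> SharedStorage:
    """Replay steps 1-4 of the scenario, with or without fencing node A."""
    storage = SharedStorage(owner="A")
    pending = ("A", "old I/O")       # step 1: A hangs with I/O pending
    if use_fencing:
        storage.fenced.add("A")      # fence A before recovering its resources
    storage.owner = "B"              # step 2: B and C recover A's resources
    storage.write("B", "journal replay")
    storage.write(*pending)          # steps 3-4: A resumes and flushes its I/O
    return storage
```

Calling live_hang(False) leaves the storage flagged corrupt, because A's
stale write goes through after recovery.  Calling live_hang(True) does not,
because the write in step 4 is refused.  The mechanism behind "fenced" is
deliberately left abstract here, in line with the point that the method does
not matter as long as step 4 cannot complete.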
