GFS inside Xen

Rick Stevens rstevens at vitalstream.com
Fri Feb 25 18:13:28 UTC 2005


Colin Charles wrote:
> On Mon, 2005-02-14 at 13:09 +1100, Bojan Smojver wrote:
> 
>>This may sound like a very strange thing to do, but it may be
>>useful in situations where you have only one physical machine and
>>want to play with GFS-based clusters. So the real question is, did
>>anyone try this (with the current development branch) and if yes,
>>was there some kind of "magic" setup involved or did it just work?
> 
> 
> fedora-test-list at redhat.com might be a better avenue for these questions
> 
> Keep in mind that GFS might end up moving into Extras at some stage
> soon...

I've been playing with GFS on a test cluster for a while now.  I'm using
the bleeding-edge version of GFS (pulled via rsync) and it's running on
FC3 with the -766 kernel on single-processor 1.2GHz P4s with 1GB RAM.
Currently I'm using manual fencing.  Works fine.
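One caveat for anyone trying the same thing: with manual fencing,
nothing actually power-cycles a failed node for you.  The cluster
blocks until an operator verifies the node really is down and then
acknowledges the fence by hand, roughly like this (the node name is a
placeholder, not one of my machines):

```shell
# After confirming node2 is genuinely powered off or disconnected
# from the shared storage, acknowledge the pending fence so GFS
# journal recovery can proceed on the surviving nodes:
fence_ack_manual -n node2
```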

If you use the stuff I've been using, there's no real "magic" other than
having to rebuild the kernel with the "Scan all SCSI LUNs" option
enabled.  You should also keep the kernel source available, as there
have been times when patches were needed...generally in the exported
kernel symbols arena.
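For reference, the "Scan all SCSI LUNs" option is CONFIG_SCSI_MULTI_LUN
in the kernel config, so you can check whether your running kernel
already has it before bothering with a rebuild (path shown is the
usual Fedora location):

```shell
# "Scan all SCSI LUNs" corresponds to CONFIG_SCSI_MULTI_LUN.
# If this prints "=y", no rebuild is needed for LUN scanning:
grep CONFIG_SCSI_MULTI_LUN /boot/config-$(uname -r)
```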

The GFS setup I have is a two-node cluster.  I use cman/DLM to manage
the logical volumes and gulm (on a separate lock server) to handle GFS
locking (although cman/DLM can now handle that).  This has been done on
both a SCSI-based SAN (using Adaptec 2940UWs) and a Fibre Channel-based
SAN (using QLogic QLA2300s).
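If anyone wants to reproduce the filesystem side of this, creation
looks roughly like the following.  The cluster, volume-group, and
filesystem names here are placeholders, and -j should match the number
of nodes that will mount the filesystem:

```shell
# Carve a logical volume out of the shared storage (clvmd running
# on all nodes so the metadata stays coherent):
lvcreate -L 200G -n gfs01 vg_san

# Make the GFS filesystem.  -p selects the lock manager (lock_gulm
# here; use lock_dlm for cman/DLM locking), -t is clustername:fsname,
# and -j creates one journal per mounting node:
gfs_mkfs -p lock_gulm -t testcluster:gfs01 -j 2 /dev/vg_san/gfs01

# Mount it on each node:
mount -t gfs /dev/vg_san/gfs01 /mnt/gfs
```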

The only major difference I've seen that I can't explain is that even
with the servers absolutely quiescent, the load on the servers sits at
1.00 with the QLogic stuff.  It goes to 0.00 with the SCSI stuff.  I
haven't looked at it too hard, but there seems to be some extra process
or overhead with the qla2xxx drivers.  It may have to do with the
multipath capability of the driver, but I don't know and it's a minor
point to our tests.
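One way to chase that down, for what it's worth: Linux counts tasks in
uninterruptible sleep (state "D") toward the load average even when
they burn no CPU, so an idle box pinned at exactly 1.00 usually has
exactly one such task...quite possibly a driver kernel thread.
Something like:

```shell
# List tasks in uninterruptible sleep (state "D"); a kernel thread
# parked here would explain a permanent 1.00 load average on an
# otherwise idle machine:
ps -eo state,pid,comm | awk '$1 == "D"'
```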

Nonetheless, with ServerRoot and DocumentRoot on the GFS filesystem,
we were able to make Apache 1.3.x produce wire-speed traffic levels
(100Mbps) when throwing hundreds of simulated web browsers at the
cluster using siege.  The loads on the servers were in the 0.6 range
(well, 1.6 with the qla2xxx drivers).  In other words, the machines
didn't even break a sweat and GFS performed flawlessly.  To test write
ops, we also launched a pretty big rsync job on one node that copied
about 200GB of data to the GFS filesystem while the test ran.  Load
on the machine doing the rsync obviously went up (gee...all the way
to 2.03, or 1.03 if you disregard the qla2xxx overhead), but it
continued to deliver 100Mbps over Apache (rsync ran on a separate NIC).
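If you want to run a similar test, siege drives it nicely.  A sketch
of the invocation...the numbers and URL file here are placeholders,
not our exact test parameters:

```shell
# -c = concurrent simulated users, -t = test duration,
# -f = file listing the URLs to hammer (one per line):
siege -c 300 -t 10M -f urls.txt
```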

A two-node cluster with manual fencing is NOT a good idea, but it does
work.  The loss of a single server will cause you to lose quorum and
halt filesystem ops on the GFS volume.  I'd suggest a minimum of three
nodes, and if you use gulm as I am, two lock servers.  If you use
cman/DLM for GFS locking, you can do away with the lock servers.
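The arithmetic behind that advice, assuming cman's usual one vote per
node and simple-majority quorum (quorum = floor(votes/2) + 1):

```shell
# Majority quorum: floor(total_votes / 2) + 1, one vote per node.
for n in 2 3 4 5; do
  echo "$n nodes: quorum = $(( n / 2 + 1 ))"
done
```

With two nodes, quorum is 2, so losing either node stalls the
filesystem.  With three nodes, quorum is still 2, so one node can die
and the cluster keeps running.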
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-           Blech!  ACKth!  Ooop!  -- Bill the Cat (Outland)         -
----------------------------------------------------------------------
