[Linux-cluster] Questions about GFS

Greg Perry gregp at liveammo.com
Wed Apr 12 03:13:31 UTC 2006


Hello,

I have been researching GFS for a few days, and I have some questions 
that hopefully some seasoned users of GFS may be able to answer.

I am working on the design of a Linux cluster that needs to be scalable. 
It will be primarily an RDBMS-driven data warehouse used for data mining 
and content indexing.  In an ideal world, we would be able to start with 
a small (say 4-node) cluster, then add machines (and storage) as the 
various databases grow in size, and use virtual IPs for load balancing 
across multiple lighttpd instances.  All machines in the cluster need to 
be able to talk to the same volume of information, and GFS (in theory at 
least) would be used to aggregate the drives from each machine into one 
huge shared logical volume.

With that being said, here are some questions:

1) Is there a preferred RDBMS?  Will MySQL 5.x work, and are there any 
locking issues to consider?  What would the best open source RDBMS be 
(MySQL vs. PostgreSQL, etc.)?
2) In a 10-machine cluster where each machine has a 300GB SATA drive, 
can you use GFS to aggregate all 10 drives into one big logical 3000GB 
volume?  Would that scenario work similarly to a RAID array?  If one or 
two nodes fail, but the GFS quorum is maintained, can those nodes be 
replaced and repopulated just like a RAID-5 array?  If this scenario is 
possible, how difficult is it to "grow" the shared logical volume by 
adding nodes (say I had two more machines, each with a 300GB SATA 
drive)?
3) How stable is GFS currently, and is it used in many production 
environments?
4) How stable is the FC5 version, and does it include all of the 
configuration utilities in the RH Enterprise Cluster version?  (The idea 
would be to prove the concept on FC5, then migrate to RH Enterprise.)
5) Would CentOS be preferred over FC5 for the initial proof of concept 
and early adoption?
6) Are there any restrictions on mixing drives, or performance 
advantages to using drives that all have the same geometry?  Or can you 
mix and match different size drives and just add to the aggregate 
volume size?

Thanks in advance,

Greg



