[Linux-cluster] large-scale ( size: +10TB, users: +500 ) file server (Samba & NFS) using RHCS (CLVM + GFS) + CTDB + CoRAID (AoE) as backend storage?

Abraham Alawi a.alawi at auckland.ac.nz
Mon Dec 21 05:07:54 UTC 2009


Has anyone successfully set up a production large-scale ( size: +10TB, users: +500, concurrent/active users: +50 ) file server (Samba & NFS) using RHCS (CLVM + GFS) + CTDB + CoRAID (AoE) as backend storage? I'd be grateful if someone could share their experience with that sort of setup. The setup I've done works, but I'm not confident enough to put it into production: it doesn't consistently cope well under high load. Sometimes the misbehaviour is RHCS-related (fencing, rgmanager, GFS, CLVM), other times it's CTDB-related. I'm using the latest versions of RHCS, CTDB & AoE.

This is the basic layout:
Nodes: 3 (identical IBM blades)
Fencing: IBM blade fence
FS: GFS (GFS2 seems to be less reliable even without CTDB)
Service Network: eth0
RHCS (multicasting) & CoRAID/AoE Network: eth1 (isolated from the service network)

RHCS handles the availability of CTDB through rgmanager: three services, each pinned to run exclusively on one node, bring up the stack in order:
ctdb{1-3}: clvm --> gfs --> ctdb
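
For reference, one of the three rgmanager services looks roughly like the excerpt below in cluster.conf. This is only a sketch: the node, domain, device and mountpoint names are placeholders, and the clvmd/ctdb init scripts are assumed to live in /etc/init.d. The restricted failover domain pins the service to one node, and resource nesting expresses the start ordering.

    <rm>
      <failoverdomains>
        <failoverdomain name="dom-node1" restricted="1">
          <failoverdomainnode name="node1" priority="1"/>
        </failoverdomain>
      </failoverdomains>
      <service name="ctdb1" domain="dom-node1" autostart="1" recovery="restart">
        <!-- nesting expresses the clvm -> gfs -> ctdb ordering -->
        <script name="clvm" file="/etc/init.d/clvmd">
          <clusterfs name="gfs" mountpoint="/data" device="/dev/vg0/gfslv" fstype="gfs">
            <script name="ctdb" file="/etc/init.d/ctdb"/>
          </clusterfs>
        </script>
      </service>
    </rm>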

CTDB handles the IP failover + Samba + NFS
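
In my setup that boils down to something like the following (again a sketch; the paths and the public address are placeholders). The important bits are that the recovery lock lives on the shared GFS mount and that CTDB, not init, manages smbd and the NFS services:

    # /etc/sysconfig/ctdb (excerpt)
    CTDB_RECOVERY_LOCK=/data/ctdb/.reclock    # must be on the shared GFS
    CTDB_NODES=/etc/ctdb/nodes                # internal IPs of the 3 nodes
    CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
    CTDB_MANAGES_SAMBA=yes
    CTDB_MANAGES_NFS=yes

    # /etc/ctdb/public_addresses (excerpt)
    10.x.x.100/24 eth0

    # smb.conf (excerpt)
    [global]
        clustering = yes
        idmap backend = tdb2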

I'm also interested to hear whether anyone has run CTDB in production with other cluster file systems such as GPFS, OCFS or Lustre.


Cheers,

  -- Abraham

''''''''''''''''''''''''''''''''''''''''''''''''''''''
Abraham Alawi

Unix/Linux Systems Administrator
Science IT
University of Auckland
e: a.alawi at auckland.ac.nz
p: +64-9-373 7599, ext#: 87572

''''''''''''''''''''''''''''''''''''''''''''''''''''''




