a couple questions from a cluster newbie

Fri Mar 20 16:12:56 UTC 2009

Hi list,

our machine park is going to gain three new boxes, pushing total storage to 70TB.
I think it's time to get rid of nfs /net automounts, and to go for some kind 
of a cluster.
long story short:
each typical server has local storage (1 to 8TB, up to 15 soon): sata discs 
connected to a 3ware card, using hardware raid 10 or 5.
each of these machines is aimed at processing data from a given satellite.
there is also one pgsql server, one apache server and one nis/home (via nfs) 
server, each with a 3ware card and its discs. btw, the nis/nfs server is soon 
to be turned into a directory server.
gigabit network, unmanaged switches, a single /24 network.

now, I'd like to transform that mess into:
1) have one GFS volume for sat1...N data. So that, if needed, you can process 
whatever you want from whatever machine.
2) have a failover machine that could automagically take over the load for pg, 
apache and nfs/nis (the soon to be directory server) if the dedicated box 
fails. that means efficient replication, so that data are identical on the 
original pg/apache/etc machines and on the failover one.
3) have some kind of load balancing on sat1...N, that would put processes on a 
box where the data to process are local, without the user having to decide 
where to launch processes. resulting data from processes would have to be written 
on the local storage of the box. So that sat1 data and sat1 processed data 
stay on the same physical volume. That way, if a box really badly crashes, we 
know which data were lost (we can't afford to backup 70TB).
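
to make 3) a bit more concrete, here is a rough python sketch of the placement 
rule. it is only an illustration: the box names, the /data/<sat> layout, the 
load numbers and the max_load threshold are all made up for the example, and 
falling back to another box when the home box is busy is an assumption (it 
relies on the shared volume from 1).

```python
# Hypothetical mapping: which box physically holds which satellite's data.
DATA_HOME = {
    "sat1": "s1",
    "sat2": "s2",
    "sat3": "s3",
}


def pick_host(satellite, load, max_load=4.0):
    """Return the box on which a job for `satellite` should run.

    `load` maps box name -> current load average (made-up numbers here).
    The box that stores the data is preferred; if it is missing from
    `load` or at/above `max_load`, fall back to the least loaded box
    (assumption: the data are then read through the shared volume).
    """
    if not load:
        raise RuntimeError("no box available")
    home = DATA_HOME[satellite]
    if load.get(home, float("inf")) < max_load:
        return home
    return min(load, key=load.get)


def output_dir(satellite):
    """Processed data go next to the raw data (hypothetical layout)."""
    return "/data/%s/processed" % satellite
```

e.g. pick_host("sat1", {"s1": 0.5, "s2": 0.1}) keeps the job on s1 even though 
s2 is less loaded, because s1 holds the sat1 data.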

now, the questions (thx for reading this far :) :

1) from what i've read in the docs, i should use gnbd. am i on the right 
track ? it's unclear to me whether it is safe to use a machine both for 
serving and processing data.
2) failover should be possible, if i understood the docs correctly. where i'm 
a bit stuck is the replication part. wal shipping should do the trick for pg. 
directory server has some kind of failover mechanism afaik. about apache, i'm 
a bit in the dark. could someone enlighten me ?
3) is such a thing possible with cluster suite ? at all ? Would there be any 
better way to solve the problem of the boxes configuration so our DC can 
continue to grow without becoming a nightmare for me and users ?
4) right now, users' homes follow them to whatever box they log on to. should 
/home be another gfs volume, so that every server (potentially hidden by load 
balancing, if i understood correctly) can continue to access these data 
(processing code is often in /home) ? any other solution ?

You'll find attached some ascii art trying to describe what i'd like 
to get :) (open it with a fixed-width font)
Thanks a lot for helping.
Best Regards,
-- 
Laurent
-------------- next part --------------
        _____
|S1|----|G  |----|U1|--------|
        |F  |                |
|S2|----|S  |----|U2|--------|
        |   |                |
|S3|----|V  |----|U3|--------|
        |O  |                |
|S4|----|L  |----|U4|--------|
        |U  |                |
.       |M  |    .           |
.       |E  |    .           |
.       |   |    .           |
|Sn|----|   |----|Un|--------| |--home GFS volume accessible by every box ?
        |   |                | |
        |   |---------------|Ds|--|
        |   |                     |
        |   |----|Pg|--|----------|
        |   |          |
        |   |----|Ap|--|
        |   |          |
        |   |----|Fo|--|
        |___|

|Sx|: boxes with dedicated storage for satellite images processing.
|Ux|: user boxes.
|Ds|: Directory server (serves /home to user machines)
|Pg|: PostgreSQL server
|Ap|: Apache server
|Fo|: Failover server (can take Pg, Ds, Ap load)