a couple questions from a cluster newbie

Thu Mar 26 10:49:29 UTC 2009

On Monday 23 March 2009 07:47, Colin van Niekerk wrote:
> Hi there,
Hi Colin,
>
> Apologies if anyone has answered this already and I have missed it. This
> post has been out for a while now.
You're the first, kudos :)
>
> I would configure three VMs on the failover box and add the ability to
> have each server fail over separately. This would involve having three load
> balanced clusters as in the attached (again, view with a fixed-width font).
Thanks for your ASCII art. Which hypervisor would you advise? Xen, as it is
officially supported on RHEL, or KVM? Something else, maybe?
>
> To replicate data between the virtual server and the physical server within
> each cluster I would use DRBD (RAID1 on a network level), you can configure
> this so that only once the data is committed to disk on both sides does the
> kernel confirm the write. This will present the system with a new block
> device and data must only be read and written via this device. As long as
> your system is 'strong' enough and the link between the servers is fast
> enough (this would depend on the amount of change to the data, i.e. how much
> data would need to be written to the block device on the other end of the
> network) it will be just like reading and writing to any other block
> device.
Our network is gigabit (1 Gbps), and the machines will be in the same rack,
one hop away. So I guess synchronous replication will do the trick.
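
As a rough sanity check of the "is the link fast enough" question, here is a
minimal shell sketch comparing a sustained write rate against the raw
capacity of the link. The function name link_headroom and the 40 MB/s write
rate are illustrative assumptions, not measurements from these machines, and
the calculation ignores protocol overhead and the extra latency that
synchronous writes add.

```shell
#!/bin/sh
# Rough check: does a sustained write rate fit within the replication link?
# link_headroom is a hypothetical helper; all numbers are assumptions.
# usage: link_headroom <write_rate_MB_per_s> <link_speed_Mbit_per_s>
link_headroom() {
    write_mb=$1
    link_mbit=$2
    # Convert Mbit/s to MB/s (8 bits per byte), ignoring protocol overhead.
    link_mb=$((link_mbit / 8))
    if [ "$write_mb" -le "$link_mb" ]; then
        echo "ok: ${write_mb} MB/s fits in ${link_mb} MB/s"
    else
        echo "saturated: ${write_mb} MB/s exceeds ${link_mb} MB/s"
    fi
}

# Example: an assumed 40 MB/s of changed data over a 1000 Mbit/s link.
link_headroom 40 1000
```

The real write rate would have to be measured on the servers themselves
before drawing any conclusion.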
>
> For the backend you could use Conga with luci and ricci to manage the
> cluster (thinking about ways to avoid pain going forward) but I have not
> done this in a production environment so I'm not sure about the details.
OK, I'll set up a couple of VMs soon to check the details.
>
> I'm afraid I have worked very little with GFS as well, so I can't answer you on
> that side of things. Maybe the GNBD would be better for the load balanced
> server replication as well, but as far as I know the main reason you would
> use GNBD is that it exports the file system to many users and manages
> locking better between the users which wouldn't help in the pg/ds/ap
> clusters. Can anyone confirm?
>
> Just so I'm clear on the backend side. It sounds like there is a level of
> interaction between users and the actual data on the backend servers. Do
> the users query a process on the storage/processing servers and then that
> process works on the data and gives the user a result? Or do the users
> interact with the data directly?
Users interact directly with the data. The classic (and simplified) scheme is:
(shell script pseudo code)
for i in $files_to_be_processed; do
    processing_program "$i" "$output_dir/$output_result"
done
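
A slightly fuller sketch of the same loop, runnable as plain sh. Everything
named here is a placeholder: processing_program is a stand-in that only
copies its input, process_all is a hypothetical wrapper, and the ".out"
suffix is an assumed naming convention for the results.

```shell
#!/bin/sh
# Stand-in for the real processing tool (assumption): copies input to output.
processing_program() {
    cp "$1" "$2"
}

# Hypothetical wrapper: run processing_program on every regular file in
# input_dir and write each result to output_dir/<name>.out.
process_all() {
    input_dir=$1
    output_dir=$2
    mkdir -p "$output_dir"
    for f in "$input_dir"/*; do
        # Skip directories and the unexpanded glob when input_dir is empty.
        [ -f "$f" ] || continue
        processing_program "$f" "$output_dir/$(basename "$f").out" \
            || echo "failed: $f" >&2
    done
}

# Usage (paths are examples only):
#   process_all /data/incoming /data/results
```

Because the users' programs read and write the files directly, whatever
storage sits underneath has to look like an ordinary file system to this
kind of loop.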

Thanks for helping,
Regards,
-- 
Laurent Wandrebeck
IT Manager / Directeur des systemes d'informations
HYGEOS, Earth Observation Department / Observation de la Terre
Euratechnologies
165 Avenue de Bretagne
59000 Lille, France
tel: +33 3 20 08 24 98
http://www.hygeos.com