[Linux-cluster] Cluster Setup Questions

Wed May 17 12:40:51 UTC 2006

iana wrote:

>---------------8< snip, snip ---------------------------------
>
>What has to be decided:
>=======================
>Currently, no fencing is decided and two options are considered:
> 1. APC AP7921 Rack PDUs (still don't quite imagine how it'll work
> with servers with dual power supplies, but this can be solved)
> 2. Integrated IBM Slimline RSA II controller (Don't know if it's
> supported by RHCS).
>
We have a fence agent for RSA II, but it has only been tested with the
standard RSA II controller, never with the slimline one.

The AP7921 is a supported power switch and a good choice. In a dual power
supply environment, just configure the general device in the UI under
fence devices, then configure two fences for each node (on the same
fence level). The UI app will note that there are two power fences on
the level and then write the correct 'off-on' sequence into the conf
file, so that both outlets are switched off before either is switched
back on. If you have any questions about the conf file and dual power
supplies, just post your file here and someone will be able to help.
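
As a rough illustration of that 'off-on' ordering (a sketch only, not the
UI's actual code): the Python below builds one fence method for a node fed
by two PDUs, switching every outlet off before switching any back on. The
device names apc-a/apc-b, the outlet number, and the helper
dual_power_method are hypothetical, and the port/option attribute names are
assumptions about the conf file format; compare against what the UI writes
for your version.

```python
import xml.etree.ElementTree as ET


def dual_power_method(feeds, method_name="1"):
    """Build one fence method that turns every feed off before any back on.

    feeds: list of (fence_device_name, outlet_port) pairs, one per power
    supply of the node. Attribute names below are assumed, not verified.
    """
    method = ET.Element("method", name=method_name)
    # Outer loop over the action, inner loop over the feeds: this yields
    # off, off, on, on rather than off, on, off, on, so the node really
    # loses power on both supplies at the same time.
    for action in ("off", "on"):
        for device, port in feeds:
            ET.SubElement(method, "device",
                          name=device, port=str(port), option=action)
    return method


# Hypothetical node plugged into outlet 3 on each of two PDUs.
method = dual_power_method([("apc-a", 3), ("apc-b", 3)])
print(ET.tostring(method, encoding="unicode"))
```

If the entries were interleaved per device instead (off, on, off, on), one
supply would always be live and the node would never actually power down.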

-J

>
>I found on the mailing list that there are problems with power
>fencing when LAN connectivity is lost (no heartbeat => servers
>power-cycle each other). I haven't found a way to overcome it, though ;(
>
>The manual says that RHCS can work with a single storage only. Is there
>a way to overcome that without storage virtualization? I really don't
>want to build software RAID arrays over a MAN.
>
>If not - I can establish SAN mirroring then, and mirror all the stuff
>from "primary" storage to backup storage. Is there a way to make the
>cluster automatically check the health of the "active" storage and
>automatically remount the FS from backup storage if the active storage
>goes down? The only solution I found is to do this from a customized
>service status script.
>
>If so - can I run the cluster without a common FS resource? My simple
>tests show that it's possible, but I'd like confirmation.
>
>The questions:
>==============
>Besides the questions asked above, the two main questions are:
> * How should I design the cluster so it works like I want?
> * How should I implement it?
> 
>Some smaller questions:
>* If the status script returns 0, the status is OK. What happens if not?
>Does the cluster software first try to restart the service, or does it
>fail it over to the next node? If it tries to restart, by what means
>('service XXX restart', kill -9, or something else)?
>
>* How does the cluster software check the status of infrastructure
>resources (IPs, FSes, heartbeat)? Can I change the course of actions
>there by writing my own scripts?
>
>
>
>Thanks in advance to those brave enough to answer =)