[Linux-cluster] I give up

Scott Becker scottb at bxwa.com
Wed Nov 28 17:47:43 UTC 2007


Kevin Anderson wrote:
>
> Sorry to hear you were unable to get it to work.  Going back through
> your postings over the last month, it looks like you have been trying
> to set up a two-node cluster with a quorum disk that provides a
> failover environment.  Would a simple button in Conga that does that
> have been useful?
>
Actually, three nodes down to one node using an IP tiebreaker. There is
no shared storage, so I can't use qdisk. I was resorting to a ping test
to stop fencing from the broken node.
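
The idea, as a rough sketch only (the tiebreaker address and the wrapped
agent are placeholders, not my actual config): a wrapper fence agent
that refuses to fence when the local node has lost the network, so the
broken node can't shoot the healthy one.

    #!/bin/sh
    # Sketch of a ping-gated fence wrapper. If this node cannot reach
    # the tiebreaker IP, assume we are the broken node and fail the
    # fence attempt instead of shooting our peer.
    GATEWAY=192.168.1.1            # placeholder tiebreaker address
    REAL_AGENT=/sbin/fence_apc     # placeholder real agent

    if ! ping -c 3 -W 2 "$GATEWAY" >/dev/null 2>&1; then
        echo "tiebreaker unreachable; refusing to fence" >&2
        exit 1    # non-zero exit tells fenced the attempt failed
    fi

    # fenced passes options as key=value lines on stdin; hand them
    # through to the real agent unchanged.
    exec "$REAL_AGENT"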
> Also, you mention lingering issues about openais, but didn't see 
> anything in your posts that describe any problems in that space.  Can 
> you be more specific?
>
During a network partition test, where I expected a fencing race whose
outcome I control, one node would not fence the other and did not take
over the service until the other node attempted to rejoin the cluster
(way too late).

Another poster stated that he could not get the cluster to function
properly since the switch to OpenAIS, so I'm speculating that the two
are related.



If you guys are serious then I have some suggestions:

Interactive documentation so the community can contribute.

A debugging-mode option at the top of cluster.conf that makes *every*
component log verbosely to a separate log file, which could then be
sent in for analysis. A community limited in size by the niche nature
of the system calls for extra debugging effort. Something like the
sketch below.
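
Hypothetically (no such option exists today, as far as I know; the
element and attribute names here are invented just to illustrate the
idea):

    <?xml version="1.0"?>
    <cluster name="example" config_version="1">
      <!-- Invented element: one switch that turns on verbose logging
           in every component and funnels it into a single file. -->
      <debugmode enable="yes" logfile="/var/log/cluster-debug.log"/>
    </cluster>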

An interactive design document for Cluster Suite where users can make
suggestions for changes and vote on new features. In the process of
exploring ways to get my setup working, I kept finding both possible
approaches and roadblocks. It's an involved product and process, and I
personally have no way to contribute to a design discussion.

For example, am I the only one who would benefit from ssh support in
the fence agents, or would that help many others as well? A minimal
sketch of what such an agent might look like follows.
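
This is a sketch only, not a shipped agent; it assumes the host and
login details come in on stdin the way fenced passes key=value options
to any agent:

    #!/bin/sh
    # Sketch of an ssh-based fence agent. fenced supplies options as
    # key=value lines on stdin; parse the ones we need.
    while IFS='=' read -r key val; do
        case "$key" in
            ipaddr) HOST=$val ;;
            login)  USER=$val ;;
        esac
    done

    # Reboot the victim over ssh. BatchMode plus pre-shared keys keep
    # a password prompt from hanging the agent. A truly hung node will
    # never answer, which is why ssh fencing could only ever be a
    # best-effort first method backed by a real power switch.
    ssh -o ConnectTimeout=5 -o BatchMode=yes "$USER@$HOST" reboot -f \
        && exit 0
    exit 1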

Another example: during the process I realized that the redundant power
and redundant network connections I set up to "keep things working"
actually had a negative impact on the reliability of fencing. That is
OK, because the inability to fence doesn't stop the cluster as long as
a fencing failure is detected and fixed in a timely manner. So I set up
an hourly cron script to test communications to the power switches
(sketched below). Would others benefit from that feature? Should it be
included?
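
Roughly like this (the switch addresses and the alert recipient are
placeholders for whatever the local setup uses):

    #!/bin/sh
    # Hourly check: can we still reach both fence power switches?
    # A stricter test could ask the fence agent itself for status,
    # e.g. fence_apc -o status, but a ping catches the common case
    # of a dead path to the switch.
    SWITCHES="192.168.1.50 192.168.1.51"   # placeholder addresses
    ADMIN=root@localhost                   # placeholder recipient

    for sw in $SWITCHES; do
        if ! ping -c 3 -W 2 "$sw" >/dev/null 2>&1; then
            echo "power switch $sw unreachable from $(hostname)" \
                | mail -s "fencing at risk: $sw" "$ADMIN"
        fi
    done

Dropped into cron with something like:

    0 * * * * /usr/local/sbin/check-fence-switches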

How about a bigger question: is that already a feature? I think
Heartbeat has fence-device status monitoring, but without an organized,
easy-to-navigate design document for Cluster Suite, I don't know
whether Cluster Suite has it or not.

    thanks
    scottb

