[Linux-cluster] I give up

Kevin Anderson kanderso at redhat.com
Wed Nov 28 20:32:44 UTC 2007


On Wed, 2007-11-28 at 09:47 -0800, Scott Becker wrote:
> Kevin Anderson wrote: 
> > 
> > Sorry to hear you were unable to get it to work.  Going back through
> > your postings over the last month, it looks like you have been
> > trying to set up a two-node cluster with a quorum disk that provides
> > a failover environment.  Would a simple button in Conga that sets
> > that up have been useful?
> > 
> Actually, 3 nodes down to 1 node using an IP tiebreaker. No shared
> storage, so I can't use qdisk. I was resorting to a ping test to stop
> fencing from the broken node.

Not sure what you mean by 3 to 1 using an IP tiebreaker.  How are you
maintaining quorum without qdisk as a voting entity?
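
(For comparison, the only votes special case cman handles without a
tiebreaker is the two-node one, where cluster.conf carries something
like:

    <cman two_node="1" expected_votes="1"/>

There is no equivalent knob for letting a three-node cluster drop to a
single survivor, which is why I'm asking.)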

> > Also, you mention lingering issues about openais, but I didn't see
> > anything in your posts that describes any problems in that space.
> > Can you be more specific?
> > 
> During a network partition test, expecting a fencing race where I
> control the outcome, one node would not fence the other and did not
> take over the service until the other node attempted to rejoin the
> cluster (way too late).

Is this resolved with the 5.1 release we did a few weeks ago?

> 
> Another poster stated that he could not get the cluster to function
> properly since the switch to openais. Hence I'm speculating that they
> are related.
Doubtful.  There have been issues with Cisco switch configurations not
forwarding multicast properly.  All of those have been resolved with a
switch configuration setting change.
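
(As a rough example of the kind of change involved: on several Cisco
switches the culprit was IGMP snooping dropping the openais multicast
traffic, and the workaround was along these lines:

    ! Cisco IOS, global configuration.  Disables IGMP snooping
    ! switch-wide; a more surgical fix is an IGMP querier on the
    ! cluster VLAN.
    no ip igmp snooping

Exact commands vary by platform and IOS version, so treat this as
illustrative.)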

>
> If you guys are serious, then I have some suggestions:
> 
> Interactive documentation so the community can contribute.
> 
Good point.  Our community interface (i.e. sources.redhat.com) is pretty
weak and needs revamping.  Are you thinking blogs, wiki, etc.?  Also, our
assumption is that people accessing the community pages and mailing lists
are developers, open source contributors, and nonpaying consumers, versus
direct paying customers.  Responses on the mailing lists, while pretty
frequent, can be spotty.  Paying customers usually contact support
directly rather than wait on developer responses.

> Debugging mode option at the top of cluster.conf which will cause
> "Every" component to verbosely contribute to a separate log file which
> can be sent in to be analyzed. A limited community size, because of
> the niche nature of the system, calls for extra debugging effort.
> 
Good suggestion, and one to look into providing.  We did add the ability
to centralize log messages in openais, but haven't incorporated that
support into the commands/utilities yet.
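
As a rough sketch of what the openais side of that looks like (directive
names vary by version, so treat this as illustrative, not authoritative):

    # openais.conf - illustrative logging stanza
    logging {
        to_syslog: yes
        to_file: yes
        logfile: /var/log/openais.log
        debug: on        # verbose output from every service
        timestamp: on
    }

The missing piece is wiring the rest of the cluster commands/utilities
into the same sink.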

We already have an internal bugzilla for this one; I will see if product
management can open it up for external viewing (i.e. I'm not sure why it
is locked at the moment).

> An interactive design document for Cluster Suite where users can make
> suggestions for changes and vote for new features. In the process of
> exploring ways to get my setup working, I kept finding both approaches
> and roadblocks. It's an involved product and process, and I personally
> don't have any way to contribute to a design discussion.
Currently, community development discussion happens on the cluster-devel
mailing list, in #linux-cluster on irc.freenode.net, and through bugzilla
updates.

However, we don't have good records of which things are actively being
worked on, versus areas where people can jump in and won't feel
redundant.  This also points out that we haven't had a cluster summit for
a couple of years, which is another way we coordinated/communicated
effort and direction.  It's probably time to do that again, as well as
revamping our web interface.

> 
> For example, am I the only one who would benefit from ssh support in
> the fence agents? Or would that help many others as well?

You're not the only one, which is probably why we didn't respond.  ssh
support needs to be added individually to each fence agent, since it
isn't just about connecting, but also about using the commands unique to
each fencing device once you get access.  Just doing a quick bugzilla
search shows:

https://bugzilla.redhat.com/buglist.cgi?product=Red+Hat+Cluster+Suite&product=Red+Hat+Enterprise+Linux+5&version=&component=&target_milestone=&bug_status=NEW&bug_status=ASSIGNED&bug_status=NEEDINFO&bug_status=MODIFIED&short_desc_type=allwordssubstr&short_desc=ssh+fence&long_desc_type=allwordssubstr&long_desc=

This is definitely support we are working on.  Again, we should include
it on a roadmap/priorities/vision page so people know about it.
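
To give an idea of why it's per-agent work, here is a minimal sketch of
what an ssh-capable agent has to do against an APC-style power switch
(the hostname, login, and command syntax below are made up for
illustration; every device family has its own dialect once you're
connected):

    #!/bin/sh
    # Illustrative only: power off one outlet on a hypothetical
    # switch over ssh.
    DEVICE=pdu1.cluster.internal   # hypothetical fence device address
    OUTLET=3                       # outlet feeding the node to fence

    # The ssh transport is the generic part...
    # ...the command issued once logged in is device-specific.
    ssh -o ConnectTimeout=10 apc@$DEVICE "olOff $OUTLET" || exit 1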

> 
> Another example: during the process I realized that the redundant
> power and redundant network connections I set up to "Keep Things
> Working" actually had a negative impact on the reliability of fencing.
> This is OK, because lack of ability to fence doesn't stop the cluster
> as long as a fencing failure is detected and fixed in a timely
> manner. So I set up an hourly cron script to test communications to
> the power switches. Would others benefit from that feature? Should it
> be included?
> 
> How about a bigger question: is that already a feature? I think
> Heartbeat has fence device status monitoring, but without an
> organized, easy-to-navigate design document for Cluster Suite, I don't
> know whether that's a feature of it or not.
Both of these are part of the bigger-picture resource monitoring work
that Lon and some of the Linux-HA guys are jointly working on, converging
to a single base.  See this page:
http://people.redhat.com/lhh/cluster-integration.html

Which again, not very visible :-(.
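
On the cron idea: until that monitoring work lands, something along
these lines works as a stopgap (assuming APC switches and a fence_apc
that supports the status action; addresses, credentials, and flags below
are placeholders and vary by agent/version):

    #!/bin/sh
    # Hourly cron job: verify the fence devices still answer.
    for SWITCH in 192.168.1.10 192.168.1.11; do
        if ! fence_apc -a $SWITCH -l apc -p apc -o status >/dev/null 2>&1
        then
            echo "fence device $SWITCH is unreachable" \
                | mail -s "fence device check failed" root
        fi
    done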

So, my takeaways:

1. For your environment and the current product, fence device access is
insecure and needs to be on an internal private network.  SSH-specific
fence device support is on the list of work items, but is dependent on
getting the time/resources to do it.  Even then, fence devices should
still be on an internal private network.

2. We need better community pages - revamp/move the website; add
blogging, a wiki, regular updates, more communication, roadmaps, etc.
The purpose is to allow more community communication and participation.

3. Time for a Cluster Summit again - location preferences, timeframe,
funding, etc.?


Other ideas?

Thanks
Kevin

