[Linux-cluster] Fencing Driver API Requirements

Tue Apr 22 16:22:28 UTC 2008

On Tue, 2008-04-22 at 18:45 +0300, Harri.Paivaniemi at tietoenator.com
wrote:
> I agree also,
> 
> but my problem is much more basic: to my mind this whole cluster is so badly documented that it's
> really hard to believe we have talked for years about how Linux can be a business-critical platform...
> 
> For a normal human being like myself, it has taken incredible reverse-engineering just to find all the pieces
> of information, one piece here and one there and nothing from RH, just to understand how the cluster works.
> 
> Versions move on, things change, and the information goes stale just as I come to understand it.
> 
> Just an example: when I first used qdisk I learned that I had to tune deadnode_timeout. When I moved to ver5,
> /proc/cluster was gone... so I had to figure it out... aha, it's the totem token now... RH support didn't know
> this. This kind of frustrating thing happens to me all the time.
> 
> Information is split across man pages, the wiki, FAQs, poor RH manuals, and various text files from the
> depths of the internet. I have had to use all my poor genetic power to try to create theories about this
> cluster as an administrator.
> 
> -hjp
> 
> 

Harri,

Your complaints are valid and we are aware of them within the various
projects that make up the community cluster stack.  We are working
towards improving the documentation we produce as open source projects
and how we feed that documentation into commercial distribution vendor
products like RHEL5.

On a positive note, the various open source communities don't plan to
make any significant user-interface-specific changes to any of the
cluster stack any time soon, or indeed for a very long time.  We have
learned through experience that this is very painful for our open source
users, distribution vendors, various third party support, etc. (the folks
that add value to the software the various open source communities
produce).  We have made changes to our infrastructure from previous
versions of the cluster stack to the latest versions for several reasons:
1) reliability, 2) removing all unnecessary bits from the kernel, and 3)
downstream adoption by third parties.  I know that as a user these things
may not be critical to "getting the thing to just work", but over time
there is significant value in having _more_ people working on, supporting,
and distributing the code base rather than fewer.

I'd ask that folks be patient with the communities.  We are coordinating
and working together for the first time since clustering started on
Linux, we have widespread distribution and good adoption, and in general
our development pace is accelerating, our user view is maturing, and our
third party support from various distributions is improving.  All of
these things lead to downstream distributions with a better product,
containing better documentation and support, than was ever available in
the past.

regards
-steve

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com on behalf of Marek 'marx' Grac
> Sent: Tue 4/22/2008 18:09
> To: linux clustering
> Subject: Re: [Linux-cluster] Fencing Driver API Requirements
>  
> Hi,
> 
> Jonathan Buzzard wrote:
> > On Mon, 2008-04-14 at 20:47 +0200, Marek 'marx' Grac wrote:
> >   
> > The issue is that, with such a critical component of a cluster (if the
> > fencing is not right, bad things will happen), in order to write a
> > new fencing agent one has to start reverse engineering from source to
> > work out what one needs to do.
> >   
> The new agents built on the Python module are available only in the developer
> branch and are not part of any distribution yet. Documentation will
> follow soon. Supported fencing agents have their own man pages, which
> describe how they work, as they can take both getopt and
> stdin arguments. These options do not have to have anything in common, as
> they are taken from cluster.conf. Unfortunately, some of the existing
> fencing agents use different options, so there are no standard options
> [there is an attempt to have them in the new fencing agents].
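
To make the "both getopt and stdin arguments" point concrete, here is a
minimal sketch of how an agent might accept the same options either way.
It is not taken from any shipped agent. The name=value line format on
stdin, the '#' comment handling, the flag letters, and the option names
(ipaddr, login, passwd, action) are all assumptions chosen for
illustration; as noted above, existing agents do not share a standard
option set.

```python
import getopt

# Hypothetical mapping from short command-line flags to stdin option names.
# Each real agent defines its own; these names are illustrative only.
FLAG_TO_NAME = {"-a": "ipaddr", "-l": "login", "-p": "passwd", "-o": "action"}


def parse_stdin_options(lines):
    """Parse name=value lines, skipping blanks and '#' comments (assumed format)."""
    options = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected name=value, got: %r" % line)
        options[name.strip()] = value.strip()
    return options


def parse_argv_options(argv):
    """Parse getopt-style flags into the same dictionary shape."""
    pairs, _rest = getopt.getopt(argv, "a:l:p:o:")
    return {FLAG_TO_NAME[flag]: value for flag, value in pairs}


def load_options(argv, stdin_lines):
    """Use command-line flags when any are given, otherwise read stdin lines."""
    if argv:
        return parse_argv_options(argv)
    return parse_stdin_options(stdin_lines)
```

A script built on this would call load_options(sys.argv[1:], sys.stdin)
once at startup and then act on the resulting dictionary, so the fencing
logic itself does not care which way the options arrived.
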
> 
> > This is incredibly bad practice, and is bound to lead to improperly
> > implemented fencing agents that then lead to bad things happening on
> > clusters with these fencing agents.
> >
> >   
> I agree.
> 
> > There are loads of potential fencing devices out there that could be
> > supported, that are currently not. From my perspective trying to
> > implement a fencing agent for Alert On Lan 2, it was easier to reverse
> > engineer the magic packets of death using tcpdump and IDA Pro as well as
> > implementing a C-based Linux command-line tool to generate them, than it has
> > been to write a functioning fencing agent.
> >
> > It would take a couple of hours tops for someone to write a spec for
> > what a fencing agent needs to do.
> >   