[Linux-cluster] Interfacing csnap to cluster stack

Sat Oct 9 00:44:44 UTC 2004

On Friday 08 October 2004 18:30, Daniel McNeil wrote:
> The csnap server is the best one to know what is best.

It's not running, how can it know?  You probably meant, the csnap agent 
(running in user space on every cluster node).  Then I'd say, how can 
it know by looking only at itself?  (Similar to looking in the mirror 
at yourself and thinking "I'm da man!"  In the real world, it's 
normally other people who decide you're the man, and give you 
responsibilities.)

> See my previous posting on using 2 dlm locks to allow
> different priorities.  A directly connected node will
> be selected if there is one, otherwise one of the "other"
> nodes will be selected.  The csnap server can just do
> the right thing.

Sorry for not responding to that.  Speed of connection is only one 
metric.  What about the speed and memory capacity of the node itself?  
What about the current workload of the node?  And what about comparing 
these things to other nodes?
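
To make the comparison point concrete, here is a minimal sketch of a 
selector that ranks every candidate node instead of letting each node 
judge itself.  Everything in it is an assumption for illustration: the 
NodeInfo fields, the pick_server_node name, and the ordering policy 
(direct attachment first, then load, then free memory) are hypothetical 
and not part of csnap or the dlm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical per-node metrics gathered from the whole cluster."""
    name: str
    direct_attached: bool  # node has a direct path to the snapshot store
    mem_free_mb: int       # free memory, in megabytes
    load: float            # current workload, e.g. a load average


def pick_server_node(nodes):
    """Return the name of the preferred node among all candidates.

    Illustrative policy only: prefer directly attached nodes, then the
    lowest load, then the most free memory.
    """
    if not nodes:
        raise ValueError("no candidate nodes")
    best = min(
        nodes,
        key=lambda n: (not n.direct_attached, n.load, -n.mem_free_mb),
    )
    return best.name
```

The decision needs the metrics of all nodes side by side, which is 
information no single node has by looking only at itself.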

> > > (3) Don't use the cluster-lock model.  It has its shortcomings. 
> > > Its strengths are in its simplicity; not its flexibility.
>
> Actually, the DLM can be used in simple ways or very complex
> ways.  It is very flexible.  It does have a different programming
> model that takes time to get used to.

As soon as you start using it in a complex way, you probably should have 
spent your time building the thing you're trying to approximate with 
the dlm.  You'll end up with what you really want, with the same amount 
of effort, or less because you won't spend a lot of time trying to make 
it be something it isn't.

If you find a need for global locking in your design, go ahead and use 
the dlm for it, but don't try to turn the dlm into a resource manager.  
It simply isn't one.

Why is it that the hammer/nail effect gets so strong in the vicinity of 
a dlm?

> > Yes, that's the one.  We need real resource management, even if it
> > initially just consists of an administrator setting up config
> > files. Something has to read those config files[1] and respond to
> > server instance requests from csnap agents accordingly.
> >
> > [1] At cluster bring-up time.  The resource manager has to be able
> > to operate without reading files during failover.
>
> IMHO, a resource manager is NOT the right way to do this:
>
> - cluster services should avoid config files if at all possible.
>    If they are not set up right, the whole cluster can get
>    messed up.  If the cluster changes, the config files might
>    need to change.  The config files will be your single point
>    of failure.  From previous experience, cluster configuration
>    is one of the biggest sources of cluster failure, and you
>    won't know it until a failure -- the worst possible time.

Then the configuration file should let you configure only what needs to 
be configured.  Anyway, what do config files have to do with needing 
or not needing a resource manager?

> - It makes a low-level function dependent on a higher-level
>    function.  As you say above, the resource manager has to
>    operate very carefully to avoid deadlocking.  This is
>    asking for trouble.

Then what we want is a good, low-level resource manager.  Let people 
interface to it and script themselves into oblivion, err, nirvana, but 
make the low level thing very simple and easy to audit.
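
As a rough sketch of what "very simple and easy to audit" could look 
like: the configuration is parsed once at cluster bring-up, and server 
instance requests are then answered from memory, so nothing is read 
from disk during failover.  The ResourceManager class, its method 
names, and the "resource: node node ..." config format are all 
hypothetical, invented here for illustration.

```python
class ResourceManager:
    """Hypothetical low-level resource manager (illustration only)."""

    def __init__(self, config_text):
        # Parse the configuration once, at bring-up.  Assumed format:
        # one "resource: node1 node2 ..." entry per line, in order of
        # preference, with "#" starting a comment.
        self._candidates = {}
        for line in config_text.splitlines():
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            resource, _, rest = line.partition(":")
            self._candidates[resource.strip()] = rest.split()
        self._live = set()

    def node_up(self, node):
        """Record that a node has joined the cluster."""
        self._live.add(node)

    def node_down(self, node):
        """Record that a node has left the cluster or failed."""
        self._live.discard(node)

    def request_server(self, resource):
        """Answer a server instance request from in-memory state only.

        Returns the first configured candidate that is currently live,
        or None if no candidate is available.
        """
        for node in self._candidates.get(resource, []):
            if node in self._live:
                return node
        return None
```

Scripting and policy can be layered on top of node_up, node_down and 
request_server, while the core stays small enough to read in one 
sitting.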

Regards,

Daniel