Daniel McNeil daniel at osdl.org
Wed Sep 29 23:55:41 UTC 2004

On Mon, 2004-09-20 at 05:25, Patrick Caulfield wrote:
> At the cluster summit most people seemed to agree that we needed a generic,
> pluggable kernel API for cluster functions. Well, I've finally got round to
> doing something.
> The attached spec allows for plug-in cluster modules with the possibility of
> a node being a member of multiple clusters if the cluster managers allow it.
> I've separated out the functions of "cluster manager" so they can be provided by
> different components if necessary.
> Two things that are not complete (or even started) in here are a communications
> API and a locking API. 
> For the first, I'd like to leave that to those more qualified than me to do and
> for the second I'd like to (less modestly) propose our existing DLM API with the
> argument that it is a full-featured API that others can implement parts of if
> necessary.
> Comments please.


I read over your api and have a few comments.

Simple stuff first.  The membership_node looks very similar to the SAF
interfaces, so I assume the fields mean the same.  mn_member is 32 bits,
but it just specifies whether this node is a member (1) or not (0), right?

The mni_viewnumber is 32 bits, in SAF it is 64bits.  Might want it to
be 64bits.  (I think nodeid should be 64bits, but SAF has it as 32bits,
so I guess it is ok).

What is mni_context?

A bit more description of these fields would be nice -- it doesn't have
to be as verbose as SAF :)

In membership_ops, you have start_notify and notify_stop -- might want
to be consistent with the naming (either notify_start or stop_notify).

Now the more complicated stuff:

I think we need more information on how this api works and a description
of how the calls are used.

cm_attach() is used to attach to a particular cluster provider that
has been registered.  Who calls cm_attach()?

I assume whoever calls cm_attach() will then be calling the ops.

What is cmprivate in start_notify?

Once start_notify is called the CM module will call the callback
function whenever there is a change until notify_stop is called?

The membership_callback_routine only has "context" and "reason".
Again, what is context?  What is reason?
How is the data returned?  I'm guessing a struct membership_notify_info
is filled in using the buffer passed in to start_notify.  Is that
right?  A bit more description here would be good.
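
To make the guess concrete, here is a rough sketch of the flow I am
picturing.  Only the names membership_node, membership_notify_info,
membership_callback_routine, mn_member, mni_viewnumber and mni_context
come from the spec.  The toy_* functions, the REASON_* values, the
mn_nodeid, mni_nodes, mni_maxitems and mni_numitems fields, and every
signature are made up for illustration, so treat it as a question, not
as how the API actually works.

```c
#include <stddef.h>
#include <stdint.h>

/* Everything below is a guess for illustration; none of it is from the spec. */
enum { REASON_NODE_JOINED = 1, REASON_NODE_LEFT = 2 };  /* hypothetical values */

struct membership_node {
    uint32_t mn_nodeid;                 /* hypothetical field name */
    uint32_t mn_member;                 /* 1 = member, 0 = not a member */
};

struct membership_notify_info {
    uint32_t mni_viewnumber;
    void *mni_context;                  /* guess: caller cookie, echoed back */
    struct membership_node *mni_nodes;  /* hypothetical: caller-owned array */
    uint32_t mni_maxitems;              /* hypothetical: size of that array */
    uint32_t mni_numitems;              /* hypothetical: entries filled in */
};

typedef void (*membership_callback_routine)(void *context, int reason);

/* Toy stand-in for the state a CM module keeps per attachment. */
struct toy_cm {
    membership_callback_routine cb;     /* NULL while not notifying */
    struct membership_notify_info *buf; /* buffer handed in at start */
    uint32_t view;                      /* current view number */
};

/* Guessed shape of start_notify: remember the buffer and the callback. */
void toy_start_notify(struct toy_cm *cm, struct membership_notify_info *buf,
                      membership_callback_routine cb)
{
    cm->buf = buf;
    cm->cb = cb;
}

/* Guessed shape of notify_stop: no callbacks after this returns. */
void toy_notify_stop(struct toy_cm *cm)
{
    cm->cb = NULL;
    cm->buf = NULL;
}

/* What the CM might do internally on each membership change. */
void toy_membership_change(struct toy_cm *cm,
                           const struct membership_node *nodes,
                           uint32_t n, int reason)
{
    uint32_t i;

    cm->view++;
    if (cm->cb == NULL)
        return;                         /* nobody asked to be notified */
    if (n > cm->buf->mni_maxitems)
        n = cm->buf->mni_maxitems;      /* never overrun the caller's array */
    for (i = 0; i < n; i++)
        cm->buf->mni_nodes[i] = nodes[i];
    cm->buf->mni_numitems = n;
    cm->buf->mni_viewnumber = cm->view;
    cm->cb(cm->buf->mni_context, reason);
}

/* Example consumer: counts callbacks through the context pointer. */
struct demo_state {
    int calls;
    int last_reason;
};

void demo_callback(void *context, int reason)
{
    struct demo_state *st = context;

    st->calls++;
    st->last_reason = reason;
}
```

If this is roughly right, then "context" is just the caller's cookie and
the callback has to look in its own buffer for the data, which is worth
stating in the spec.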

What is the difference between get_quorate() and get_info() which
returns a struct quorum_info with qi_quorum?

Should get_quorate() and get_info() take a viewnumber so we can match
up the list of members and whether it had quorum?  (It could have changed
after the callback with membership before we call get_quorum.)
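
Something like the following is what I have in mind.  This is only a
sketch: toy_get_quorate(), struct toy_quorum_state and TOY_ESTALE are
hypothetical names, and the 64-bit view number is my suggestion from
above rather than what the spec has today.

```c
#include <stdint.h>

#define TOY_ESTALE (-1)   /* hypothetical error: view has moved on */

/* Toy stand-in for what the quorum provider tracks internally. */
struct toy_quorum_state {
    uint64_t view;        /* view number the quorum answer belongs to */
    int quorate;          /* 1 = has quorum, 0 = does not */
};

/*
 * Hypothetical variant of get_quorate(): the caller passes the view
 * number it received with the membership callback.  If the cluster has
 * changed since then, fail instead of returning an answer that belongs
 * to a different membership list.
 */
int toy_get_quorate(const struct toy_quorum_state *qs, uint64_t viewnumber,
                    int *quorate)
{
    if (viewnumber != qs->view)
        return TOY_ESTALE;
    *quorate = qs->quorate;
    return 0;
}
```

The caller would then know to wait for the next membership callback
rather than pairing an old member list with a new quorum answer.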



