[Linux-cluster] Interfacing csnap to cluster stack

Daniel McNeil daniel at osdl.org
Mon Oct 11 23:08:28 UTC 2004


On Fri, 2004-10-08 at 17:49, Daniel Phillips wrote:
> On Thursday 07 October 2004 18:36, Daniel McNeil wrote:
> > On Thu, 2004-10-07 at 10:58, Daniel Phillips wrote:
> > > On Thursday 07 October 2004 12:08, Lon Hohberger wrote:
> > > > On Thu, 2004-10-07 at 02:07, David Teigland wrote:
> > > > > Using DLM locks is a third way you might
> > > > > solve this problem without using SM or RM; I don't understand
> > > > > the details of how that might work yet but it sounds
> > > > > interesting.
> > > >
> > > > It's primarily because it assumes that the DLM or GuLM can ensure
> > > > that only one exclusive lock is granted at a time.  Because of
> > > > this, the holder of the lock would thereby become the csnap
> > > > master server, and the old master will either have been fenced or
> > > > relinquished its duties willfully (and thus is no longer a
> > > > threat).
> > >
> > > Suppose that the winner of the race to get the exclusive lock is a
> > > bad choice to run the server.  Perhaps it has a fast connection to
> > > the net but is connected to the disk over the network instead of
> > > directly like the other nodes.   How do you fix that, within this
> > > model?
> >
> > Good question.  Another good question is how would a resource
> > manager know to pick the "best" choice?
> >
> > It would seem to me that the csnap-server is the best one
> > to know if this node is a good choice or not.
> >
> > I can think of a few ways of handling this:
> >
> > 1. If this node is not a good choice to run csnap-server,
> >     do not run it at all.  If this node is not directly
> >     connected to the disk and is reaching it over the net
> >     through some other node, that other node has to be running,
> >     so that node can be the csnap-server.
> >
> > 2. Use 2 dlm locks.  1 for "better" choices (directly connected,
> >     faster connection), and 1 for "other" choices.  The "better"
> >     csnap-servers go for the "better" lock exclusive while the "other"
> >     csnap-servers go for the "better" lock for read and the "other"
> >     lock exclusive.  If a csnap-server gets the "better" lock
> >     exclusive, he is the master.  If a csnap-server gets the
> >     "better" lock for read AND the "other" lock exclusive,
> >     he's the master.  The same works for multiple priorities.
> >
> > 3. If a csnap-server gets the lock to be master and he is not
> >     the best choice, the server can check if other
> >     csnap-servers are queued behind him.  If there are, he
> >     can unlock the lock and then re-lock it to give
> >     another node the chance to be master.
> 
> There are a few problems with this line of thinking:
> 
>   - You will be faced with the task of coding every possible resource
>     metric into some form of locking discipline.
> 
>   - Your resource metrics are step functions, the number of steps
>     being the number of locking layers you lather on.  Real resource
>     metrics are more analog than that.
> 
>   - You haven't done anything to address the inherent raciness of
>     giving the lock to the first node to grab it.  Chances are good
>     you'll always be giving it to the same node.
> 

Daniel,

I do not think of these as "problems".

You never answered: how would a resource manager know to pick the
"best" choice?

The cluster is made up of software components (see pretty picture
attached).  IMHO, it would be good to follow some simple rules:

	1. Components higher on the stack should only depend on
	    components lower on the stack.  Let's avoid circular
	    dependencies.

	2. When possible, use "standard" components and APIs.
	    We have agreed on some common components:

		DLM
		cluster membership and quorum
		cluster communications (sort of)
		
AFAICT, resource management is higher up the stack, and having shared
storage like the cluster snapshot depend on it would cause a circular
dependency.

SM is a Sistina/Red Hat specific thing.  It might be wonderful, but it
is not common.  David's email leads me to believe it is not the right
component to interface with.

So, what is currently implemented that we have to work with?
Membership and DLM.  These are core services and seem to be
pretty solid right now.

So how can we use these?  Seems fairly simple:

1st implementation:
===================

	Add a single DLM lock in the csnap server.
	When a snapshot target is started, start up a csnap server.
	If the csnap server gets the lock, he is master.
	In normal operation, the csnap server is up and running
	on all nodes.  One node has the DLM lock and the others
	are ready to go, but waiting for the DLM lock to convert.
	On failure, the next node to get the lock is master.

	If the administrator knows which machine is "best", have him
	start the snapshot targets on that machine first.  Not perfect,
	but simple and provides high availability.

	It is also possible for the csnap server to put its  
	server address and port information in the LVB.

	This seems simple, workable, and easy to program.
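
	To make the idea concrete, here is a rough sketch of what the
	election could look like.  The csnap_dlm_*() calls are
	hypothetical wrappers around whatever DLM interface we end up
	using (e.g. libdlm lock/unlock on a named resource with a lock
	value block); the names and signatures are made up for
	illustration, not real csnap or libdlm code:

	/*
	 * Sketch only -- csnap_dlm_*() are assumed wrappers around
	 * the real DLM API, not actual interfaces.
	 */
	#include <stdio.h>

	#define LVB_SIZE 32	/* DLM lock value blocks are small and fixed-size */

	int csnap_dlm_lock_ex(const char *resource, char lvb[LVB_SIZE]); /* blocks until EX granted */
	int csnap_dlm_lvb_write(const char *resource, const char lvb[LVB_SIZE]);

	/* Started on every node along with the snapshot target. */
	int csnap_server_become_master(const char *my_addr, int my_port)
	{
		char lvb[LVB_SIZE];

		/*
		 * Every candidate server queues for the exclusive lock.
		 * The first one granted the lock is master; the rest sit
		 * in the DLM's queue, ready to go, and take over when the
		 * current master dies (or is fenced) and the lock is
		 * granted to the next waiter.
		 */
		if (csnap_dlm_lock_ex("csnap-master", lvb) != 0)
			return -1;

		/*
		 * Publish this server's address and port in the LVB so
		 * clients (and standby servers) can find the current
		 * master by reading the lock value block.
		 */
		snprintf(lvb, LVB_SIZE, "%s:%d", my_addr, my_port);
		csnap_dlm_lvb_write("csnap-master", lvb);

		printf("this node is now csnap master at %s:%d\n",
		       my_addr, my_port);
		return 0;	/* ...go serve snapshot requests... */
	}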

Follow on implementations
=========================
	Maybe multiple DLM locks for priorities, other options...
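
	For example, the two-lock priority idea from the earlier mail
	might look roughly like this, again with made-up csnap_dlm_*()
	wrappers standing in for the real DLM calls:

	/* Hypothetical wrappers, as above: block until the mode is granted. */
	int csnap_dlm_lock_ex(const char *resource, char lvb[32]);	/* exclusive */
	int csnap_dlm_lock_pr(const char *resource);			/* protected read */

	int csnap_become_master(int good_choice)
	{
		char lvb[32];

		if (good_choice)
			/* "Better" candidates compete only for the "better" lock. */
			return csnap_dlm_lock_ex("csnap-master-better", lvb);

		/*
		 * "Other" candidates take the "better" lock for read
		 * (shared), which queues behind any better candidate's
		 * exclusive request, and then race for the "other" lock
		 * exclusive.  Holding both means no better node currently
		 * holds the master role, so this node takes over.  The
		 * same pattern extends to more priority levels.
		 */
		if (csnap_dlm_lock_pr("csnap-master-better") != 0)
			return -1;
		return csnap_dlm_lock_ex("csnap-master-other", lvb);
	}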

Questions:

	I do not understand what you mean by inherent raciness.
	Once a cluster is up and running, the first csnap server
	starts up.  It does not stop until it dies, which I assume
	is rare.  What raciness are you talking about?
	
	How complicated of a resource metric were you thinking about?

	I have read through the design doc and am still thinking about
	client reconnect.  Are you planning on implementing the 4-message
	snapshot read protocol?

	There must be some internal cluster communication mechanisms
	for membership (cman) and DLM to work.  Is there some reason why
	these are not suitable for snapshot client-to-server
	communication?

Thanks,

Daniel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arch.png
Type: image/png
Size: 8005 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20041011/892d3811/attachment.png>

