[Linux-cluster] Interfacing csnap to cluster stack

Daniel Phillips phillips at redhat.com
Fri Oct 8 19:45:45 UTC 2004


On Friday 08 October 2004 15:13, Lon Hohberger wrote:
> On Thu, 2004-10-07 at 21:42 -0400, Daniel Phillips wrote:
> >     - There may never be more than one, or the snapshot metadata will
> >       be corrupted (this sounds like a good job for gdlm: let the
> >       server take an exclusive lock on the snapshot store).
>
> You mean one "master", right?  I thought you want the ability to have
> multiple csnap servers, which could handle misc requests, but only one
> 'master' - that is, only one actually handling client I/O requests and
> writing to disk.

I don't think I said anything about misc requests; that's a new one.  I did
mention somewhere along the line that one requirement of starting a server
in a failover path is that it not be loaded from disk.  So each agent
capable of starting a server has to load it and let it initialize to the
point of reserving its working memory, but not let it "start", where
starting means reading or writing the snapshot store.
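Roughly the shape I have in mind, as a sketch rather than actual csnap
code - the pool size, the pipe handshake and the single start byte are
all invented for illustration:

#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define WORK_MEM (4 << 20)              /* assumed working set */

static void server(int start_fd)
{
	char go;
	void *pool = malloc(WORK_MEM);  /* reserve while memory is easy */

	if (!pool || mlockall(MCL_CURRENT | MCL_FUTURE))
		_exit(1);
	memset(pool, 0, WORK_MEM);      /* fault every page in up front */

	if (read(start_fd, &go, 1) != 1)
		_exit(1);               /* loaded, initialized, waiting */

	/* only now may the server read or write the snapshot store */
	_exit(0);
}

int main(void)
{
	int pipefd[2];

	if (pipe(pipefd))
		return 1;
	if (fork() == 0)
		server(pipefd[0]);      /* child: the preloaded server */

	/* much later, only if failover actually needs this node's server */
	write(pipefd[1], "g", 1);
	return 0;
}

The point is that everything that can allocate - loading the binary,
linking, reserving the working set - happens long before the failover
path ever runs.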

> Anyway, rgmanager and/or a cluster lock can do the "one and only one
> master" bit, I think, as long as the csnap server is trustworthy.

Which aspect must we trust?

> >     - Server instance requests come from csnap agents, one per node.
> >       The reply to an instance request is always a server address and
> >       port, whether the server had to be instantiated or was already
> >       running.
>
> Ok, so the csnap agents get instance requests which tell the server
> port, and instantiate a master server if necessary?

"server port"?

Master server - there is no master server, since there is never more than 
one server running.

> (Um, why are we not using a floating IP address?)

You can if you want, but then you have to figure out how to make that fail 
over without using any memory, or get the whole mechanism into PF_MEMALLOC 
mode and do the necessary audit.

> If operating within the context of rgmanager (or another RM), it's
> probably a good idea to never have the csnap agent directly instantiate
> a master server as a result of a csnap client request

That's the plan: when an agent receives a request for a server from a device 
mapper target, it just passes it on up the chain, and stands by to 
instantiate a server if requested.  Sorry if I didn't make that clear.
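Something like this, with made-up message codes and plumbing - the real
agent talks to the target over its own control interface, this is just
the shape of the relay:

#include <poll.h>
#include <unistd.h>

enum { NEED_SERVER = 1, START_SERVER = 2 };

static void start_preloaded_server(void)
{
	/* write the start byte to the already-initialized server */
}

/* target_fd:  control connection from the local device mapper target
 * manager_fd: whoever owns the "one and only one server" decision */
void agent_loop(int target_fd, int manager_fd)
{
	struct pollfd fds[2] = {
		{ .fd = target_fd,  .events = POLLIN },
		{ .fd = manager_fd, .events = POLLIN },
	};
	char msg;

	while (poll(fds, 2, -1) > 0) {
		if ((fds[0].revents & POLLIN) &&
		    read(target_fd, &msg, 1) == 1 && msg == NEED_SERVER)
			write(manager_fd, &msg, 1);     /* pass it up */

		if ((fds[1].revents & POLLIN) &&
		    read(manager_fd, &msg, 1) == 1 && msg == START_SERVER)
			start_preloaded_server();
	}
}

The agent never decides on its own to run a server; it only forwards the
target's "I need a server" and acts on an explicit "you start one".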

> >     - When instantiated in a failover path, the local part of the
> >       failover path must restrict itself to bounded  memory use.
> >       Only a limited set of syscalls may be used in the entire
> >       failover path, and all must be known.  Accessing a host
> >       filesystem is pretty much out of the question, as is
> >       on-demand library or plugin loading.  If anything like this
> >       is required, it must be done at initialization time, not
> >       during failover.
>
> Were you referring to the case where the csnap master server has failed
> (but not anything else on the node, so the cluster is still fine) and it
> must first be cleaned up (via the csnap server agent) prior to being
> relocated.  However, there is such a low amount of memory available that
> we can't get enough juice to tell the csnap server agent to stop it?

No, this is the case where the csnap server has failed anywhere on the
cluster network, whether because of a node failure or otherwise, and a new
server is to be instantiated on a node in the cluster that has GFS mounted.
We have to follow the guidelines above to prevent memory inversion.

> Hmm... <ponders>
>
> The start path is a bit less interesting; if we fail to become a master
> (for any reason), we try another node.  The reasons (low memory, lost
> disk connection, lunar surge) don't really matter.

Well, you don't want to destroy the node you just tried to instantiate a
server on, so "try another if that doesn't work" is a little too cavalier.

> It's not 
> uninteresting, particularly if all of the nodes of the cluster are under
> huge memory strain (though, most server machines are never expected to
> operate continually at that capacity!).
>
> >     - If a snapshot client disconnects, the server needs to know if
> >       it is coming back or has left the cluster, so that it can
> >       decide whether to release the client's read locks.
>
> A membership change ought to be able to tell you this much.  If the
> csnap master server sees that a node is out of the cluster, that client
> isn't coming back (We hope?).

It's OK if it comes back; it just can't expect its read locks to still be
around.  It should have been fenced from the shared disk before the
membership event is generated, so there is no risk that its completed reads
will have been overwritten because the server prematurely threw away the
read locks and allowed another node to write to the previously locked data.
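In other words the server's policy looks something like this (the types
and names are invented; this is just the rule, not the implementation):

struct readlock;
void release_read_locks(struct readlock *locks); /* assumed elsewhere */

struct client {
	unsigned node_id;
	int connected;          /* socket currently open? */
	struct readlock *locks; /* chunks this client is still reading */
};

/* plain disconnect: the client may reconnect, so its locks stay put */
void client_disconnected(struct client *c)
{
	c->connected = 0;
}

/* membership says the node is gone; it was fenced before we hear this,
 * so nothing it had in flight can still touch the shared disk */
void node_left_cluster(struct client *clients, int n, unsigned node_id)
{
	int i;

	for (i = 0; i < n; i++)
		if (clients[i].node_id == node_id) {
			release_read_locks(clients[i].locks);
			clients[i].locks = 0;
		}
}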

> >     - If a server fails over, the new incarnation needs to know
> >       that all snapshot clients of the former incarnation have
> >       either reconnected or left the cluster.
>
> If the clients are part of the csnap master server's state, perhaps that
> list of clients ought to be moved with the rest of the csnap data.

The rest of the csnap data is persistently recorded on disk.  That's one
way to go about it, all right; let me have a ponder.
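Either way, the recovery rule for the new incarnation would look roughly
like this (the client table is invented; how it reaches the new server -
from the agents or from the snapshot store - is exactly the open
question):

struct old_client {
	unsigned node_id;
	int reconnected;        /* has re-stated its read locks with us */
	int departed;           /* membership says the node is gone */
};

/* the new server holds off granting anything new until this says yes */
int recovery_complete(struct old_client *list, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (!list[i].reconnected && !list[i].departed)
			return 0;       /* still waiting on this client */
	return 1;
}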

> Storing the client list along with the csnap data on shared storage has
> the added benefit of surviving total cluster outage, not just a mere
> failover.  Not sure if this is interesting or not.

Not interesting: all IO will have completed and no read lock recovery is 
needed in that case.

Regards,

Daniel
