[Linux-cluster] Interfacing csnap to cluster stack
Daniel Phillips
phillips at redhat.com
Fri Oct 8 04:59:55 UTC 2004
On Thursday 07 October 2004 23:56, David Teigland wrote:
> On Thu, Oct 07, 2004 at 03:35:47PM -0400, Daniel Phillips wrote:
> > The executive summary of your post is "my pristine, perfect service
> > manager is for symmetric systems only and keep yer steenking
> > client-server mitts away from it."
>
> Cute characterization, but false. To quote the relevant point:
>
> "- I think it's possible that a client-server-based csnap system
> could be managed by SM (directly) if made to look and operate more
> symmetrically. This would eliminate RM from the picture."
>
> I reiterated this in the next point and have said it before. In
> fact, I think this sort of design, if done properly, could be quite
> nice. I'm not lobbying for one particular way of solving this
> problem, though.
If you think only of csnap agents and forget for the moment about device
mapper targets and servers, the agents seem to match the service group
model quite well. There is one per node, and each provides the service
"able to launch a csnap server". The recovery framework seems useful
for ensuring that a server is never launched on a node that has left
the cluster. How to choose a good candidate node is still an open
question, but starting with Lon's "cute" proposal to use gdlm both to
choose a candidate and to ensure that the server is unique will
certainly get
something working. In the long run, taking an EX lock on the snapshot
store seems like a very good thing for a server to do. This gets the
resource manager off the critical (development) path.
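To make the lock-based election concrete, here is a rough sketch (Python for illustration only; the LockSpace class and agent_try_start_server function are hypothetical stand-ins, not the real gdlm API). The idea is just that every agent races for an EX lock on the snapshot store, and only the winner launches a server:

```python
import threading

class LockSpace:
    """Toy stand-in for a gdlm lockspace: one exclusive lock per resource."""
    def __init__(self):
        self._locks = {}           # resource name -> holder node id
        self._mutex = threading.Lock()

    def try_lock_ex(self, resource, node):
        """Try to take an EX lock; return True if this node now holds it."""
        with self._mutex:
            if resource not in self._locks:
                self._locks[resource] = node
                return True
            return False

    def unlock(self, resource, node):
        """Release the lock, e.g. when the server shuts down cleanly."""
        with self._mutex:
            if self._locks.get(resource) == node:
                del self._locks[resource]

def agent_try_start_server(lockspace, node):
    """Each csnap agent races for the EX lock on the snapshot store;
    only the winner launches a server, so the server is unique."""
    if lockspace.try_lock_ex("snapshot-store", node):
        return "server"        # this agent launches the csnap server
    return "client-only"       # somebody else already holds the lock
```

In the real system the lock manager would also revoke the lock when the holder leaves the cluster, which is exactly what makes this a natural fit for server failover.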
Besides the server instantiation question, there is another problem that
needs solving: when a snapshot server fails over to a new server, the
new server must be sure that every client that was connected to the old
server has either reconnected to the new server or left the cluster.
Csnap clients don't map directly onto nodes, so cnxman can't directly
track the csnap client list; it can, however, provide membership change
events that the server (or alternatively, agents) can use to maintain
the list of currently connected clients. (The server doesn't need help
adding new clients to the list, but it needs to be told when a node has
left the cluster, so it can strike the clients belonging to that node
off the list, and disconnect them for good measure. It could also
refuse connections from clients not on cluster nodes.)
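Sketched out (again in illustrative Python, with hypothetical names: ClientList, node_left and so on are not real cnxman or csnap interfaces), the bookkeeping described above might look like:

```python
class ClientList:
    """Server-side list of connected csnap clients, keyed by node.
    The server adds clients itself on connect; membership-change
    events from the cluster manager drive the removals."""
    def __init__(self, cluster_nodes):
        self.cluster_nodes = set(cluster_nodes)
        self.clients = {}          # client id -> node id

    def connect(self, client_id, node):
        """Refuse connections from clients not on cluster nodes."""
        if node not in self.cluster_nodes:
            raise ConnectionRefusedError("node %s not in cluster" % node)
        self.clients[client_id] = node

    def node_left(self, node):
        """Membership event: strike all clients on the departed node
        (a real server would also disconnect them for good measure)."""
        self.cluster_nodes.discard(node)
        struck = [c for c, n in self.clients.items() if n == node]
        for c in struck:
            del self.clients[c]
        return struck
```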
Since the list of clients isn't large and doesn't change very fast, the
server can reasonably require every csnap agent to replicate it. So
when a server fails over, it can retrieve the list from the first agent
that reconnects, and can thus tell when it is safe to continue
servicing requests.
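The failover handshake could then go roughly like this (same caveat: illustrative Python, hypothetical FailoverServer interface). The new server seeds its expected-client list from the first agent to reconnect, then ticks clients off as they reconnect or their nodes leave, and only declares itself safe once the list is accounted for:

```python
class FailoverServer:
    """New csnap server after failover: takes the replicated client
    list from the first agent that reconnects, then holds off on
    servicing requests until every old client has either reconnected
    or left the cluster."""
    def __init__(self):
        self.expected = None       # client id -> node id, from an agent
        self.reconnected = set()

    def agent_reconnected(self, replicated_list):
        """First agent to reconnect supplies the replicated list;
        later copies are redundant and ignored."""
        if self.expected is None:
            self.expected = dict(replicated_list)

    def client_reconnected(self, client_id):
        self.reconnected.add(client_id)

    def node_left(self, node):
        """Membership event: clients on a departed node no longer
        need to reconnect before service can resume."""
        if self.expected:
            for c in [c for c, n in self.expected.items() if n == node]:
                del self.expected[c]

    def safe_to_continue(self):
        if self.expected is None:
            return False           # no agent has reconnected yet
        return all(c in self.reconnected for c in self.expected)
```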
Regards,
Daniel