[Linux-cluster] Re: patches.

Tue Nov 16 20:24:18 UTC 2004

> > > > Once another client with the same id logs in, all
> > > > the zombie's locks are dropped, and it is finally cleaned up.
> > >
> > > This reconnecting client is the owner of the locks, I hope you
> > > mean, and it will soon upload a replacement, possibly smaller, set
> > > of locks.
> >
> > When the new client uploaded its locks, it already got the locks it
> > needed, so both the new client and the zombie client were on the
> > holdlist. When the new client is finished logging in, the zombie
> > client gets removed from the holdlist of all its locks.  That's what
> > I meant.
> 
> So just to confirm, are the new client and zombie client actually the 
> same client, just reconnecting?

Whenever a client disconnects while holding locks, it becomes a zombie.
When a new snapshot client completes logging in it will already have
acquired all the locks it needs.  Once it finishes logging in, it checks
the zombie clients.  If there is a zombie client with the same id, it
removes the zombie client.

So in the case where a client disconnects from the server while holding locks,
and reconnects to the same server, there will be a zombie, and it will be
from the client's previous connection.

If a client loses its connection with a server and reconnects to a new server
(because, presumably, the old server went down), there will not be any zombie
clients.
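
To make the sequence concrete, here is a rough sketch in Python. It is a toy
model of the rules described above, not the actual server code; LockServer,
holders, live, zombies and the connection numbers are all names made up for
illustration.

```python
from collections import defaultdict


class LockServer:
    """Toy model of the zombie rules; hypothetical, not the real server."""

    def __init__(self):
        self.holders = defaultdict(set)  # lock name -> {(client_id, conn)}
        self.live = {}                   # client_id -> live connection number
        self.zombies = {}                # client_id -> dead connection number
        self.next_conn = 0

    def login(self, client_id, locks):
        conn = self.next_conn
        self.next_conn += 1
        # The new connection acquires its locks while logging in, so for a
        # while it and any zombie with the same id share the holdlists.
        for lock in locks:
            self.holders[lock].add((client_id, conn))
        self.live[client_id] = conn
        # Login is finished: remove a zombie with the same id, if any.
        self.remove_zombie(client_id)
        return conn

    def disconnect(self, client_id):
        conn = self.live.pop(client_id)
        # A client that disconnects while holding locks becomes a zombie.
        if any((client_id, conn) in h for h in self.holders.values()):
            self.zombies[client_id] = conn

    def remove_zombie(self, client_id):
        conn = self.zombies.pop(client_id, None)
        if conn is None:
            return False
        # Take the zombie off the holdlist of every lock it held.
        for h in self.holders.values():
            h.discard((client_id, conn))
        return True
```

Reconnecting to the same server finds and removes the zombie from the previous
connection; logging into a freshly started server finds no zombies at all,
since its zombies table starts out empty.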

> > > > Appropriate Zombies are also cleaned up when the server gets a
> > > > REMOVE_CLIENT_IDS request.
> > >
> > > This would instead happen when the agent logs the client out.
> >
> > Um... I'm missing something here.  Say the other machine crashes
> > while one of its clients is holding locks. Then the client becomes a
> > zombie, but there is no agent to remove it.  To solve this, the agent
> > on the master server node would not just need to keep a list of
> > clients to wait for, but would have to always keep a list of all
> > clients that are connected to the server. Then it could simply log
> > the clients out for the defunct agent.  This would also necessitate
> > that the server contacts its agent after each client is
> > completely logged out.  This method would work. And I can do this if
> > that's what you want. But something needs to be done for this case.
> 
> Good observation, the master agent has to keep its list of snapshot 
> clients around forever, all right.  Also, if an agent has to log out a 
> disconnected client, it's cleaner to forward the event to the master 
> agent and have the master agent do it, otherwise there is a risk of 
> logouts arriving at the server from two different places (not actually 
> destructive, but untidy).  The non-master agent has to forward the 
> client disconnection anyway, in order to implement recovery.  Does this 
> make sense to you?

Yes.

> If it's done that way, then the server doesn't have to tell its agent 
> about logouts.

This part I don't get. What you said is true, but it has some odd
implications.  One, the list that the agent stores will contain every
client that has ever connected to the server from each node.  And two,
if another agent dies, the master's agent will send logoffs for all of
these clients, regardless of whether or not they are currently logged in.
There is one annoying issue with this: it doesn't let us preallocate
a fixed-size client list.  If the agent only keeps track of the clients that
are actually logged into the master server, the agent just needs a list the
same size as the server's client list, which is currently a static size.  If
the agent doesn't get logout messages, the list could very well be bigger
than the maximum number of clients that could be logged into the master.
That's my only issue.
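
Here is a small Python sketch of the sizing problem, just to illustrate the
point. MAX_CLIENTS, AgentClientTable and replay() are hypothetical names, and
the table size of 4 is made up; none of this is taken from the real agent.

```python
MAX_CLIENTS = 4  # hypothetical; stands in for the server's static list size


class AgentClientTable:
    """Fixed-size table of client ids, preallocated up front."""

    def __init__(self, size=MAX_CLIENTS):
        self.slots = [None] * size

    def add(self, client_id):
        if client_id in self.slots:
            return True
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = client_id
                return True
        return False  # no free slot left

    def remove(self, client_id):
        for i, slot in enumerate(self.slots):
            if slot == client_id:
                self.slots[i] = None
                return True
        return False


def replay(events, tell_agent_about_logouts, size=MAX_CLIENTS):
    """Feed (op, client_id) events to the agent; return the ids that did not fit."""
    table = AgentClientTable(size)
    overflow = []
    for op, cid in events:
        if op == "login":
            if not table.add(cid):
                overflow.append(cid)
        elif op == "logout" and tell_agent_about_logouts:
            table.remove(cid)
    return overflow
```

With six clients that log in and out one after another, the server never has
more than one client at a time. If the agent hears about logouts, nothing
overflows; if it doesn't, the fifth and sixth clients no longer fit in a
four-slot table.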

> > > A P.S. here, I just looked over the agent list rebuilding code, and
> > > my race detector is beeping full on.  I'll have a ponder before
> > > offering specifics though.
> >
> > That's a question I have.  The ast() function gets called by
> > dlm_dispatch(), right? If so, I don't see the race. If not, there is
> > one hell of a dangerous race.  If the agent_list is changing when the
> > agent is trying to contact the other agents, bad things will most likely
> > happen.
> 
> If the list changes and we don't know about the changes while waiting to 
> get answers back from other agents, we're dead in the water.  So the 
> recovery algorithm must handle membership changes that happen in 
> parallel.  After much pondering, I think I've got a reasonably simple 
> algorithm, I'll write it up now.

Um... but since we wait for agent responses in the same poll loop that we wait
for membership change notifications, these two things already do happen in
parallel... well... mostly. The only issue I see is that we could get the event
from magma, and then block trying to get the member_list.  But since
that's a local call, if that hangs forever, then cman is in trouble,
and there isn't much we can do anyway. But there is no chance of not getting
a membership change because we are waiting on an agent response.
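
A minimal sketch of what I mean by one poll loop, in Python rather than the
agent's own code. The socket pairs are stand-ins I made up for the membership
event source and for a connection to another agent; poll_once() and demo() are
hypothetical names, and nothing here uses the real magma or cman interfaces.

```python
import selectors
import socket


def poll_once(sel, timeout=1.0):
    """One pass of the loop: read from every source that is ready."""
    handled = []
    for key, _ in sel.select(timeout):
        data = key.fileobj.recv(4096)
        handled.append((key.data, data))
    return handled


def demo():
    member_rx, member_tx = socket.socketpair()  # stand-in: membership events
    agent_rx, agent_tx = socket.socketpair()    # stand-in: another agent
    sel = selectors.DefaultSelector()
    sel.register(member_rx, selectors.EVENT_READ, "membership")
    sel.register(agent_rx, selectors.EVENT_READ, "agent")

    # A membership change shows up while the agent reply is still pending.
    member_tx.sendall(b"member change")
    first = poll_once(sel)

    # The agent reply arrives later and is picked up by the same loop.
    agent_tx.sendall(b"agent reply")
    second = poll_once(sel)

    sel.close()
    for s in (member_rx, member_tx, agent_rx, agent_tx):
        s.close()
    return first, second
```

Because both sources are registered with the same selector, the wait for the
agent reply cannot hide the membership event. The only place left to block is
inside a handler, which is the member_list case mentioned above.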

> Well, we have for sure gotten to the interesting part of this, how about 
> we continue in linux-cluster?
>

Sure. But I'm not sure if anyone else is interested in implementation details.

> Regards,
> 
> Daniel

-Ben