[Linux-cluster] Re: Csnap instantiation and failover using libdlm
Daniel Phillips
phillips at redhat.com
Fri Oct 22 00:27:46 UTC 2004
On Thursday 21 October 2004 17:56, Benjamin Marzinski wrote:
> Um.. I just realized that there's a problem here.
> If the agent dies but the server doesn't, the lock will get revoked.
> While this won't interfere with the clients currently connected to
> the server, any new client (or client that gets disconnected) will
> think that there is no server, and promote its server to master....
> and data corruption will follow.
>
> As far as I can tell, the way to ensure that this doesn't happen is
> to have the server process take out the lock. That way the lock won't
> be freed unless the server process dies. Agreed?
No, the way to ensure this is to have the server die if its control
socket goes away.
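
As a rough illustration of "die if the control socket goes away", here is a minimal Python sketch. It is not csnap code: the function name `control_socket_gone` and the use of a stream socket between agent and server are assumptions made for the example. The idea is only that the server polls its end of the control connection and treats end-of-file as a reason to exit.

```python
import select
import socket


def control_socket_gone(sock: socket.socket, timeout: float = 0.0) -> bool:
    """Return True if the peer (assumed here to be the agent) has closed
    the control socket. Hypothetical helper, not part of csnap."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        # Nothing to read within the timeout: the connection still looks alive.
        return False
    try:
        # MSG_PEEK leaves any real control message in the receive buffer.
        data = sock.recv(1, socket.MSG_PEEK)
    except ConnectionResetError:
        return True
    # An empty read on a stream socket means the peer shut the connection down.
    return data == b""


if __name__ == "__main__":
    agent_end, server_end = socket.socketpair()
    print(control_socket_gone(server_end))       # agent still connected
    agent_end.close()                            # simulate the agent dying
    print(control_socket_gone(server_end, 1.0))  # server should now exit
    server_end.close()
```

In a real server the check would sit in the main event loop, and a True result would lead straight to process exit, so that the lock held on the server's behalf is never released while the server is still running.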
However, you have pointed out why it's bad for the new server to rely
only on the lock to decide when it's safe to start processing requests,
or even to recover the journal: there may still be writes in flight
from the old server. If a server dies but its node is still in the
cluster, the new server's agent has to regard that as a valid reason
for fencing the node. This can only be handled properly at the
membership level, not at the lock level.
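
The ordering described above can be sketched as a small decision function. This is only an illustration under stated assumptions: `takeover_plan`, its parameters, and the action strings are hypothetical names, and the previous server is assumed to have run on a different node from the new one. The point it shows is that holding the lock is not sufficient; membership is consulted and the old node is fenced before any journal recovery.

```python
from typing import List, Optional, Set


def takeover_plan(holds_lock: bool,
                  old_server_node: Optional[str],
                  cluster_members: Set[str]) -> List[str]:
    """Return the ordered steps a new server's agent would take.
    Hypothetical sketch; names and step strings are illustrative only."""
    if not holds_lock:
        # Without the lock, another server may still be the master.
        return ["wait for lock"]
    plan: List[str] = []
    if old_server_node is not None and old_server_node in cluster_members:
        # The old server is gone but its node is still a member, so writes
        # may still be in flight: fence the node before touching the journal.
        plan.append("fence " + old_server_node)
    plan.append("recover journal")
    plan.append("serve requests")
    return plan


if __name__ == "__main__":
    print(takeover_plan(True, "node1", {"node1", "node2"}))
    print(takeover_plan(True, "node1", {"node2"}))
```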
> If that's the case, should the server also be responsible for
> contacting the agents in the appropriate service group and getting
> the client information?
It's not the case, so we don't have to worry about it.
The only interesting argument I know of for moving infrastructure
details into the server is to get rid of one daemon, but daemons are
cheap, particularly if they sleep nearly all the time like the agent
does. It's better to keep the agent and server separate and
specialized for the time being.
Regards,
Daniel