[Linux-cluster] Re: Samba failover "impossible" due to missing cifs client reconnect?

Thu Sep 8 18:18:55 UTC 2005

> A cifs client performs a largish copy operation. During that the share
> is relocated to a different node. The copy operations should stall
> during the relocation and resume after 10-20 seconds.

Microsoft can't do this even with their own cluster server product and
CIFS client.

Recent versions of some applications, such as Office, mask the
drive-letter reconnect inside the application, but in general any
client-side open file handles are lost and have to be re-opened by the
client application (either through human intervention, e.g. saving the
file again, or under the covers in a reconnect-aware application).
Consider the problem for the client after a transport-level reconnect to
the virtual IP address associated with the Samba service. Suppose the
client had an exclusive lock on a file. How can it be sure some other
client didn't gain the lock in the meantime? What should the application
do when it discovers that the lock it once held on a connection is no
longer valid? The protocol and client-side APIs weren't designed to deal
with session-level failover issues.
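To illustrate what "reconnect-aware" means on the application side, here
is a rough Python sketch of a copy loop that re-opens its file handles
and resumes from the last written offset after an I/O error. It assumes
the share is mounted or mapped so that ordinary file paths work. The
function name resumable_copy and its parameters (retries, delay, opener,
sleep) are made up for this example and are not part of Samba or any
CIFS client. It does nothing about the locking question above: it simply
assumes no other client touched either file during the outage.

```python
import time


def resumable_copy(src, dst, chunk_size=1 << 20, retries=6, delay=5.0,
                   opener=open, sleep=time.sleep):
    """Copy src to dst, re-opening both files and resuming after an OSError.

    Hypothetical helper: handles are treated as lost on any I/O error,
    matching what a CIFS client sees when the share moves to another node.
    Returns the number of bytes copied.
    """
    offset = 0      # bytes known to have been handed to the destination
    failures = 0    # consecutive failed attempts
    while True:
        try:
            # Create the destination on the first attempt, re-open it later.
            dst_mode = "r+b" if offset else "wb"
            with opener(src, "rb") as fin, opener(dst, dst_mode) as fout:
                fin.seek(offset)
                fout.seek(offset)
                fout.truncate()  # discard any partial data past the offset
                while True:
                    chunk = fin.read(chunk_size)
                    if not chunk:
                        return offset
                    fout.write(chunk)
                    fout.flush()
                    offset += len(chunk)
                    failures = 0  # progress was made, reset the counter
        except OSError:
            failures += 1
            if failures > retries:
                raise
            sleep(delay)  # wait for the service to come up on the new node
```

With the defaults this keeps retrying for about 30 seconds, which covers
the 10-20 second relocation window mentioned in the original question.
Note that flush() only hands the data to the OS, so a stricter version
would have to verify what actually reached the server before resuming.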

> Perhaps there are magic registry keys that can persuade Windows
> clients to do otherwise.

Fwiw, some (e.g. Novell) clients are designed to detect that they've
connected to a clustered file server and optimize the transport-level
drive-letter reconnect (under the assumption that the virtual IP will be
back soon). Newer protocols like NFSv4 have provisions for dealing with
these kinds of situations.

>>> Axel.Thimm at ATrpms.net 9/8/2005 1:15 am >>>
On Wed, Sep 07, 2005 at 04:12:52PM -0500, Christopher R. Hertel wrote:
> On Wed, Sep 07, 2005 at 10:51:16PM +0200, Axel Thimm wrote:
> : :
> > > I just tested this.  On a W/XP box I browsed through some directories on a
> > > share served by Samba.  I then shut Samba down, and tried viewing some
> > > different subdirectories of the same share.  Windows coughed up an error
> > > dialog.  I then restarted Samba and Windows got happy again.  I could
> > > browse through all of the subdirectories in the share.
> > 
> > Yes, that does work, but what I wanted to set up is a transparent
> > failover, so that network I/O recovers w/o any manual interaction.
> >
> > I.e. I don't want to (soft) relocate the samba shares onto another
> > node due to load balancing considerations and generate user visible
> > I/O errors and failures on a dozen clients.
> 
> I guess I'm not really clear on what it is you're trying to accomplish.
> Can you provide a little more description of what you'd like to see 
> happen, and what kinds of environments you expect?

A cifs client performs a largish copy operation. During that the share
is relocated to a different node. The copy operations should stall
during the relocation and resume after 10-20 seconds.

But if the cifs client does not perform a retry at the smb/cifs protocol
level (at the TCP level it will get a RST; it's the next-level protocol
that needs to decide whether to retransmit the read/write request), then
there is nothing you can do server-side.

Perhaps there are magic registry keys that can persuade Windows
clients to do otherwise.
-- 
Axel.Thimm at ATrpms.net