[Linux-cluster] Re: Samba failover "impossible" due to missing cifs client reconnect?

Fri Sep 9 02:59:36 UTC 2005

Axel Thimm wrote:
> On Wed, Sep 07, 2005 at 04:12:52PM -0500, Christopher R. Hertel wrote:
> 
>>On Wed, Sep 07, 2005 at 10:51:16PM +0200, Axel Thimm wrote:
>>: :
>>
>>>>I just tested this.  On a W/XP box I browsed through some directories on a 
>>>>share served by Samba.  I then shut Samba down, and tried viewing some 
>>>>different subdirectories of the same share.  Windows coughed up an error 
>>>>dialog.  I then restarted Samba and Windows got happy again.  I could 
>>>>browse through all of the subdirectories in the share.
>>>
>>>Yes, that does work, but what I wanted to set up is a transparent
>>>failover, so that network I/O recovers w/o any manual interaction.
>>>
>>>I.e. I don't want to (soft) relocate the samba shares onto another
>>>node due to load balancing considerations and generate user-visible
>>>I/O errors and failures on a dozen clients.
>>
>>I guess I'm not really clear on what it is you're trying to accomplish.
>>Can you provide a little more description of what you'd like to see 
>>happen, and what kinds of environments you expect?
> 
> 
> A cifs client performs a largish copy operation. During the copy the
> share is relocated to a different node. The copy operation should
> stall during the relocation and resume after 10-20 seconds.

Okay, now I have a clearer picture.

> But if the cifs client does not retry at the smb/cifs protocol level
> (at the TCP level it will get a RST; it is the next-level protocol
> that has to decide whether to retransmit the read/write request),
> then there is nothing you can do server-side.

Yep...

> Perhaps there are magic registry keys that can persuade Windows
> clients to do otherwise.

Not likely.

Others on the list have already done a better job than I at working this 
through.  I can only add that I am not aware of anything in the protocol
itself that would handle retransmission.

Two things to consider:

- The core of SMB is quite old and was not written to run on top of TCP.
   SMB had to deal with a variety of transport semantics.

- SMB was designed, originally, as a request/response protocol (client
   sends a request, server responds).  In theory, the client could re-send
   the original request if the TCP connection drops and is restarted...but
   how does the SMB client know that the first request did or didn't
   succeed?  The server might have finished the operation as the connection
   failed.
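
To make the second point concrete, here is a toy sketch in Python.  It is
not SMB and not how any real client or server is written; ToyServer,
ConnectionDropped, and append_with_blind_retry are hypothetical names made
up for illustration.  It only shows that a client which blindly re-sends
after a dropped connection cannot tell "request lost" from "reply lost",
and that the second case duplicates a non-idempotent operation.

```python
class ConnectionDropped(Exception):
    """Raised when the (simulated) transport fails before a reply arrives."""


class ToyServer:
    """Hypothetical stand-in for a file server; NOT the SMB wire protocol."""

    def __init__(self):
        self.file = bytearray()

    def append(self, data, drop_before=False, drop_after=False):
        # Case 1: the connection dies before the request is processed.
        if drop_before:
            raise ConnectionDropped("request never reached the server")
        # The server completes the operation...
        self.file += data
        # Case 2: ...but the connection dies before the reply gets back.
        if drop_after:
            raise ConnectionDropped("reply was lost on the way back")
        return len(self.file)


def append_with_blind_retry(server, data, **failure):
    """Send once; on a dropped connection, 'reconnect' and resend blindly."""
    try:
        return server.append(data, **failure)
    except ConnectionDropped:
        # The client sees the same exception in both cases and cannot
        # know whether the first attempt was applied.
        return server.append(data)


lost_request = ToyServer()
append_with_blind_retry(lost_request, b"chunk", drop_before=True)

lost_reply = ToyServer()
append_with_blind_retry(lost_reply, b"chunk", drop_after=True)

print(bytes(lost_request.file))  # the retry was the right call here
print(bytes(lost_reply.file))    # the retry wrote the data a second time
```

From the client's side both runs look identical: one exception, one retry,
one success.  Only the server-side state differs, which is why a safe
automatic retry needs more information than a plain request/response
exchange provides.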

The solution, in general, is for SMB to report a failure and let the user
decide how to handle it.  (E.g., try saving your MS-Word doc to a different
drive or something.)

Chris -)-----

-- 
"Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org