[Linux-cluster] GFS and samba problem, again

Wed Oct 11 15:33:42 UTC 2006

Hi Abhi,

 >I'm in the process of gathering a few windows boxes to run your
 >test. I should hopefully have 4 windows clients tomorrow.

Please let me know how the results turn out.

 >A warning first up, I'd recommend that you *not* use GFS2 and the 
 >latest cluster suite for your tests just yet. With constant 
 >development going on, some components are unstable and more problems 
 >is not what you need right now :-) . The RHEL4 tag in CVS has stable 
 >code from the most recent release. I'd suggest you compile gfs and
 >the cluster suite from that CVS branch.

Yes, I've been installing GFS2 in our test environment, and it seems
too experimental for production use. I've almost finished the
installation, so I'll try it with samba to see how it works.

 >I'm running a 3-node x86 cluster with RHEL4. The cluster suite and 
 >gfs are from the RHEL4 branch of CVS along with some innocuous 
 >patches. The samba version is 3.0.10-1.4E.2. I'm using an smb.conf 
 >almost identical to the one you posted in your previous mail. I don't
 >have any other kernel/samba locking settings that I'm aware of.

Ok, I'm going to try with the CVS version you're proposing.

 >You did mention in an email a few weeks ago that you were trying to
 >export the same GFS mount over multiple samba servers on multiple
 >nodes simultaneously (active-active samba). I'm guessing you achieved
 >this by setting the locking and pid directories of samba to be on the
 >shared gfs filesystem. (This is a wrong approach and doesn't work.
 >There's a lot of debate on this in the samba and samba-technical list
 >archives at samba.org). I'm wondering if you still have these
 >directories on the GFS filesystem, which could possibly be causing
 >your hang?

Well, this was one of the unsuccessful tests I did, but now I have
samba on an ext3 filesystem (locking and pid directories included). A
few days ago, a proposal for a clustered samba was made on the
samba-technical list. Details are in the following document, if you're
interested:
http://wiki.samba.org/index.php/Samba_%26_Clustering
Of course, it's only a proposal and I guess it won't be operational soon.
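
To double-check that the samba lock and pid directories really are off
the shared filesystem, something like the following could be used. It
is only a minimal sketch: the function name, the sample mount table and
the directory paths are all made up for illustration (the real paths
are whatever "lock directory" and "pid directory" point to in
smb.conf), and it ignores the octal escapes /proc/mounts uses for
spaces in mount points.

```python
import os


def fstype_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # The longest matching mount point wins; on a tie the later
            # entry wins, since later mounts shadow earlier ones.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


# Hypothetical mount table and paths, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
/dev/vg0/gfs /mnt/gfs gfs rw 0 0
"""

for directory in ("/var/cache/samba", "/mnt/gfs/share"):
    print(directory, "->", fstype_for(directory, SAMPLE_MOUNTS))
```

On a real node the second argument would be the contents of
/proc/mounts, e.g. open("/proc/mounts").read().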

 >Also, do you see anything unusual in /var/log/messages on the GFS 
 >node when this hang occurs? I'm interested in any 
 >kernel-panic/assertion failures in GFS that might indicate some 
 >problem.

I don't see anything abnormal in the GFS logs when the samba hangs
occur, but I ran strace on smbd and saw a lot of system calls that
remained unfinished until samba was restarted.

4665  11:09:31.242316 <... geteuid32 resumed> ) = 503 <0.000118>
4665  11:09:31.242405 write(19, "close fd=22 fnum=6371 (numopen=2"..., 34) = 34 <0.000031>
4665  11:09:31.242572 nanosleep({0, 2000001},  <unfinished ...>
4667  11:09:31.245063 kill(4665, SIG_0) = 0 <0.000018>
4665  11:09:31.248047 <... nanosleep resumed> NULL) = 0 <0.005406>
4665  11:09:31.249355 nanosleep({0, 2000001}, NULL) = 0 <0.002621>
4665  11:09:31.252091 nanosleep({0, 2000001}, NULL) = 0 <0.003853>
4665  11:09:31.256088 nanosleep({0, 2000001}, NULL) = 0 <0.003906>
.................. a lot of nanosleeps ..............................
4665  11:10:04.887037 nanosleep({0, 2000001},  <unfinished ...>
4665  11:10:04.887219 <... nanosleep resumed> 0) = ? ERESTART_RESTARTBLOCK (To be restarted) <0.000111>
4665  11:10:04.888197 +++ killed by SIGKILL +++
4667  11:10:04.890712 kill(4665, SIG_0 <unfinished ...>
4666  11:10:04.920965 kill(4665, SIG_0) = -1 ESRCH (No such process) <0.000017>
4667  11:10:04.934486 kill(4665, SIG_0 <unfinished ...>
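
For a long trace like this one, a small script can pick out the calls
that never return. The following is only a rough sketch: the function
name and regular expressions are my own, and it assumes each strace
record sits on a single line (any mail wrapping undone) in the
"pid timestamp syscall" layout shown above. The sample lines are taken
from the trace above.

```python
import re

# e.g. "4665  11:09:31.242572 nanosleep({0, 2000001},  <unfinished ...>"
UNFINISHED = re.compile(r"^(\d+)\s+(\S+)\s+(\w+)\(.*<unfinished \.\.\.>\s*$")
# e.g. "4665  11:09:31.248047 <... nanosleep resumed> NULL) = 0 <0.005406>"
RESUMED = re.compile(r"^(\d+)\s+\S+\s+<\.\.\. \w+ resumed>")
# e.g. "4665  11:10:04.888197 +++ killed by SIGKILL +++"
EXITED = re.compile(r"^(\d+)\s+\S+\s+\+\+\+ ")


def pending_syscalls(lines):
    """Map pid -> (syscall, timestamp) for calls that never resumed."""
    pending = {}
    for line in lines:
        match = UNFINISHED.match(line)
        if match:
            pid, timestamp, name = match.groups()
            pending[pid] = (name, timestamp)
            continue
        match = RESUMED.match(line) or EXITED.match(line)
        if match:
            # The call came back, or the process is gone.
            pending.pop(match.group(1), None)
    return pending


SAMPLE = [
    "4665  11:09:31.242572 nanosleep({0, 2000001},  <unfinished ...>",
    "4667  11:09:31.245063 kill(4665, SIG_0) = 0 <0.000018>",
    "4665  11:09:31.248047 <... nanosleep resumed> NULL) = 0 <0.005406>",
    "4665  11:10:04.887037 nanosleep({0, 2000001},  <unfinished ...>",
    "4665  11:10:04.887219 <... nanosleep resumed> 0) = ? "
    "ERESTART_RESTARTBLOCK (To be restarted) <0.000111>",
    "4665  11:10:04.888197 +++ killed by SIGKILL +++",
    "4667  11:10:04.890712 kill(4665, SIG_0 <unfinished ...>",
    "4666  11:10:04.920965 kill(4665, SIG_0) = -1 ESRCH (No such process) <0.000017>",
    "4667  11:10:04.934486 kill(4665, SIG_0 <unfinished ...>",
]

for pid, (name, timestamp) in pending_syscalls(SAMPLE).items():
    print(f"pid {pid}: {name}() unfinished since {timestamp}")
```

Only the most recent unfinished call per pid is kept, since a process
can block in just one system call at a time.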

Many Thanks,

		Sandra Hernández