[Linux-cluster] samba on top of GFS

Alan Wood chekov at ucla.edu
Mon Nov 1 20:30:47 UTC 2004


I am running a cluster with GFS-formatted file systems mounted on multiple 
nodes.  What I was hoping to do was to set up one node running httpd to be 
my webserver and another node running samba to share the same data 
internally.
What I am getting instead is instability.  The samba-serving node 
keeps crashing.  I have heartbeat set up so that failover happens to the 
webserver node, at which point the system apparently behaves well.
After reading a few articles on the list, it seemed to me that the problem 
might be samba using oplocks or some other caching mechanism that breaks 
synchronization between nodes.  I tried turning oplocks off in my smb.conf, 
but that made the system unusably slow (over 3 minutes to right-click on a 
two-meg file).
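For reference, this is roughly the share configuration I mean (share name 
and path are placeholders, not my real setup):

```
# sketch of the relevant smb.conf settings -- names are placeholders
[gfsdata]
    path = /mnt/gfs/data
    writeable = yes
    # disable client-side caching so Samba cannot hand out oplocks
    # that other GFS nodes know nothing about
    oplocks = no
    level2 oplocks = no
    # leave byte-range locking to the cluster file system via fcntl()
    posix locking = yes
```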
I am also not sure that this is the extent of the problem, as I seem to be 
able to re-create the crash simply by accessing the same file from multiple 
clients via samba alone (which locking should be able to handle).  If the 
problem were merely that the remote node and the samba node were both 
accessing an oplocked file I could understand, but that doesn't always seem 
to be the case.

Has anyone had any success running this type of setup?  I am also 
serving NFS from the samba node, though with very little load there.
Below is the syslog output of a crash.  I'm running 2.6.8-1.521smp with a 
GFS CVS dump from mid-September.
-alan


  Code: 8b 03 0f 18 00 90 3b 5c 24 04 75 97 8b 04 24 5b 5e 5b 5e 5f
  <1>Unable to handle kernel paging request at virtual address 00100100
  printing eip:
f2ef1e8d
*pde = 00003001
Oops: 0000 [#3]
SMP
Modules linked in: udf nfsd exportfs lock_dlm(U) dlm(U) cman(U) gfs(U) lock_harness(U) nfs lockd sunrpc tg3 floppy sg microcode joydev dm_mod ohci_hcd ext3 jbd aacraid megaraid sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f2ef1e8d>]    Not tainted
EFLAGS: 00010246   (2.6.8-1.521smp)
EIP is at query_lkb_queue+0x85/0x9b [dlm]
eax: ccf485d8   ebx: 00100100   ecx: 00000000   edx: 00000100
esi: 13012e48   edi: 00000000   ebp: 00000130   esp: 13012dc4
ds: 007b   es: 007b   ss: 0068
Process smbd (pid: 13049, threadinfo=13012000 task=7617b1f0)
Stack: 00000000 4543aad0 00000130 950670d8 13012e48 3644d458 f2ef209e 13012e48
       00000000 00000000 f2ef133d 34326633 68478400 950670d8 00000137 000000d0
       ef239980 dea26800 13012e48 00000380 f2b79169 13012e48 f2b7905d be437380
Call Trace:
  [<f2ef209e>] query_locks+0x6f/0xad [dlm]
  [<f2ef133d>] dlm_query+0x155/0x238 [dlm]
  [<f2b79169>] get_conflict_global+0x104/0x2ae [lock_dlm]
  [<f2b7905d>] query_ast+0x0/0x8 [lock_dlm]
  [<0227c989>] release_sock+0xa5/0xab
  [<f2b794c2>] lm_dlm_plock_get+0xcb/0x10f [lock_dlm]
  [<f314b4e1>] do_plock+0xc2/0x171 [gfs]
  [<f314b5d4>] gfs_lock+0x44/0x52 [gfs]
  [<f314b590>] gfs_lock+0x0/0x52 [gfs]
  [<02170571>] fcntl_getlk64+0x75/0x12e
  [<02170841>] fcntl_setlk64+0x217/0x221
  [<0216c7e0>] sys_fcntl64+0x4d/0x7b



