[Linux-cluster] gfs2, kvm setup

J. Bruce Fields bfields at fieldses.org
Thu Jun 26 18:35:29 UTC 2008


On Thu, Jun 26, 2008 at 10:27:33AM -0500, David Teigland wrote:
> On Wed, Jun 25, 2008 at 06:45:44PM -0400, J. Bruce Fields wrote:
> > I'm trying to get a gfs2 file system running on some kvm hosts, using an
> > ordinary qemu disk for the shared storage (is there any reason this
> > can't work?).
> > 
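(For anyone trying to reproduce this setup: nothing special should be
needed on the qemu side beyond pointing every guest at the same image
and bypassing the host page cache.  A hypothetical invocation, assuming
a kvm userspace new enough to support -drive with cache=none; all paths
here are made up:

	# cache=none opens the image O_DIRECT, so one guest's writes
	# don't linger in the host page cache invisible to the others
	kvm -m 512 -hda piglet1-root.img \
		-drive file=/srv/shared.img,cache=none

Each guest gets its own root image plus the identical shared drive.)
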
> > I installed openais 0.80.3 from source (after modifying the Makefile so "make
> > install" would install to /), and installed gfs2 from the STABLE2 branch
> > of git://sources.redhat.com/git/cluster.git, plus this patch:
> > 
> > 	https://www.redhat.com/archives/cluster-devel/2008-April/msg00143.html
> > 
> > (with conflict in write_result() resolved in the obvious way).  The
> > kernel is from recent git: 2.6.26-rc4-00103-g1beee8d.  I created a
> > minimal cluster.conf and ran "mkfs -t gfs2" following doc/usage.txt, then
> > did the startup steps from usage.txt by hand.  Everything works up to
> > the mount, at which point the first host gets the following lock bug in
> > the logs.  Other mounts fail or hang.
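
(The cluster.conf for this kind of test can be tiny.  A sketch
consistent with the fsid "piglet:test" and the four journals in the
logs below; node names other than piglet1 are guesses, and fencing is
omitted entirely, which is tolerable for a throwaway test but not for
real use:

	<?xml version="1.0"?>
	<cluster name="piglet" config_version="1">
	  <clusternodes>
	    <clusternode name="piglet1" nodeid="1"/>
	    <clusternode name="piglet2" nodeid="2"/>
	    <clusternode name="piglet3" nodeid="3"/>
	    <clusternode name="piglet4" nodeid="4"/>
	  </clusternodes>
	</cluster>

The mkfs step, spelled out with mkfs.gfs2 directly:

	mkfs.gfs2 -p lock_dlm -t piglet:test -j 4 /dev/<shared-disk>

where -p picks the lock manager, -t gives the clustername:fsname lock
table, -j is the journal count (one per node that will mount), and
<shared-disk> stands in for whatever name the guests give the qemu
disk.)
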
> 
> I don't know why other mounts fail or hang, but it's not related to this:

OK, darn, I was hoping it was that simple.

> > Jun 25 18:31:01 piglet1 kernel: =====================================
> > Jun 25 18:31:01 piglet1 kernel: [ BUG: bad unlock balance detected! ]
> > Jun 25 18:31:01 piglet1 kernel: -------------------------------------
> > Jun 25 18:31:01 piglet1 kernel: dlm_recoverd/3061 is trying to release lock (&ls->ls_in_recovery) at:
> > Jun 25 18:31:01 piglet1 kernel: [<c01c3930>] dlm_recoverd+0x440/0x510
> > Jun 25 18:31:01 piglet1 kernel: but there are no more locks to release!
> > Jun 25 18:31:01 piglet1 kernel: 
> > Jun 25 18:31:01 piglet1 kernel: other info that might help us debug this:
> > Jun 25 18:31:01 piglet1 kernel: 3 locks held by dlm_recoverd/3061:
> > Jun 25 18:31:01 piglet1 kernel:  #0:  (&ls->ls_recoverd_active){--..}, at: [<c01c35c5>] dlm_recoverd+0xd5/0x510
> > Jun 25 18:31:01 piglet1 kernel:  #1:  (&ls->ls_recv_active){--..}, at: [<c01c38c5>] dlm_recoverd+0x3d5/0x510
> > Jun 25 18:31:01 piglet1 kernel:  #2:  (&ls->ls_recover_lock){--..}, at: [<c01c38cd>] dlm_recoverd+0x3dd/0x510
> > Jun 25 18:31:01 piglet1 kernel: 
> > Jun 25 18:31:01 piglet1 kernel: stack backtrace:
> > Jun 25 18:31:01 piglet1 kernel: Pid: 3061, comm: dlm_recoverd Not tainted 2.6.26-rc4-00103-g1beee8d #38
> > Jun 25 18:31:01 piglet1 kernel:  [<c0137bb9>] print_unlock_inbalance_bug+0xc9/0xf0
> 
> This is actually a false warning that's triggered by different threads
> doing the up and down.  To remove this we'd need down_write_non_owner() /
> up_write_non_owner() to parallel the "read" variants in rwsem.h.

Thanks for the explanation.
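
For the record, those would presumably parallel the existing
down_read_non_owner()/up_read_non_owner() in kernel/rwsem.c, which just
skip the lockdep acquire/release annotations.  A hypothetical sketch of
the write-side versions (they do not exist in the tree):

	/*
	 * Like down_read_non_owner(): take the lock but skip the
	 * lockdep annotation, since a different task will release it.
	 */
	void down_write_non_owner(struct rw_semaphore *sem)
	{
		might_sleep();
		__down_write(sem);
	}

	/*
	 * Release a write lock acquired by another task; there is no
	 * rwsem_release() annotation either, so lockdep has nothing
	 * to complain about.
	 */
	void up_write_non_owner(struct rw_semaphore *sem)
	{
		__up_write(sem);
	}

That would silence the warning above, at the cost of losing lockdep
coverage on this rwsem.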

> 
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0, already locked for use
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0: Looking at journal...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=0: Done
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Trying to acquire journal lock...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Looking at journal...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=1: Done
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Trying to acquire journal lock...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Looking at journal...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=2: Done
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Trying to acquire journal lock...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Looking at journal...
> > Jun 25 18:31:01 piglet1 kernel: GFS2: fsid=piglet:test.0: jid=3: Done
> 
> This mount appears to have been successful.  Usual things to collect for
> debugging the other problems:
> - any errors in /var/log/messages from all nodes
> - cman_tool nodes; cman_tool status from all nodes
> - group_tool -v from all nodes

Thanks, I'll see what more information I can collect.
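
Probably with something along these lines (node names are made up, and
/var/log/messages still wants reading by hand):

	for n in piglet1 piglet2 piglet3 piglet4; do
		ssh root@$n \
			'cman_tool nodes; cman_tool status; group_tool -v' \
			> $n.cluster-debug 2>&1
	done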

--b.

(PS: Can I get cc'd?  I filter mailing list traffic to a different
folder that I don't look at as often....)
