[Linux-cluster] dlm spinlock BUG
Jens Beyer
jbe at webde.de
Thu Apr 19 08:04:17 UTC 2007
Hi,
On Wed, Apr 18, 2007 at 04:04:13PM +0100, Patrick Caulfield wrote:
> Jens Beyer wrote:
> >
> > I am using a vanilla 2.6.20.6 (same with 2.6.20.x).
> >
>
> Hmm, I'm not sure how that got left unfixed upstream
>
> Here's the patch:
>
The patch did fix one spinlock BUG; now I get another one:
[ 315.040936] BUG: spinlock already unlocked on CPU#1, dlm_recvd/14593
[ 315.040949] lock: ee108f64, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
[ 315.040964] [<c01d62ac>] _raw_spin_unlock+0x70/0x72
[ 315.040976] [<f0b63f09>] dlm_lowcomms_commit_buffer+0x2f/0x9a [dlm]
[ 315.040998] [<f0b5fb67>] send_rcom+0xa/0x12 [dlm]
...
which seems to be fixed in 2.6.21-rc6, from which I took this change:
--- fs/dlm/lowcomms-tcp.c.orig 2007-04-19 09:42:53.000000000 +0200
+++ fs/dlm/lowcomms-tcp.c 2007-04-19 09:43:23.000000000 +0200
@@ -748,6 +748,7 @@
 	struct connection *con = e->con;
 	int users;
+	spin_lock(&con->writequeue_lock);
 	users = --e->users;
 	if (users)
 		goto out;
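To illustrate why the missing spin_lock() produces "spinlock already unlocked": the function's exit path unlocks writequeue_lock, so the lock must be taken before e->users is decremented. Below is a minimal userspace sketch of that pairing, not the kernel code. It assumes a pthread mutex as a stand-in for the spinlock, and the pending counter and the return value are hypothetical additions for illustration only.

```c
#include <pthread.h>

/* Hypothetical userspace stand-ins for the kernel structures in the patch. */
struct connection {
	pthread_mutex_t writequeue_lock;
	int pending;	/* hypothetical: entries handed over for sending */
};

struct writequeue_entry {
	struct connection *con;
	int users;	/* writers still holding this buffer */
};

/* Returns 1 if the entry was handed over, 0 if other users remain. */
static int commit_buffer(struct writequeue_entry *e)
{
	struct connection *con = e->con;
	int users;
	int queued = 0;

	/* The step the patch adds: take the lock before touching e->users. */
	pthread_mutex_lock(&con->writequeue_lock);
	users = --e->users;
	if (users)
		goto out;
	con->pending++;
	queued = 1;
out:
	/* Every path reaching this unlock now holds the lock. */
	pthread_mutex_unlock(&con->writequeue_lock);
	return queued;
}
```

Without the lock call at the top, the unlock at out: would run on a lock that was never taken, which is the situation the BUG message reports.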
But now it hangs during mount:
boxfe01:/home/jbe # mount -t gfs2 -v /dev/sdd1 /export/vol1
/sbin/mount.gfs2: mount /dev/sdd1 /export/vol1
/sbin/mount.gfs2: parse_opts: opts = "rw"
/sbin/mount.gfs2: clear flag 1 for "rw", flags = 0
/sbin/mount.gfs2: parse_opts: flags = 0
/sbin/mount.gfs2: parse_opts: extra = ""
/sbin/mount.gfs2: parse_opts: hostdata = ""
/sbin/mount.gfs2: parse_opts: lockproto = ""
/sbin/mount.gfs2: parse_opts: locktable = ""
/sbin/mount.gfs2: message to gfs_controld: asking to join mountgroup:
/sbin/mount.gfs2: write "join /export/vol1 gfs2 lock_dlm boxfe:clustervol1 rw /dev/sdd1"
/sbin/mount.gfs2: message from gfs_controld: response to join request:
/sbin/mount.gfs2: lock_dlm_join: read "0"
/sbin/mount.gfs2: message from gfs_controld: mount options:
/sbin/mount.gfs2: lock_dlm_join: read "hostdata=jid=1:id=262146:first=0"
/sbin/mount.gfs2: lock_dlm_join: hostdata: "hostdata=jid=1:id=262146:first=0"
/sbin/mount.gfs2: lock_dlm_join: extra_plus: "hostdata=jid=1:id=262146:first=0"
boxfe01:/home/jbe # dmesg | tail -15
[ 137.276428] GFS2 (built Apr 19 2007 09:15:21) installed
[ 137.285199] Lock_DLM (built Apr 19 2007 09:15:33) installed
[ 149.628806] drbd1: role( Secondary -> Primary )
[ 149.628827] drbd1: Writing meta data super block now.
[ 156.324500] GFS2: fsid=: Trying to join cluster "lock_dlm", "boxfe:clustervol1"
[ 156.397920] dlm: got connection from 2
[ 156.399738] dlm: clustervol1: recover 1
[ 156.399792] dlm: clustervol1: add member 2
[ 156.399796] dlm: clustervol1: add member 1
[ 156.400514] dlm: clustervol1: config mismatch: 32,0 nodeid 2: 11,0
[ 156.400519] dlm: clustervol1: ping_members aborted -22 last nodeid 2
[ 156.400523] dlm: clustervol1: total members 2 error -22
[ 156.400526] dlm: clustervol1: recover_members failed -22
[ 156.400529] dlm: clustervol1: recover 1 error -22
[ 156.404760] GFS2: fsid=boxfe:clustervol1.1: Joined cluster. Now mounting FS...
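For reading the log above: error -22 is -EINVAL on Linux, and the "config mismatch" line carries one pair of values for the local node and one for the remote node. The following is a minimal sketch for pulling those numbers out of such a line. The struct, the function name, and the field names (len, flags) are hypothetical labels chosen here, and the "%d,%x" layout is an assumption about how the message is printed, not something the log itself confirms.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the values in a "config mismatch" line. */
struct mismatch {
	int local_len;
	unsigned int local_flags;
	int nodeid;
	int remote_len;
	unsigned int remote_flags;
};

/* Returns 0 on success, -1 if the line does not contain the message. */
static int parse_mismatch(const char *line, struct mismatch *m)
{
	const char *p = strstr(line, "config mismatch: ");

	if (!p)
		return -1;
	/* Assumed layout: "<len>,<flags> nodeid <id>: <len>,<flags>" */
	if (sscanf(p, "config mismatch: %d,%x nodeid %d: %d,%x",
		   &m->local_len, &m->local_flags, &m->nodeid,
		   &m->remote_len, &m->remote_flags) != 5)
		return -1;
	return 0;
}
```

Applied to the line from the dmesg output, this gives 32 and 0 for the local node and 11 and 0 for node 2, so under these assumptions the two nodes disagree only on the first value.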
Regards,
Jens