[Linux-cluster] Re: gfs withdrawed in function xmote_bh with ret = 0x00000002

David Teigland teigland at redhat.com
Fri Jun 16 16:37:53 UTC 2006


On Fri, Jun 16, 2006 at 10:38:58PM +0800, ?????? wrote:
> Hi all,
> 
> I am running the latest STABLE cluster code on 3 nodes.
> After about 38 hours, one node logged the following:
> <--
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: fatal: assertion "FALSE" failed
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1:   function = xmote_bh
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1:   file = /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/gfs/glock.c, line = 1093
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1:   time = 1150408904
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: about to withdraw from the cluster
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: waiting for outstanding I/O
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: telling LM to withdraw
> Jun 16 06:01:48 nd04 kernel: lock_dlm: withdraw abandoned memory
> Jun 16 06:01:48 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: withdrawn
> Jun 16 06:01:48 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: ret = 0x00000002
> -->
> My test program runs 'df', 'write', 'ls' and 'read',
> and each node connects directly to the RAID controller's host port over FC.

Hi, I've attached a small patch to print more information and call BUG
instead of withdrawing.  It may also be helpful to see a dlm lock dump and
a gfs_tool lockdump on the machine after you hit the BUG.

Thanks,
Dave

-------------- next part --------------
--- ./glock.c.orig	2006-06-16 11:17:48.313980418 -0500
+++ ./glock.c	2006-06-16 11:31:20.617855661 -0500
@@ -30,6 +30,9 @@
 #include "quota.h"
 #include "recovery.h"
 
+int dump_glock(struct gfs_glock *gl, char *buf, unsigned int size,
+	       unsigned int *count);
+
 /*  Must be kept in sync with the beginning of struct gfs_glock  */
 struct glock_plug {
 	struct list_head gl_list;
@@ -1090,9 +1093,17 @@
 		spin_unlock(&gl->gl_spin);
 
 	} else {
-		if (gfs_assert_withdraw(sdp, FALSE) == -1)
-			printk("GFS: fsid=%s: ret = 0x%.8X\n",
-			       sdp->sd_fsname, ret);
+		char *buf;
+		unsigned int junk = 0;
+		printk("GFS: fsid=%s: ret = 0x%.8X prev_state = %d\n",
+		       sdp->sd_fsname, ret, prev_state);
+		buf = kmalloc(4096, GFP_KERNEL);
+		if (buf) {
+			memset(buf, 0, 4096);
+			dump_glock(gl, buf, 4096, &junk);
+			printk("%s\n", buf);
+		}
+		BUG();
 	}
 
 	if (glops->go_xmote_bh)
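
A note on the buffer handling in the debug hunk: when zeroing a heap buffer such as the 4096-byte one handed to dump_glock(), the length passed to memset() has to be the allocation size, because sizeof applied to a char * gives only the pointer width. Below is a minimal userspace sketch of that point, not kernel code. malloc() stands in for kmalloc(), and DUMP_BUF_SIZE, alloc_dump_buf() and leading_zero_bytes() are hypothetical names used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define DUMP_BUF_SIZE 4096	/* same size the patch passes to dump_glock() */

/* Allocate a dump buffer and zero all of it; returns NULL on failure. */
char *alloc_dump_buf(void)
{
	char *buf = malloc(DUMP_BUF_SIZE);

	if (!buf)
		return NULL;

	/*
	 * Use the allocation size here. sizeof(buf) would be
	 * sizeof(char *), so only the first few bytes would be cleared.
	 */
	memset(buf, 0, DUMP_BUF_SIZE);
	return buf;
}

/* Count how many leading bytes of buf are zero, looking at most len bytes. */
size_t leading_zero_bytes(const char *buf, size_t len)
{
	size_t n = 0;

	while (n < len && buf[n] == 0)
		n++;
	return n;
}
```

Calling leading_zero_bytes(buf, DUMP_BUF_SIZE) on a buffer returned by alloc_dump_buf() should report the full buffer size, which is a quick way to confirm that the whole allocation was cleared.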

