[Linux-cluster] GFS1: node get withdrawn intermittent
David Teigland
teigland at redhat.com
Thu Feb 8 18:16:12 UTC 2007
On Thu, Feb 08, 2007 at 10:02:50AM -0800, Sridharan Ramaswamy (srramasw) wrote:
> Interesting. While testing GFS with a small journal size and resource
> group size, I hit the same issue:
>
>
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: fatal: assertion "x <= length" failed
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: function = blkalloc_internal
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: file = /download/gfs/cluster.cvs-rhel4/gfs-kernel/src/gfs/rgrp.c, line = 1458
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: time = 1170896502
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: about to withdraw from the cluster
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: waiting for outstanding I/O
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: telling LM to withdraw
>
>
> This happened on a 3-node GFS over a 512M device.
>
> $ gfs_mkfs -t cisco:gfs2 -p lock_dlm -j 3 -J 8 -r 16 -X /dev/hda12
>
> I was using bonnie++ to create about 10K files of 1K each from each of
> the 3 nodes simultaneously.
>
> Looking at the code in rgrp.c, it seems related to a failure to find a
> particular resource group block. Could this be due to the very low RG
> size I'm using (16M)?
This is bz 215793 which has been around for quite a while and has been
very difficult for us to reproduce. Perhaps using a smaller rg size is a
way to reproduce the bug more easily.
Dave
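
For anyone trying to reproduce this with the parameters quoted above, here is a
rough back-of-the-envelope sketch of the layout they imply. It is only an
estimate under stated assumptions: it ignores superblock, index and bitmap
overhead, assumes a 4096-byte block size (not stated in the thread), treats
"10K files" as 10 * 1024, and assumes each 1K file costs at least one block.
The function name estimate_layout is made up for illustration and is not part
of any GFS tool.

```python
MIB = 1024 * 1024


def estimate_layout(device_mib, journals, journal_mib, rg_mib, block_size=4096):
    """Rough layout estimate; ignores all on-disk metadata overhead."""
    # Space left once the journals are subtracted (simplification).
    data_mib = device_mib - journals * journal_mib
    # Number of full resource groups that fit in the remaining space.
    rg_count = data_mib // rg_mib
    # Blocks available per resource group at the assumed block size.
    blocks_per_rg = rg_mib * MIB // block_size
    return {
        "data_mib": data_mib,
        "rg_count": rg_count,
        "blocks_per_rg": blocks_per_rg,
    }


# Values taken from the quoted gfs_mkfs command: -j 3 -J 8 -r 16 on 512M.
layout = estimate_layout(device_mib=512, journals=3, journal_mib=8, rg_mib=16)

# Assumption: 3 nodes, each creating 10 * 1024 files, one block per file.
files_total = 3 * 10 * 1024

# Ceiling division: minimum number of resource groups the files must span.
rgs_touched = -(-files_total // layout["blocks_per_rg"])

print(layout)
print("files:", files_total, "min resource groups spanned:", rgs_touched)
```

Under these assumptions the workload has to spill across several small
resource groups, so the allocator switches groups far more often than it
would with larger ones. Whether that is what triggers the assertion is not
established in this thread.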