[Linux-cluster] Re: Re: GFS/GFS2 problems with iozone

Mon May 4 21:37:51 UTC 2009

> Date: Mon, 4 May 2009 11:05:20 -0400 (EDT)
> From: Bob Peterson <rpeterso at redhat.com>
> Subject: Re: [Linux-cluster] GFS/GFS2 problems with iozone
> To: linux clustering <linux-cluster at redhat.com>
>
> ----- "Michael O'Sullivan" <michael.osullivan at auckland.ac.nz> wrote:
> | Hi everyone,
> | 
> | I am having some problems testing a GFS system using iozone. I am
> | running CentOS (kernel 2.6.18-128.1.6.el5) and have a two-node cluster
> | with a GFS file system installed on a shared iSCSI target. The GFS sits
> | on top of a 1.79TB clustered logical volume and can be mounted
> | successfully on both cluster nodes.
> | 
> | When using iozone to test performance everything goes smoothly until I
> | get to a file size of 2GB and a record length of 2048. Then iozone
> | exits with the error
> | 
> | Error fwriting block 250, fd= 7
> | 
> | and (as far as I can tell) the GFS becomes corrupted
> | 
> | fatal: invalid metadata block
> | bh = 12912396 (magic)
> | function = gfs_get_meta_buffer
> | file = /builddir/build/BUILD/gfs-kmod-0.1.31/_kmod_build_/src/gfs/dio.c,
> | line = 1225
> | 
> | Can anyone shed some light on what is happening?
> | 
> | Kind regards, Mike O'S
>
> Hi Mike,
>
> Are you running iozone on a single node or both simultaneously?
> If it's running on two nodes, please make sure that both nodes have
> the iSCSI target mounted with lock_dlm protocol (not lock_nolock).
> Also, we need to make sure that they're not trying to use the same
> files in the file system because I think iozone is not cluster-aware.
> But even so, the file system should not be corrupted unless one of
> the nodes is using lock_nolock protocol, or if other boxes are
> using the iSCSI target without the knowledge of GFS.
>
> We regularly run iozone here, in single-node performance trials, and
> we have never seen this kind of problem.
>
> Also, you didn't specify what version of the kmod-gfs package you have
> installed.  I've fixed at least one bug that might account for it,
> depending on what version of kmod-gfs you're running.
>
> I'm not aware of any other problems in the GFS kernel code that can
> account for this kind of corruption, except for possibly this one:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=491369
>
> (A GFS bug that really goes well beyond the NFS usage described in the bug.)
> You can find the patch in the attachments, although I won't guarantee
> it'll solve your problem.  There's a slight chance though.
> My apologies if you don't have permission to see the bug; that sometimes
> happens and it's out of my control.  I can, however, post the patch
> if needed.
>
> If iozone is being run on a single node, this might be a new bug.  If you can
> still recreate the problem with that patch in place, or if you don't want
> to try the patch for some reason, perhaps you should open up a bugzilla
> record and we'll investigate the problem.  If we can reproduce it, we'll
> figure it out and fix it.
>
> Regards,
>
> Bob Peterson
> Red Hat GFS
>   
Hi Bob,

I have changed back to GFS2 (as I realised it is now production ready; is
that correct?), but I am still having similar problems. I am running iozone
on a single node against the mount point of a GFS2 file system mounted with
lock_dlm. Note that the GFS2 file system is created on a multipathed device
provided via iSCSI/DRBD. I run the following commands:

gfs2_fsck # shows no errors on either node

mount -t gfs2 /dev/iscsi_mirror/lvol0 /mnt/iscsi_mirror/ # mounts the file system (on top of iSCSI/DRBD) on both nodes

/usr/src/ioszone3_321/src/current/iozone -Ra -g 4G -f /mnt/iscsi_mirror/test # only on node 1

This gets to a file size of 1048576 KB and a record length (reclen) of 256
before failing with

Error reading block 1018 b6e00000
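
For reference, a rough sketch of where in the test file each failure falls.
It assumes that the block number iozone prints is a zero-based record index
and that reclen is given in KB, as in iozone's report columns. Neither
assumption is confirmed in this thread, and the function name is made up
for illustration.

```python
def failure_offset_kb(block_index, reclen_kb):
    """Approximate offset (in KB) of the failing record from the start of the file.

    Assumption: block_index is a zero-based record index and reclen_kb is the
    record length in KB. Neither is confirmed by the error output itself.
    """
    return block_index * reclen_kb


# GFS run: 2 GB file, reclen 2048, "Error fwriting block 250"
gfs_kb = failure_offset_kb(250, 2048)
# GFS2 run: 1048576 KB file, reclen 256, "Error reading block 1018"
gfs2_kb = failure_offset_kb(1018, 256)

for name, kb, size_kb in (("GFS", gfs_kb, 2 * 1024 * 1024),
                          ("GFS2", gfs2_kb, 1048576)):
    print(f"{name}: ~{kb} KB ({kb / 1024:.1f} MB) into a {size_kb} KB file")
```

Under those assumptions the GFS failure is about 500 MB into the 2 GB file
and the GFS2 failure about 254.5 MB into the 1 GB file, so both occur
partway through the file rather than at its end.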

I can repair the GFS2 file system using gfs2_fsck (it fixes some dirty
journals but makes no other changes). I don't have the error messages from
this latest test: I ran it over the weekend and /var/log/messages no longer
contains them. I can rerun the test and record the error messages if
necessary, but I wonder whether the patch you mentioned also exists for
GFS2?
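
To avoid losing the messages on the next run, something like the following
could be run right after the failure. It is only a sketch: it assumes the
kernel messages of interest carry a "GFS:" or "GFS2:" prefix, and the
function names and the output path in the usage note are hypothetical.

```python
import re

# Assumption: GFS and GFS2 kernel messages are prefixed "GFS: " or "GFS2: ".
GFS_LINE = re.compile(r"\bGFS2?: ")


def gfs_messages(log_text):
    """Return the lines of log_text that contain a GFS or GFS2 message."""
    return [line for line in log_text.splitlines() if GFS_LINE.search(line)]


def save_gfs_messages(log_path, out_path):
    """Copy GFS/GFS2 lines from log_path to out_path; return how many were saved."""
    with open(log_path, errors="replace") as src:
        lines = gfs_messages(src.read())
    with open(out_path, "w") as dst:
        dst.write("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)
```

For example, save_gfs_messages("/var/log/messages", "/root/gfs2-errors.txt")
would keep a copy outside the system log. If the log has already been
rotated, the same function can be pointed at the rotated file instead.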

Thanks very much for your help, Mike