[Linux-cluster] GFS2 fatal: invalid metadata block

Steven Whitehouse swhiteho at redhat.com
Tue Oct 20 10:08:53 UTC 2009


Hi,

On Tue, 2009-10-20 at 02:22 -0700, Steven Dake wrote:
> On Tue, 2009-10-20 at 10:07 +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > On Mon, 2009-10-19 at 16:30 -0600, Kai Meyer wrote:
> > > Ok, so our lab test results have turned up some fun events.
> > > 
> > > Firstly, we were able to duplicate the invalid metadata block exactly 
> > > under the following circumstances:
> > > 
> > > We wanted to monkey with the VLAN that fenced/openais ran on. We failed 
> > > miserably, causing all three of my test nodes to believe that they had 
> > > become lone islands in the cluster, unable to get enough votes themselves 
> > > to fence anybody. So we chose to simply power cycle the nodes without 
> > > trying to gracefully leave the cluster or reboot (they are diskless 
> > > servers with NFS root filesystems, so the GFS2 filesystem is the only 
> > > thing we were risking corruption with). After the nodes came back 
> > > online, we began to see the same random reboots and filesystem withdraws 
> > > within 24 hours. The filesystem that went into production and eventually 
> > > hit these errors was likely not reformatted just before being put into 
> > > production, and I believe it is highly likely that the last format done 
> > > on that production filesystem was done while we were still doing 
> > > testing. I hope that as we continue in our lab, we can reproduce the 
> > > same circumstances and give you a step-by-step that will cause this 
> > > issue. It'll make me feel much better about our current GFS2 filesystem, 
> > > which was created and unmounted cleanly by a single node, then put 
> > > straight into production, and has only been mounted once by our current 
> > > production servers since it was formatted.
> > > 
> > That is very interesting information. We are not there yet, but there
> > are a number of useful hints in that. Any further information you are
> > able to gather would be very welcome.
> > 
> > > Secondly, the way our VMs are doing I/O, we have found the cluster.conf 
> > > configuration settings:
> > > <dlm plock_ownership="1" plock_rate_limit="0"/>
> > > <gfs_controld plock_rate_limit="0"/>
> > > have lowered our %wa times from ~60% to ~30% utilization. I am curious 
> > > why the locking daemon defaults to such a low rate (100). Adding these 
> > > two parameters to cluster.conf raised our locks per second, as measured 
> > > with the ping_pong binary, from 93 to 3000+ in our 5-node cluster. Our 
> > > throughput doesn't seem to improve by either upping the locking limit 
> > > or setting up jumbo frames, but processes spend much less time in the 
> > > I/O wait state than before (if my munin graphs are believable). How 
> > > likely is it that the low locking rate had a hand in causing the 
> > > filesystem withdraws and 'invalid metadata block' errors?
> > > 
> > I think there would be an argument for setting the default rate limit to
> > 0 (i.e. off), since we seem to spend so much time telling people to turn
> > off this particular feature. The reason it was added is that, under
> > certain circumstances, it is possible to flood the network with plock
> > requests, resulting in the blocking of openais traffic (so the cluster
> > thinks it's been partitioned).
> > 
> > I've not seen or heard of any recent reports of this, though; that is
> > simply the original reason the feature was added. Most applications tend
> > to be I/O bound rather than (fcntl) lock bound anyway, so the chances of
> > it being a problem are fairly slim.
> > 
> 
> The reason the limiting was added was that the IPC system in the original
> openais in fc6/rhel5.0 would disconnect heavy users of IPC connections,
> triggering a fencing operation against the node.  That problem has been
> resolved since 5.3.z (also f11+).
> 
In which case there would seem to be no argument against setting the
default to disabled now.
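
For reference, the two settings Kai quoted go directly under the <cluster>
element in /etc/cluster/cluster.conf. A minimal sketch, with made-up cluster
name, config_version and node entries (i.e. not Kai's real values, and with
fencing omitted for brevity):

<?xml version="1.0"?>
<cluster name="testcluster" config_version="2">
  <!-- name, config_version and node entries below are placeholders -->
  <dlm plock_ownership="1" plock_rate_limit="0"/>
  <gfs_controld plock_rate_limit="0"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
    <clusternode name="node3" nodeid="3"/>
  </clusternodes>
  <!-- fencing configuration omitted for brevity -->
  <fencedevices/>
</cluster>

If I remember rightly, plock_ownership needs to be set the same on every
node, and the controld daemons only read these options at startup, so
changing them means bumping config_version, propagating the file and
restarting the daemons rather than a live update.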

> > Setting jumbo frames won't help, as the issue (performance-wise) is one
> > of latency rather than throughput. Using a low-latency interconnect in
> > the cluster should help fcntl lock performance, though.
> > 
> 
> Jumbo frames reduce latency AND increase throughput from origination
> to delivery for heavy message traffic.  For very light message traffic,
> latency is increased but throughput is still improved.
> 
Yes, but in reality the traffic is unlikely to be very heavy in terms of
total bandwidth, as there is a lot of "send message, wait for reply" type
traffic.

Steve.




