[Linux-cluster] GFS2 fatal: invalid metadata block

Steven Whitehouse swhiteho at redhat.com
Tue Oct 20 09:07:50 UTC 2009


Hi,

On Mon, 2009-10-19 at 16:30 -0600, Kai Meyer wrote:
> Ok, so our lab test results have turned up some fun events.
> 
> Firstly, we were able to duplicate the invalid metadata block exactly 
> under the following circumstances:
> 
> We wanted to monkey with the VLAN that fenced/openais ran on. We 
> failed miserably, causing all three of my test nodes to believe they 
> had become lone islands in the cluster, unable to get enough votes 
> themselves to fence anybody. So we chose to simply power cycle the 
> nodes without trying to gracefully leave the cluster or reboot (they 
> are diskless servers with NFS root filesystems, so the GFS2 filesystem 
> was the only thing we were risking corrupting). After the nodes came back 
> online, we began to see the same random reboots and filesystem withdraws 
> within 24 hours. The filesystem that went into production and 
> eventually hit these errors was likely not reformatted just before 
> being put into production, and I believe it is highly likely that the 
> last format of that production filesystem was done while we were still 
> testing. I hope that as we continue in our lab, we can reproduce 
> the same circumstances, and give you a step-by-step that will cause this 
> issue. It'll make me feel much better about our current GFS2 filesystem 
> that was created and unmounted cleanly by a single node, and then put 
> straight into production, and has been only mounted once by our current 
> production servers since it was formatted.
> 
That is very interesting information. We are not there yet, but there
are a number of useful hints in it. Any further information you are
able to gather would be very helpful.

> Secondly, given the way our VMs do I/O, we have found that the 
> cluster.conf configuration settings: 
> <dlm plock_ownership="1" plock_rate_limit="0"/>
> <gfs_controld plock_rate_limit="0"/>
> have lowered our %wa times from ~60% to ~30% utilization. I am curious 
> why the locking daemon defaults to such a low rate limit (100). 
> Adding these two parameters to cluster.conf raised our locks 
> per second with the ping_pong binary from 93 to 3000+ in our 5 node 
> cluster. Our throughput doesn't seem to improve with either a higher 
> locking limit or jumbo frames, but processes spend much less time in 
> the I/O wait state than before (if my munin graphs are believable). 
> How likely is it that the low locking rate had a hand in causing the 
> filesystem withdraws and 'invalid metadata block' errors?
> 
I think there would be an argument for setting the default rate limit to
0 (i.e. off), since we seem to spend so much time telling people to turn
off this particular feature. The reason it was added is that under
certain circumstances it is possible to flood the network with plock
requests, blocking openais traffic (so the cluster thinks it has been
partitioned).

I've not seen or heard any recent reports of this, but that is the
original reason the feature was added. Most applications tend to be I/O
bound rather than (fcntl) lock bound anyway, so the chances of it being
a problem are fairly slim.
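For reference, here is a minimal sketch of where those two elements sit
in cluster.conf, assuming the usual cman-style layout (the <cluster>
attributes below are placeholders, not taken from Kai's message):

<cluster name="example" config_version="1">
    <dlm plock_ownership="1" plock_rate_limit="0"/>
    <gfs_controld plock_rate_limit="0"/>
    <!-- clusternodes, fencing, etc. as before -->
</cluster>

Note that plock_rate_limit="0" turns the throttle off entirely, while
plock_ownership="1" lets a node take ownership of plocks it uses
repeatedly, so subsequent operations on the same resource can be
resolved locally rather than costing a network round trip each time.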

Enabling jumbo frames won't help, as the issue is one of latency rather
than throughput. Using a low-latency interconnect in the cluster should
help fcntl lock performance, though.

The locking rate should have no bearing on the filesystem itself. The
locking (this refers to fcntl locking only, btw) is performed in
userspace by dlm_controld (gfs_controld on older clusters) and merely
passed through by the filesystem. The fcntl code is identical between
gfs1 and gfs2.

> I'm still not completely confident I won't see this happen again on my 
> production servers. I'm hoping you can help me with that.
> 
Yes, so am I :-) It sounds like we are making progress if we can reduce
the search space for the problem. From your message it sounds very much
as if you believe this is a recovery issue, and that seems plausible to
me.
