[Cluster-devel] [PATCH 2/4] mkfs.gfs2: Align resource groups to RAID stripes

Thu Jun 6 13:11:29 UTC 2013

Hi,

On Thu, 2013-06-06 at 08:57 -0400, Bob Peterson wrote:
> Hi,
> 
> | +			/* Squeeze the last 1 or 2 rgs into the remaining space */
> | +			if ((nextaddr < sdp->device.length) && (sdp->device.length - nextaddr >= minrgsz)) {
> | +				rglen = sdp->device.length - nextaddr;
> | +			} else {
> | +				if (sdp->device.length - rgaddr <= maxrgsz)
> | +					rgt->length = sdp->device.length - rgaddr;
> | +				else
> | +					rgt->length = maxrgsz;
> | +				/* This is the last rg */
> | +				nextaddr = 0;
> 
> In GFS1, we allowed mix-and-match resource group sizes, but we originally
> designed mkfs.gfs2 to ensure that all rgrps were the same uniform size. This
> usually means some space is wasted at the end of the last resource group.
> 
> We did this primarily so that fsck.gfs2 could more easily detect and repair
> damaged resource groups and rindex values. At the time it was designed, I got
> the buy-in of a bunch of developers and we all agreed to it. Since that time,
> I've had to change fsck.gfs2 to take more drastic measures to repair damaged
> resource groups, due to the fact that gfs2_convert can convert a GFS1 file
> system to GFS2, and thus, we can still end up with non-uniform resource groups.
> Many customers were adding storage and doing multiple gfs_grow ops,
> which resulted in metadata sets where the rgrps and rindex were complete chaos.
> 
> Still, my assumption has always been: If the file system was made by
> mkfs.gfs2, all resource groups (but the first one) are identical in size.
> 
> I think gfs2_grow takes some steps to ensure that new rgrps are also created
> using the same size as the current resource groups. If we don't enforce
> that rule, the rindex could once again become chaos, which means our chances
> of rgrp and rindex repair get worse.
> 
> Do we still want to enforce this rule?
> 
> With the improved rgrp repair algorithms in fsck.gfs2, it may not be
> necessary anymore. I'm not trying to be dogmatic; I'm looking for opinions here.
> 
> Regards,
> 
> Bob Peterson
> Red Hat File Systems
> 

It has never been valid to assume that all the rgrps are the same size.
It may be useful as a hint, but we should not rely on it being true.
Obviously it makes sense to try to keep them evenly spaced where
possible, but we must allow for them to be placed and sized
independently as required for alignment, etc.
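As an illustration of placing rgrps independently for alignment, here is a minimal sketch, assuming a stripe width and a preferred rgrp length both measured in filesystem blocks. The helpers `align_up()` and `next_rgrp_start()` and their parameters are hypothetical and are not the mkfs.gfs2 implementation; the small-tail folding only loosely mirrors the `minrgsz` check in the quoted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (both in filesystem blocks). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align > 0);
	return ((addr + align - 1) / align) * align;
}

/*
 * Hypothetical placement helper (not the mkfs.gfs2 code): given the start
 * of an rgrp, return the start of the next one. The next rgrp begins on a
 * stripe boundary at or after start + target, so rgrp lengths vary. If the
 * space left after that boundary would be smaller than min_len, it is
 * folded into the current rgrp, which then runs to the end of the device
 * (a return value equal to dev_len means "this is the last rgrp").
 */
static uint64_t next_rgrp_start(uint64_t start, uint64_t target,
				uint64_t stripe, uint64_t min_len,
				uint64_t dev_len)
{
	uint64_t next = align_up(start + target, stripe);

	if (next >= dev_len || dev_len - next < min_len)
		return dev_len;
	return next;
}
```

With a 1000-block device, a 64-block stripe, a 300-block target, a 100-block minimum and the first rgrp at block 17 (all example values), successive calls give starts at 17, 320 and 640, i.e. rgrps of 303, 320 and 360 blocks, so a tool reading the rindex cannot assume a single uniform length.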

There are some restrictions on rgrp length: it needs to be chosen so as
to give an integer number of bitmap bytes. In fact it would be good if
that could be enforced further, to ensure that all rgrp bitmaps are an
integer number of 64-bit words in length (i.e. a multiple of 32 blocks,
excluding the headers, since each block takes 2 bitmap bits). So,
depending on the various restrictions, there may be a few unused blocks
between rgrps in some cases.
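The bitmap arithmetic can be sketched as follows, assuming GFS2's 2 bits per block (4 blocks per bitmap byte, hence 32 blocks per 64-bit word). The macro and function names are hypothetical, and the sketch takes the number of blocks left after the rgrp headers as given rather than solving for the bitmap block count itself.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCKS_PER_BYTE 4u                     /* 2 bitmap bits per block */
#define BLOCKS_PER_WORD (BLOCKS_PER_BYTE * 8u) /* 32 blocks per 64-bit word */

/* Trim a data block count so its bitmap is a whole number of 64-bit words. */
static uint32_t round_data_blocks(uint32_t avail)
{
	return avail - avail % BLOCKS_PER_WORD;
}

/* Bitmap bytes needed for a data block count (must be whole bytes). */
static uint32_t bitmap_bytes(uint32_t data_blocks)
{
	assert(data_blocks % BLOCKS_PER_BYTE == 0);
	return data_blocks / BLOCKS_PER_BYTE;
}

/* Blocks left unused (e.g. as a gap before the next rgrp) after trimming. */
static uint32_t unused_blocks(uint32_t avail)
{
	return avail % BLOCKS_PER_WORD;
}
```

For example, 1000 blocks available after the headers would be trimmed to 992 data blocks (248 bitmap bytes, 31 64-bit words), leaving 8 blocks unused.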

I think it would be worth abstracting the alignment details out of mkfs
in due course, so that fsck also has access to the same information
about where rgrps are likely to have been placed.
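One possible shape for that shared information is sketched below, assuming a simple policy in which the first rgrp follows the superblock area and later rgrps start on stripe boundaries. The struct, its fields and `rg_start_plausible()` are hypothetical; a check like this could only ever be a hint for fsck, not proof that an rgrp lives at a given address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical description of the alignment policy, filled in once from
 * the device geometry and shared by mkfs (to place rgrps) and fsck (to
 * judge where rgrps are likely to be). Field names are illustrative only.
 */
struct rg_align_info {
	uint64_t first_rgrp; /* block address of the first rgrp */
	uint64_t stripe;     /* alignment unit in blocks; 1 = unaligned */
	uint64_t dev_len;    /* device length in blocks */
};

/* True if an rgrp header could plausibly start at addr under this policy. */
static bool rg_start_plausible(const struct rg_align_info *ai, uint64_t addr)
{
	assert(ai->stripe > 0);
	if (addr < ai->first_rgrp || addr >= ai->dev_len)
		return false;
	/* The first rgrp follows the superblock; later ones sit on boundaries. */
	return addr == ai->first_rgrp || addr % ai->stripe == 0;
}
```

Keeping the policy in one structure means a change to how mkfs aligns rgrps would automatically change what fsck treats as a likely rgrp location.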

Some further thoughts:

 - Would it be useful to introduce a flag to show the source of an rgrp
(whether mkfs or gfs2_grow)?
 - Would it be useful to add a creation date stamp to each rgrp so that
we can see when things have happened in the past?

There are probably spare fields we can use for this kind of thing,
without needing to break backward compatibility.
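A minimal sketch of how a source flag and creation stamp might be packed into spare rgrp bytes, assuming 12 bytes of otherwise unused, zero-filled space and GFS2's big-endian on-disk byte order. The enum values, struct and helper names are hypothetical and do not exist in `gfs2_ondisk.h`; the existing rgrp flags are deliberately left untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical source values; not defined by GFS2 itself. */
enum rg_source { RG_SRC_UNKNOWN = 0, RG_SRC_MKFS = 1, RG_SRC_GROW = 2 };

struct rg_provenance {
	uint32_t source; /* one of enum rg_source */
	uint64_t ctime;  /* creation time, seconds since the epoch */
};

#define RG_PROV_LEN 12 /* bytes of spare space this sketch would consume */

/* Store both fields big-endian, matching GFS2's on-disk byte order. */
static void rg_prov_encode(const struct rg_provenance *p,
			   unsigned char buf[RG_PROV_LEN])
{
	for (int i = 0; i < 4; i++)
		buf[i] = (unsigned char)(p->source >> (24 - 8 * i));
	for (int i = 0; i < 8; i++)
		buf[4 + i] = (unsigned char)(p->ctime >> (56 - 8 * i));
}

/* Read the fields back; all-zero bytes decode as "unknown, no timestamp". */
static void rg_prov_decode(const unsigned char buf[RG_PROV_LEN],
			   struct rg_provenance *p)
{
	p->source = 0;
	for (int i = 0; i < 4; i++)
		p->source = (p->source << 8) | buf[i];
	p->ctime = 0;
	for (int i = 0; i < 8; i++)
		p->ctime = (p->ctime << 8) | buf[4 + i];
}
```

If older tools left that space zeroed (an assumption that would need checking), a decoded source of `RG_SRC_UNKNOWN` with a zero timestamp would simply mean "no provenance recorded" rather than a damaged rgrp.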

Steve.