[Cluster-devel] [PATCH v2] GFS2: Add a next-resource-group pointer to resource groups

Thu Feb 2 14:57:40 UTC 2017

So it's about time I revived this patch and got it finalised...

On 12/01/16 15:50, Bob Peterson wrote:
> ----- Original Message -----
>> Add a new rg_skip field to struct gfs2_rgrp, replacing __pad. The
>> rg_skip field has the following meaning:
>>
>> - If rg_skip is zero, it is considered unset and not useful.
>> - If rg_skip is non-zero, its value will be the number of blocks between
>>   this rgrp's address and the next rgrp's address. This can be used as a
>>   hint by fsck.gfs2 when rebuilding a bad rindex, for example.
>>
>> When gfs2_rgrp_bh_get() reads a resource group header and finds rg_skip
>> to be 0 it will attempt to set it to the difference between its rd_addr
>> and the rd_addr of the next resource group.
>>
>> The only special case is the final rgrp, which always has an rg_skip of
>> 0. It is not set to a special value (like -1) because, when the
>> filesystem is grown, the rgrp will no longer be the final one and it
>> will then need to have its rg_skip field set. The overhead of this
>> special case is a gfs2_rgrpd_get_next() call each time
>> gfs2_rgrp_bh_get() is called for the final resource group.
>>
>> For the other resource groups, if the rg_skip field is 0, it is set
>> appropriately and then the only overhead becomes the rgd->rg_skip == 0
>> comparison in gfs2_rgrp_bh_get().
>>
>> Before this patch, gfs2_rgrp_out() zeroes the __pad field explicitly, so
>> the rg_skip field can get set back to 0 in cases where nodes with and
>> without this patch are mixed in a cluster. In some cases, the field may
>> bounce between being set by one node and then zeroed by another, which
>> may harm performance slightly, e.g. when two nodes create many small
>> files. In testing, this situation is rare, but it becomes more likely as
>> the filesystem fills up and there are fewer resource groups to choose
>> from. The problem goes away when all nodes are running with this patch.
>> Dipping into the space currently occupied by the rg_reserved field would
>> have resulted in the same problem as it is also explicitly zeroed, so
>> unfortunately there is no other way around it.
>>
>> Signed-off-by: Andrew Price <anprice at redhat.com>
>
> Hi Andy,
>
> I've been talking about doing something like this for years, so it's
> good to see someone finally acting on it.
>
> Although this is a good first stab at the solution, my main concern about
> this implementation is that, AFAICT, it doesn't take read-only mounts into
> account. In fact, a "spectator" mount might even cause it to BUG_ON from
> gfs2_trans_begin, since there's no journal. But it's close.

I've been testing, but I haven't found a way to trigger that BUG_ON yet.
Are there any syscalls I can exercise that make gfs2 read in the rgrp
headers in spectator mode? My guess is that I haven't been able to
trigger it because my tests only do path lookups and never need to
look up allocation state.
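
For anyone following along, here is a rough userspace model of the
rg_skip rule from the patch description quoted above. It is only a
sketch, not the patch itself: struct rgrp_model and model_fix_rg_skip()
are hypothetical names, the ordering and overflow checks are defensive
assumptions of mine, and the real code would also need a transaction
to write the header back, which is exactly the part in question for
read-only and spectator mounts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the in-memory rgrp descriptor. */
struct rgrp_model {
	uint64_t rd_addr;	/* block address of this rgrp's header */
	uint32_t rg_skip;	/* blocks to the next rgrp header; 0 means unset */
};

/*
 * Model of the fix-up: if rg_skip is unset and a following rgrp exists,
 * set rg_skip to the distance between the two header addresses.
 *
 * Returns 1 if rg_skip was changed (the header would need writing back),
 * 0 if nothing was done.
 */
static int model_fix_rg_skip(struct rgrp_model *rgd,
			     const struct rgrp_model *next)
{
	uint64_t diff;

	if (rgd->rg_skip != 0)
		return 0;	/* already set: only the comparison is paid */
	if (next == NULL)
		return 0;	/* final rgrp: rg_skip stays 0 */
	if (next->rd_addr <= rgd->rd_addr)
		return 0;	/* assumption: ignore a "next" that is not ahead */

	diff = next->rd_addr - rgd->rd_addr;
	if (diff > UINT32_MAX)
		return 0;	/* assumption: leave unset if it does not fit */

	rgd->rg_skip = (uint32_t)diff;
	return 1;
}
```

Calling it a second time on the same rgrp returns 0 without touching
anything, which corresponds to the "only overhead becomes the
comparison" case, while the final rgrp (next == NULL) returns 0 on
every call and so keeps paying for the lookup of its successor.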

Andy