[Cluster-devel] [PATCH] gfs2: ignore rindex_update failure in dinode_dealloc

Bob Peterson rpeterso at redhat.com
Fri May 5 12:29:38 UTC 2023


Hi Andy,

On 5/5/23 3:44 AM, Andrew Price wrote:
> Hi Bob,
> 
> On 04/05/2023 18:43, Bob Peterson wrote:
>> Before this patch, function gfs2_dinode_dealloc would abort if it got a
>> bad return code from gfs2_rindex_update. The problem is that it left the
>> dinode in the unlinked (not free) state, which meant a subsequent fsck
>> would clean it up and flag an error. That meant some of our QE tests
>> would fail.
> 
> As I understand it, the test is an interrupted rename-loop workload 
> with gfs2_grow running at the same time, and the bad return code is 
> -EINTR, right?

Correct.

>> The sole purpose of gfs2_rindex_update, in this code path, is to read in
>> any newer rgrps added by gfs2_grow. But since this is a delete operation,
>> it won't actually use any of those new rgrps. It can really only twiddle
>> the bits from "Unlinked" to "Free" in an existing rgrp. Therefore the
>> error should not prevent the transition from unlinked to free.
>>
>> This patch makes gfs2_dinode_dealloc ignore the bad return code and
>> proceed with freeing the dinode so the QE tests will not be tripped up.
> 
> Is it really ok to ignore all potential errors here? I wonder if it 
> should just ignore -EINTR (or whichever error the test produces) so that 
> it can still fail well for errors like -EIO.

Good question.

The call to gfs2_rindex_update is really not even needed in 
gfs2_dinode_dealloc, because this is the last stage of the delete, 
where we are freeing the dinode itself. I've even considered removing 
the call altogether. So failing the whole operation because such an 
inconsequential action failed seems like throwing the proverbial baby 
out with the bath water.

Maybe we should just remove the call to gfs2_rindex_update altogether 
and delegate it to earlier parts of the evict/delete process.
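
To make the shape of the change concrete, here is a rough sketch of the 
call site in gfs2_dinode_dealloc. The function names are the real gfs2 
ones, but the surrounding lines are paraphrased from memory rather than 
quoted from the tree, so treat this as a sketch of the idea, not the 
literal diff:

static int gfs2_dinode_dealloc(struct gfs2_inode *ip)
{
        struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
        ...
        /*
         * Before the patch this aborted on error and left the dinode
         * unlinked (not free):
         *
         *      error = gfs2_rindex_update(sdp);
         *      if (error)
         *              return error;
         *
         * The dinode's own rgrp is already instantiated, so the update
         * result doesn't matter here; ignore it. Removing the call
         * altogether, as suggested above, is the other option.
         */
        gfs2_rindex_update(sdp);
        ...
}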

The original intent of calling gfs2_rindex_update in the evict/delete 
sequence was to ensure we have the newest resource groups from 
gfs2_grow, because any file being evicted may have references to new 
rgrps created by gfs2_grow that need to be freed, even if the dinode 
itself resides in an old rgrp. This is pretty much true for all parts 
of the process that evict deleted dinodes, except for 
gfs2_dinode_dealloc itself. For example, a new dinode might have an 
eattr, indirect block, data block, or whatever, in one of the new rgrps 
added by gfs2_grow.
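
To illustrate why those earlier stages need the refresh: freeing any 
block means resolving it to its rgrp first, and a block in a just-grown 
rgrp can't be resolved until the rindex is re-read. Here is a minimal 
sketch of that dependency (gfs2_rindex_update and gfs2_blk2rgrpd are 
real gfs2 functions; the helper free_one_block is hypothetical, only to 
show the shape):

/* Hypothetical helper, not actual gfs2 code. */
static int free_one_block(struct gfs2_sbd *sdp, u64 block)
{
        struct gfs2_rgrpd *rgd;
        int error;

        /* Pick up any rgrps added by gfs2_grow since we last looked. */
        error = gfs2_rindex_update(sdp);
        if (error)
                return error;

        /*
         * Without the refresh above, a block living in a brand-new
         * rgrp maps to no known rgrp and cannot be freed.
         */
        rgd = gfs2_blk2rgrpd(sdp, block, true);
        if (!rgd)
                return -EIO;
        /* ... clear the bitmap bits in rgd for this block ... */
        return 0;
}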

However, since the inode was created/instantiated (which must be true in 
order for it to be evicted), the dinode itself must reside in a 
previously instantiated rgrp, and therefore the call to 
gfs2_rindex_update is not needed at all.

So if the call fails, imho, it shouldn't fail the rest of 
gfs2_dinode_dealloc, regardless of the error.

The next question you may ask is: why don't we get the -EINTR when 
reading in new rgrps for the purpose of deleting the other parts of the 
file: its eattrs, indirect blocks, data blocks, etc.? The answer is: I 
don't know, but I suspect we have other bugs lurking in that area. I 
suspect that if we try hard enough, we can provoke other problems in 
which the punch_hole code doesn't read in the new rgrps.

It may be tempting to think that this also cannot happen, because the 
rgrps must already be instantiated for any eattrs, metadata, or data to 
be assigned to the dinode being evicted/deleted. But that's 
non-clustered file system thinking.

In gfs2, it is possible for one cluster node to read in the new rgrps 
from gfs2_grow, assign blocks from those rgrps to a dinode that's 
already open on a different node, and then delete that file. That 
causes the second cluster node to evict the inode and try to reference 
those new blocks before it has read in the new rgrps. So we need to be 
very careful.

We should probably spend some time trying to force these conditions to 
see if we can flush out more bugs.

For some reason, with this test, we see this particular problem only 
with gfs2_dinode_dealloc, and that's the problem I'm trying to fix with 
this patch.

Regards,

Bob Peterson


