[Cluster-devel] [PATCH 4/4] GFS2: Make deletes use regular inode_lookup with special code path

Benjamin Marzinski bmarzins at redhat.com
Mon Mar 21 18:56:54 UTC 2016


On Mon, Mar 21, 2016 at 02:12:43PM -0400, Bob Peterson wrote:
> Hi Ben,
> 
> ----- Original Message -----
> > On Fri, Dec 18, 2015 at 01:02:30PM -0600, Bob Peterson wrote:
> > > This patch changes the inode reclaiming code, delete_work_func, use
> > > the normal gfs2_inode_lookup, but with a special parameter, for_del
> > > which causes additional checks to be made.
> > 
> > I get what you're doing here, and I can't see anything really wrong with
> > it, but I haven't gone through all the delete code as carefully as I
> > could.  I do have two questions.
> > 
> > 1. I don't see any reason why, even if it's just for paranoias sake, we
> > don't avoid dereferencing the gfs2_inode in verify_unlinked before
> > locking and checking it. It seems like a simple fix that avoids any
> > chance that there is a problem due to not holding the directory glock
> > when we're getting the inode.
> 
> I'm not sure I understand the question. Today, function verify_unlinked
> dereferences the gfs2_inode, ip, but so does its only caller,
> gfs2_inode_lookup. We already know ip is valid at that point, so more
> references shouldn't matter. Unless you mean the inode's glock?

So before, delete_work_func called gfs2_lookup_by_inum, which locked the
glock without dereferencing the gfs2_inode pointer, and then called
gfs2_inode_lookup. Other code paths call gfs2_inode_lookup without the
glock on the inode itself locked, but with the containing directory's
inode glock held. But now, we are deferencing the gfs2_inode pointer
without either holding its glock or the glock of the directory that the
file is in.

Assuming that everything has to hold the inode lock first now, once
we've locked the that, and locked the glock and verified the file in
verify inode, it should be safe to unlock the glock in verify_inode, and
continue to dereference it the gfs2_inode until we drop the inode lock.

>  
> > 2. gfs2_inode_lookup calls iget_locked, which creates an inode if one
> > doesn't already exist.  If there's any chance of racing and calling
> > delete_work_func for an inode that has already been removed, won't we
> > just end up creating a new one here?
> > 
> > -Ben
> 
> Yes, and that's exactly what we want. I found cases in which the
> delete_work_func races with the evict() code. In those cases, we need
> the inode to be recreated in order to go through another evict() cycle
> which will end up deleting it, whereas the first evict didn't.
> For example, the shrinker can arbitrarily call evict() and get rid of
> the inode while its nlink is still positive, IOW, it's not deleted,
> but after that point in time, the system gets a callback from another
> node instructing it to delete the inode because it was unlinked
> remotely from that other node. In that case, we still need to
> transition it from unlinked to free, but we need a valid inode
> in order to do the proper delete. So we recreate it and then it gets
> evicted again, in the name of deleting it.

O.k. clearly I didn't read through all the delete code as carefully as I
could have.

> Incidentally, my latest version has changed just a bit. I decided
> to keep the call to gfs2_ilookup in delete_work_func. So the latest
> version of the delete_work_func() is pasted below.
> 
> Regards,
> 
> Bob Peterson
> Red Hat File Systems
> ---
> static void delete_work_func(struct work_struct *work)
> {
> 	struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_delete);
> 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
> 	struct gfs2_inode *ip;
> 	struct inode *inode = NULL;
> 	u64 no_addr = gl->gl_name.ln_number;
> 
> 	/* If someone's using this glock to create a new dinode, the block must
> 	   have been freed by another node, then re-used, in which case our
> 	   iopen callback is too late after the fact. Ignore it. */
> 	if (test_bit(GLF_INODE_CREATING, &gl->gl_flags))
> 		goto out;
> 
> 	ip = gl->gl_object;
> 	/* Note: Unsafe to dereference ip as we don't hold right refs/locks */
> 
> 	if (ip)
> 		inode = gfs2_ilookup(sdp->sd_vfs, no_addr);
> 
> 	if (inode == NULL || IS_ERR(inode))
> 		inode = gfs2_inode_lookup(sdp->sd_vfs, DT_UNKNOWN, no_addr,
> 					  0, 1);
> 	if (inode && !IS_ERR(inode)) {
> 		d_prune_aliases(inode);
> 		iput(inode);
> 	}
> out:
> 	gfs2_glock_put(gl);
> }




More information about the Cluster-devel mailing list