[Cluster-devel] [GFS2 Patch] pass formal ino in do_filldir_main

Mon Feb 26 16:11:49 UTC 2007

Please note, I haven't been following current GFS2 development...

On Feb 26, 2007, at 9:45 AM, Steven Whitehouse wrote:

> Hi,
>
> On Fri, 2007-02-23 at 22:56 -0500, Wendy Cheng wrote:
>> Steve,
>>
>> I gave this issue some more thoughts - would like to suggest we take
>> this patch (at least for now) since it aligns with current code base.
>>
>> The no_formal_ino is apparently intented to get returned back to user
>> space due to its unique-ness (and we have to trust pick_formal_ino()
>> does its job right). With normal readdir system call, after the inode
>> number is sent to user space, there is no route (I've checked) for it 
>> to
>> come back to kernel. So the only user that would use these filldir ino
>> inside the kernel is NFSD. NFSD calls gfs2_ilookup() that requires
>> no_formal_ino.
>>
> no_formal_ino is not ever guaranteed to be unique which is why I want 
> to
> keep it away from userspace. I don't really want to have to change that
> in order to fix NFS.

Why is it not unique?  (from Ken's doc on GFS2:  "If the filesystem 
allocated 1 billion inodes per second, it would take over 500 years for 
roll-over.")

> As you say there is no route for the inode number returned via readdir
> to come back to the kernel, but it should match the inode number
> returned by stat and I don't see that we should change either of them
> just to get NFS to work when userspace is perfectly ok as it is.

There were alot of things done to make NFS work properly with GFS - 
many of them through great pain, because GFS is a clustered FS.

>> If you look further... The current lookup code actually uses
>> no_formal_ino, not no_addr. The two main "gate" routines that controls
>> ino-to-inode conversion are:
>>
>> * gfs2_ilookup() (used by NFS route)
>> * gfs2_inode_lookup() (used by VFS that calls gfs2_lookup()).
>>
> Neither of them need to use no_formal_ino at all. The lookup is using
> no_addr as there is no other index which makes sense. I know that
> no_formal_ino is used as part of the hash, but it shouldn't be really
> and its trivial to change that. You can't swap to just using
> no_formal_ino on its own as there is no mapping to find its on-disk
> location from no_formal_ino.
>
>> Both use no_formal_ino - gfs2_inode_lookup() logic hides this behind 
>> the
>> little wrapper "gfs2_iget()". Since current VFS side's lookup has been
>> working fine, this no_formal_ino idea apparently is working. So let's
>> make NFSD side work the *same* way. I'm convinced this patch does a
>> right thing.
>>
>> I don't dispute using generation number may not be a bad idea and may
>> perform better. However, if we really look into the details, it is not
>> easy for current code structure. Since we have something already
>> working, it is not wise to mess this code layout at this moment.
>>
>> Make sense ?
>>
>> -- Wendy
>>
> I'm afraid that I'm still not convinced on that. It seems that ext3 
> uses
> the method that I'm proposing in that it looks up the inode by inode
> number and then only afterwards checks the generation number. It also
> has the advantage of separating the NFS interface from the rest of the
> filesystem. That seems to me to be a lot of the problem that we are
> looking at here in that the VFS's lookup function wants to look up by
> no_addr and (currently) NFS is asking for something slightly different,
>

How do you plan on handling inode migration when using NFS?  I believe 
that was one of the central reasons for having no_formal_inode.  There 
may be other reasons as well, as I am not familiar with the code.

  brassow