[Linux-cachefs] Finding fscache contents for a file

Thu Nov 13 17:40:55 UTC 2008

Thanks for the thorough reply, David...

Responses are inserted below, but I think the executive summary is this: In
the short term I think I'm stuck with EL5 kernels due to voluminous Lustre
dependencies (I'm separately looking into how to get past this).  That raises
the question of how I can get the latest possible fscache code into an
EL5 kernel.  I'm currently running a 2.6.18-53.1.14.el5 kernel, and the
fscache code is quite old, I think.  More detail below...

Thanks!
John

On Tue, Nov 11, 2008 at 5:33 PM, David Howells <dhowells at redhat.com> wrote:

> John Groves <John at groves.net> wrote:
>
> > I have modified the lustre client filesystem to use fscache, and it is in a
> > rudimentary working state.
>
> Excellent!
>
> > Among my most pressing requirements is to purge the fscache for any extent
> > for which a DLM lock is revoked.
>
> Hmmm...  Do you have an open cookie on an FS-Cache object at the time this
> lock is revoked?  Or are you looking for a shortcut - the equivalent of a
> delete op - by which you can supply a key and say 'delete that if it's there'?
>
> Note that I cannot provide you with functionality to punch holes in files in
> the cache very easily, not until the filesystems available to CacheFiles get
> that capability.

Hole punching would be ideal, but I understand the limitation.  Yes, I have
a cookie.

Currently, if a DLM lock is revoked, I just blow away the whole file in the
fscache -- at least that's what I think I'm doing.  I call a function
derived from nfs_fscache_disable_cookie(), which appears to clean up the
page cache and then call fscache_relinquish_cookie().

Actually, I'm not sure I should have kept the page cache cleanup part from
NFS's fscache.[ch], since lustre does its own page cache cleanup (and it's
conceivable that a DLM lock extent does not cover a whole file, although my
current fscache approach is to blow away the whole file in fscache).
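Because a revoked DLM lock may cover only part of a file, any finer-grained purge (or any test that checks the purge) first has to map the lock's byte extent onto page indices. A minimal sketch, assuming 4 KiB pages and an inclusive byte extent [start, end]; `extent_to_pages()` and `struct page_range` are hypothetical names, not Lustre or FS-Cache API. Every page that overlaps the extent may hold stale data, so the range includes partially covered pages at both ends.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration: not part of Lustre or FS-Cache. */
#define SKETCH_PAGE_SHIFT 12            /* assume 4 KiB pages */

struct page_range {
        uint64_t first;                 /* first page index touched */
        uint64_t last;                  /* last page index touched (inclusive) */
};

/*
 * Map an inclusive byte extent [start, end], e.g. taken from a revoked
 * DLM lock, to the inclusive range of page indices that overlap it.
 * Each overlapping page may hold stale data, so all of them would need
 * to be dropped from both the page cache and the disk cache.
 */
static struct page_range extent_to_pages(uint64_t start, uint64_t end)
{
        struct page_range r;

        assert(start <= end);
        r.first = start >> SKETCH_PAGE_SHIFT;
        r.last  = end >> SKETCH_PAGE_SHIFT;
        return r;
}
```

For example, a lock on bytes 4095..4096 straddles a page boundary and maps to pages 0 and 1. Until CacheFiles can punch holes, the whole cached object still has to go; a range like this is mostly useful for checking, page by page, that the purge took effect.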

> > To the end of proving that functionality, I would like to give myself a file
> > ioctl that would determine what is in the fscache for a given file.  Since
> > this is for testing, performance isn't a major concern.  I'm already doing
> > this with the page cache, and I hope something similar would be possible
> > with the fscache.
>
> So, you want to be able to get, say, a bitmap of all the pages resident in the
> disk cache for a particular cookie - mass bmap() if you will?

A bit map would be cool.  An extent list would be OK too.  Or just an
ability to ask fscache whether a given page (or extent) is in the disk
cache...
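Whichever form the query takes, a per-page bitmap converts directly into the extent list John mentions. A minimal sketch, assuming a bitmap stored as an array of unsigned long in which bit i is set when page i is resident in the disk cache; `bitmap_to_extents()` and `struct cache_extent` are hypothetical, not an FS-Cache interface.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical types and helpers for illustration only. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct cache_extent {
        size_t first;                   /* first resident page index */
        size_t count;                   /* number of consecutive resident pages */
};

/* Return nonzero if bit `idx` is set in `map`. */
static int test_page_bit(const unsigned long *map, size_t idx)
{
        return (map[idx / SKETCH_BITS_PER_LONG] >>
                (idx % SKETCH_BITS_PER_LONG)) & 1UL;
}

/*
 * Walk a residency bitmap covering `npages` pages and emit each run of
 * set bits as one extent.  Returns the number of extents written,
 * stopping early once `out` (capacity `max_out`) is full.
 */
static size_t bitmap_to_extents(const unsigned long *map, size_t npages,
                                struct cache_extent *out, size_t max_out)
{
        size_t n = 0, i = 0;

        assert(map != NULL && out != NULL);
        while (i < npages && n < max_out) {
                if (!test_page_bit(map, i)) {
                        i++;
                        continue;
                }
                out[n].first = i;
                while (i < npages && test_page_bit(map, i))
                        i++;
                out[n].count = i - out[n].first;
                n++;
        }
        return n;
}
```

For instance, pages 0-1, 4-6 and 9 being resident (bitmap 0x273 over 12 pages) yields the three extents {0, 2}, {4, 3} and {9, 1}.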

> > Is there a supported way to query whether a given page_index is in the
> > fscache?  If not, I'd appreciate suggestions as to how to go about this (or
> > insight into how other implementers have proven functionality without this
> > feature).  I'm fairly ignorant as to the internals of fscache...
>
> Currently, the only way to do this is to try reading it, and observe the error
> code.  It's not a requirement I've come across to date.
>
> What exactly is it that you want this functionality for?  Just debugging
> (proving) that what you ask to be cached actually winds up in the cache?

I'm more concerned about the converse: proving that what should have been
removed from the fscache has been duly removed.

>
> What you ask for shouldn't be too hard to provide - after all, I have to do the
> work anyway in order to determine whether I should return ENODATA or begin a
> read op in CacheFiles.

The problem here (I think) is that I don't want to load the page cache in
order to check whether a page is in the fscache.  And there might be cases
in testing where I would want to check without regard to whether the page is
in the page cache already.
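Until such a query exists, the probe-by-read approach David describes can still fill in a residency bitmap: issue a read for each page index and treat ENODATA as "not in the cache". A minimal sketch, assuming a hypothetical read callback that returns 0 when the cache supplied the page, -ENODATA when it holds nothing for that page, and another negative errno on failure; `probe_cache()` and the simulated `struct fake_cache` are illustrative only. In the kernel such a probe would go through the netfs's read path and, as John notes, would pull the pages into the page cache as a side effect.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROBE_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical callback: 0 if page `idx` came from the cache,
 * -ENODATA if the cache holds no data for it, other -errno on error. */
typedef int (*cache_read_fn)(void *ctx, size_t idx);

/* Simulated cache for illustration: one presence flag per page. */
struct fake_cache {
        const unsigned char *present;   /* present[i] != 0 => page cached */
        size_t npages;
        size_t fail_at;                 /* index that fails with -EIO, or SIZE_MAX */
};

static int fake_cache_read(void *ctx, size_t idx)
{
        const struct fake_cache *fc = ctx;

        if (idx == fc->fail_at)
                return -EIO;
        if (idx >= fc->npages || !fc->present[idx])
                return -ENODATA;
        return 0;
}

/*
 * Probe pages [0, npages) and set bit i in `map` for each page the cache
 * could supply.  Returns 0, or the first error other than -ENODATA
 * (probing stops there; bits already set are left in place).
 */
static int probe_cache(cache_read_fn read_page, void *ctx, size_t npages,
                       unsigned long *map)
{
        size_t i;

        assert(read_page != NULL && map != NULL);
        memset(map, 0, ((npages + PROBE_BITS_PER_LONG - 1) /
                        PROBE_BITS_PER_LONG) * sizeof(*map));
        for (i = 0; i < npages; i++) {
                int rc = read_page(ctx, i);

                if (rc == -ENODATA)
                        continue;       /* not cached: leave the bit clear */
                if (rc < 0)
                        return rc;      /* real I/O error: give up */
                map[i / PROBE_BITS_PER_LONG] |= 1UL << (i % PROBE_BITS_PER_LONG);
        }
        return 0;
}
```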

>
> If it's merely for debugging, then there's probably no particular need to
> optimise it to be fast.

Certainly for my current purposes there isn't a need for optimization.  My
offhand impression is that lustre users might actually want an ioctl-based
utility that will tell them the cache status of a file (both page cache and
fscache), but even then it's not entirely clear to me that performance of
this code path is important.

>
> John Groves <John at groves.net> also wrote:
>
> > I'd like to add one more question... when I explicitly clean out the page
> > cache, so as to force reads to be satisfied from the fscache, I frequently
> > find that not all of my pages are available from the fscache.
>
> Hmmm...  That doesn't sound good.  What version of fscache and kernel are you
> using?

Hmmm indeed.  You may have hit on one of my problems here.  I'm currently on
a 2.6.18-53.1.14.el5 kernel, which was chosen because Lustre likes it.  We
noted early on that there was a big difference between the fscache code here
and in "current" kernels, and that grafting the latest fscache code into the
2.6.18 tree didn't look trivial.  Is there a way to get a "modern" fscache
patched into a more or less EL5 kernel?  Getting Lustre substantially beyond
EL5 may be a non-starter in the short term (though I'll check with the
Lustre community).

I did some more experiments, and the missing pages seem not to occur if I
take a lunch break after reading them into the page cache (and writing to
fscache), and then blow away the page cache after lunch.  If I just wait a
minute or two, the pages may still not make it into the fscache (and running
"sync" does not help).  For production use, this may not be a showstopper,
but for performance testing (to justify the effort) it may penalize the
fscache results.  Note that I'm mostly doing tests with very few pages at the
moment.  It may be less of an issue when the fscache/cachefiles dirty list
is much bigger, which will of course be the case in meaningful performance
tests.

The best compromise for the moment is likely to get the latest
kernel/fscache that lustre will work with...

>
> Have you checked the statistics that are put in /proc/fs/fscache/stats to see
> if they give you some clue?

Doh...my kernel doesn't even have a /proc/fs/fscache.  I'm pretty far
downlevel, I guess. Do you know what version of fscache the /proc entry
appeared in?

> > I don't know why this is, but I suspect that calling my releasepage method
> > (from an ioctl, after loading the cache & fscache) sometimes frees the
> > page(s) before fscache gets around to storing them...though that doesn't make
> > sense if fscache bumps the page reference count until it has made a copy or
> > written it out.
>
> fscache doesn't keep a ref on the pages directly, though the cache might (the
> cache that writes directly to blockdev certainly does by pasting them into
> BIOs).
>
> What fscache does is to use a couple of page bits on the page to mark its
> interest in a netfs page.  One (PG_fscache) merely notes that fscache has an
> interest in that page and that fscache_uncache_page() should be called on it;
> the other (PG_fscache_write) indicates that a page is being written to the
> cache, and that the caller should wait on it till it gets cleared if they need
> the page.
>
> Can you show me your releasepage() method?

It's attached at the bottom of this message.  Should it be looking at the
fscache bits?
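David's description of the two page bits suggests what a releasepage() that honours them might look like: refuse to release a page while a store to the cache is still in flight, and have the cache drop its interest before releasing a page that only carries the interest bit. A user-space model, assuming the bits behave exactly as David describes; the flag names, `struct model_page` and `model_uncache_page()` are hypothetical stand-ins for PG_fscache, PG_fscache_write and fscache_uncache_page(), not the kernel's definitions. As with the real method, a nonzero return means the page may be released.

```c
#include <assert.h>

/* Hypothetical model of the page bits David describes (not kernel flags). */
enum {
        MODEL_PG_PRIVATE       = 1u << 0,  /* netfs private data attached */
        MODEL_PG_FSCACHE       = 1u << 1,  /* fscache has an interest */
        MODEL_PG_FSCACHE_WRITE = 1u << 2,  /* page being written to cache */
};

struct model_page {
        unsigned int flags;
        int uncache_calls;              /* times the cache let go of the page */
};

/* Stand-in for fscache_uncache_page(): the cache drops its interest. */
static void model_uncache_page(struct model_page *page)
{
        assert(!(page->flags & MODEL_PG_FSCACHE_WRITE));
        page->flags &= ~MODEL_PG_FSCACHE;
        page->uncache_calls++;
}

/* Model releasepage(): nonzero if the page may be released, 0 if not. */
static int model_releasepage(struct model_page *page)
{
        /* A store to the cache is still in flight: keep the page. */
        if (page->flags & MODEL_PG_FSCACHE_WRITE)
                return 0;
        /* Cache still interested: have it forget the page first. */
        if (page->flags & MODEL_PG_FSCACHE)
                model_uncache_page(page);
        /* Then drop netfs-private state, as ll_removepage() does. */
        page->flags &= ~MODEL_PG_PRIVATE;
        return 1;
}
```

The model simply refuses while the write bit is set; a real method could instead wait for the bit to clear when the allocation context allows sleeping, as David suggests for callers that need the page.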

> > (does fscache consider my page dirty for the purpose of writing to
> > cachefiles, or does it make a copy,
>
> fscache doesn't make a copy of your page, but the cache might.  In this case,
> CacheFiles does because I can't work out how to use the AIO interface from the
> kernel.
>
> As I mentioned above, fscache marks its interest in the page at this point by
> marking it with PG_fscache_write.  This means the page may be written to the
> cache at some point.  Of course, the cache is always at liberty to refuse due
> to things like ENOSPC, EIO and ENOMEM.  If this happens, it _should_ show up in
> /proc/fs/fscache/stats.
>
> The main purpose of fscache is to insulate as best it can the netfs from errors
> in the cache and to hide at least some of the delays involved.
>
> > and is it susceptible to having a page freed out from under it?
>
> In such a case, firstly __free_pages() should bark, and secondly, you're likely
> to get gibberish in the cache, not just missing pages.
>
> > ...in which case is there a way to perform an explicit flush [preferably on
> > the whole file/object rather than one page at a time]?
>
> That's something I can look at.  The problem with performing an explicit flush
> is that it involves flushing stuff that's on the queues to be processed by
> other processes.  Part of the problem is that stores are batched to save a
> certain amount of common time when it comes to actually doing the work.  I
> really should move the batching further down, and, in CacheFiles's case, offer
> it to the underlying fs to do.  The BTRFS person is in favour of that.
>
> David
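The store batching David mentions may also explain why pages written during a small test take a long time to show up in the cache, and what an explicit per-object flush would have to do. A toy user-space model, assuming a fixed batch size and a single queue that is processed only when a batch fills or someone flushes it explicitly; `struct store_queue`, `queue_store()` and `flush_stores()` are hypothetical and do not reflect how FS-Cache or CacheFiles is actually structured. A real implementation would presumably also process partial batches eventually, which might match pages appearing only after a long enough wait.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of batched cache stores (not FS-Cache code). */
#define BATCH_MAX 16                    /* assumed batch size */

struct store_queue {
        size_t pending[BATCH_MAX];      /* page indices awaiting a store */
        size_t npending;
        size_t stored;                  /* pages written to the cache so far */
};

/* Explicit flush: "write" everything queued and empty the queue. */
static void flush_stores(struct store_queue *q)
{
        q->stored += q->npending;
        q->npending = 0;
}

/* Queue a store; in this model the queue is processed only when a full
 * batch has built up, so a few stray pages can sit here indefinitely. */
static void queue_store(struct store_queue *q, size_t page_index)
{
        assert(q->npending < BATCH_MAX);
        q->pending[q->npending++] = page_index;
        if (q->npending == BATCH_MAX)
                flush_stores(q);
}
```

With only three pages queued, nothing reaches the cache until flush_stores() is called; a full batch of BATCH_MAX pages is stored without any explicit flush.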

###

Here are the releasepage and removepage methods; not much going on except
internal tracking (llap = lustre lite async page I/O tracking stuff):

/* Tear down Lustre's per-page llap tracking for a page leaving the page
 * cache. */
void ll_removepage(struct page *page)
{
        struct ll_async_page *llap = llap_cast_private(page);
        ENTRY;

        JGDEBUG(D_FSCACHE, "ll_removepage %p\n", page);

        LASSERT(!in_interrupt());

        /* sync pages or failed read pages can leave pages in the page
         * cache that don't have our data associated with them anymore */
        if (page_private(page) == 0) {
                JGDEBUG(D_FSCACHE, "ll_removepage private err!\n");
                EXIT;
                return;
        }

        LASSERT(!llap->llap_lockless_io_page);
        LASSERT(!llap->llap_nocache);

        LL_CDEBUG_PAGE(D_PAGE, page, "being evicted\n");
        /* drop the llap attached to this page */
        __ll_put_llap(page);

        EXIT;
}

/* ->releasepage(): returning 1 tells the VM the page may be released.
 * gfp_mask is unused, and this version never refuses a release. */
static int ll_releasepage(struct page *page, gfp_t gfp_mask)
{
        JGDEBUG(D_FSCACHE, "ll_releasepage %p\n", page);
        if (PagePrivate(page))
                ll_removepage(page);
        return 1;
}