[Linux-cachefs] Page reference leak in fscache or cachefiles?
Shantanu Goel
sgoel01 at yahoo.com
Fri Jul 25 15:22:05 UTC 2014
drop_caches does not help, so it's a permanent leak. Yeah, we're looking into updating our tree to a newer one.
Shantanu
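(For reference, the drop_caches check described above can be sketched as a dry run; the printed commands need root, so swap the echo for real execution to use it. The grep fields are one plausible thing to compare before and after, not necessarily what Shantanu's setup used.)

```shell
# Dry-run of the "drop_caches does not help" check: print the commands
# that would be run as root instead of executing them.
run() { echo "+ $*"; }
run sync                                        # flush dirty pages first
run 'echo 3 > /proc/sys/vm/drop_caches'         # drop pagecache + dentries/inodes
run 'grep -E "^(MemFree|Cached):" /proc/meminfo'  # compare before/after
```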
> On Friday, July 25, 2014 11:17 AM, Milosz Tanski <milosz at adfin.com> wrote:
> The question is: does it become un-reclaimable, period (it's still full
> when you force the kernel to dump caches), or is it unreclaimable by
> other processes allocating via GFP_NOFS?
>
> Also, you should be able to port the fscache changes onto your tree
> by looking at David's commit history.
>
> - Milosz
>
> On Fri, Jul 25, 2014 at 11:15 AM, Shantanu Goel <sgoel01 at yahoo.com> wrote:
>> Our codebase is a bit older than 3.15.5, so the only patch I was able to
>> integrate immediately and deploy is the one which papers over the
>> wait_on_page_write deadlock. We are still evaluating the others. The
>> underlying problem of a page reference leak remains, which I'm still trying
>> to chase down. It literally gets to the point where the entire file page
>> cache becomes unreclaimable.
>>
>> Thanks,
>> Shantanu
>>
>>
>>> On Friday, July 25, 2014 11:04 AM, Milosz Tanski <milosz at adfin.com> wrote:
>>> Did you have a chance to try any of my patches?
>>>
>>> On Tue, Jul 22, 2014 at 10:11 AM, Milosz Tanski <milosz at adfin.com> wrote:
>>>> Shantanu,
>>>>
>>>> You are correct in that there is a correlation between those
>>>> messages and fscache hangs. The correlation is that there are a number
>>>> of bugs in the fscache code when things fail (like when an allocation
>>>> fails). I have 3 different patches that I'm going to submit for
>>>> review for these issues against 3.15.5. I'll post them later today
>>>> when I get a free minute.
>>>>
>>>> Also, sorry for not CCing the list in the first place.
>>>>
>>>> On Tue, Jul 22, 2014 at 8:56 AM, Shantanu Goel <sgoel01 at yahoo.com> wrote:
>>>>> Hi Milosz,
>>>>>
>>>>> We thought they were harmless too initially; however, we have seen quite
>>>>> a few hangs at this point on hosts where we enabled fscache + cachefiles,
>>>>> and we think there is some correlation with these messages. It could be
>>>>> some generic VM issue which is exacerbated by the use of fscache. We
>>>>> posted our initial message to get some ideas about where the problem
>>>>> might be, since we are not certain where the problem lies.
>>>>>
>>>>> Thanks,
>>>>> Shantanu
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Monday, July 21, 2014 9:31 PM, Milosz Tanski <milosz at adfin.com> wrote:
>>>>>> Shantanu,
>>>>>>
>>>>>> This message is harmless. I see this a lot as well, as we have fscache
>>>>>> on SSD disks that do 1.1GB/s transfer, a demanding application,
>>>>>> and a lot of pages waiting for write-out. The kernel cannot force a
>>>>>> write-out because fscache has to allocate pages with the GFP_NOFS
>>>>>> flag (to prevent a recursive hang).
>>>>>>
>>>>>> On our production machines we changed the vm.min_free_kbytes kernel
>>>>>> tunable to 256MB (the machine has a lot of RAM) and this happens a
>>>>>> lot less often.
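(The tuning described above can be sketched as follows; the value is in kB, applying it needs root, and the runtime/persist commands are the standard sysctl interface rather than anything specific to Milosz's setup.)

```shell
# vm.min_free_kbytes takes kB; compute 256 MB worth and show the setting.
kb=$((256 * 1024))
echo "vm.min_free_kbytes = $kb"
# Apply at runtime (as root):  sysctl -w vm.min_free_kbytes=$kb
# Persist across boots by adding the echoed line to /etc/sysctl.conf.
```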
>>>>>>
>>>>>> - Milosz
>>>>>>
>>>>>> On Wed, Jul 9, 2014 at 12:35 PM, Shantanu Goel <sgoel01 at yahoo.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are running Linux 3.10 + fscache-20130702 + commit
>>>>>>> bae6b235905ab9dc6659395f7802c1d36fb63f15 from dhowells' git tree and
>>>>>>> suspect there might be a page reference leak somewhere, as evidenced
>>>>>>> by the low reclaim ratio. We also see the kernel complaining about
>>>>>>> memory allocation failures in the readahead code, as seen below.
>>>>>>> The machine has plenty of file cache and there isn't any heavy write
>>>>>>> activity which could also pin the pages.
>>>>>>> We wrote a script to monitor /proc/vmstat and see the following
>>>>>>> output with a one-second interval.
>>>>>>>
>>>>>>>
>>>>>>> Refill    Scan   Free  FrRatio  Slab  Runs  Stalls
>>>>>>>      0  154663   6855        4  1024     2       0
>>>>>>>      0    7152   7152      100     0     4       0
>>>>>>>      0    5960   5960      100     0     4       0
>>>>>>>      0    7152   7152      100  1024     3       0
>>>>>>>      0  152407  21698       14  1024     2       0
>>>>>>>
>>>>>>>
>>>>>>> The fields are:
>>>>>>>
>>>>>>> Refill - # active pages scanned, i.e. deactivated
>>>>>>> Scan - # inactive pages scanned
>>>>>>> Free - # pages freed
>>>>>>> FrRatio - free / scan ratio as percentage
>>>>>>> Runs - # times kswapd ran
>>>>>>> Stalls - # times direct reclaim ran
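(The FrRatio column is just the per-interval delta of pages freed over pages scanned. A minimal sketch of that computation; the t0 values of zero are hypothetical, and the t1 values are taken from the second row of the table above.)

```shell
# Two one-second samples of the scan/free counters (t0 hypothetical).
scan0=0;    freed0=0
scan1=7152; freed1=7152
scan=$((scan1 - scan0))
freed=$((freed1 - freed0))
ratio=$(( scan > 0 ? 100 * freed / scan : 0 ))   # FrRatio as a percentage
echo "Scan=$scan Free=$freed FrRatio=$ratio"
```

With the first row's numbers (154663 scanned, 6855 freed) the same expression yields 4, matching the table.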
>>>>>>>
>>>>>>>
>>>>>>> The free memory looks like this (free -m):
>>>>>>>
>>>>>>>              total       used       free     shared    buffers     cached
>>>>>>> Mem:         24175      23986        188          0         16      10453
>>>>>>> -/+ buffers/cache:      13517      10657
>>>>>>> Swap:         8191         85       8106
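(As a sanity check on the numbers above, the "-/+ buffers/cache" row is derived from the "Mem:" row by moving buffers and cache from "used" to "free":)

```shell
# Values in MB from the Mem: row above.
used=23986 free=188 buffers=16 cached=10453
# "used" minus buffers/cache = memory actually held by applications;
# "free" plus buffers/cache = memory reclaimable for applications.
echo "-/+ buffers/cache: $((used - buffers - cached)) $((free + buffers + cached))"
```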
>>>>>>>
>>>>>>>
>>>>>>> Are you aware of any such issue with fscache / cachefiles? If not,
>>>>>>> could you suggest what other information we could gather to debug it
>>>>>>> further?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shantanu
>>>>>>> SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e)
>>>>>>> memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k)
>>>>>>> show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n)
>>>>>>> poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
>>>>>>> show-task-states(t) unmount(u) show-blocked-tasks(w) dump-ftrace-buffer(z)
>>>>>>> python: page allocation failure: order:0, mode:0x11110
>>>>>>> CPU: 5 PID: 18997 Comm: python Tainted: G W O 3.10.36-el5.ia32e.lime.0 #1
>>>>>>> Hardware name: Supermicro X8STi/X8STi, BIOS 2.0 06/03/2010
>>>>>>> 0000000000000000 ffff880104d39578 ffffffff81426c43 ffff880104d39608
>>>>>>> ffffffff810fdcfc ffff88061fffd468 0000004000000000 ffff880104d395a8
>>>>>>> ffffffff810fd506 0000000000000000 ffffffff8109120d 0001111000000000
>>>>>>> Call Trace:
>>>>>>> [<ffffffff81426c43>] dump_stack+0x19/0x1e
>>>>>>> [<ffffffff810fdcfc>] warn_alloc_failed+0xfc/0x140
>>>>>>> [<ffffffff810fd506>] ? drain_local_pages+0x16/0x20
>>>>>>> [<ffffffff8109120d>] ? on_each_cpu_mask+0x4d/0x70
>>>>>>> [<ffffffff810fef1c>] __alloc_pages_nodemask+0x6cc/0x910
>>>>>>> [<ffffffff81139a9a>] alloc_pages_current+0xba/0x160
>>>>>>> [<ffffffff810f6617>] __page_cache_alloc+0xa7/0xc0
>>>>>>> [<ffffffffa050a4ea>] cachefiles_read_backing_file+0x2ba/0x7b0 [cachefiles]
>>>>>>> [<ffffffffa050ac87>] cachefiles_read_or_alloc_pages+0x2a7/0x3d0 [cachefiles]
>>>>>>> [<ffffffff81062c4f>] ? wake_up_bit+0x2f/0x40
>>>>>>> [<ffffffffa04da24a>] ? fscache_run_op+0x5a/0xa0 [fscache]
>>>>>>> [<ffffffffa04daacb>] ? fscache_submit_op+0x1db/0x4c0 [fscache]
>>>>>>> [<ffffffffa04dbd65>] __fscache_read_or_alloc_pages+0x1f5/0x2c0 [fscache]
>>>>>>> [<ffffffffa0531e9e>] __nfs_readpages_from_fscache+0x7e/0x1b0 [nfs]
>>>>>>> [<ffffffffa052bc6a>] nfs_readpages+0xca/0x1f0 [nfs]
>>>>>>> [<ffffffff81139a9a>] ? alloc_pages_current+0xba/0x160
>>>>>>> [<ffffffff81101ae2>] __do_page_cache_readahead+0x1b2/0x260
>>>>>>> [<ffffffff81101bb1>] ra_submit+0x21/0x30
>>>>>>> [<ffffffff81101e85>] ondemand_readahead+0x115/0x240
>>>>>>> [<ffffffff81102038>] page_cache_async_readahead+0x88/0xb0
>>>>>>> [<ffffffff810f5f5e>] ? find_get_page+0x1e/0xa0
>>>>>>> [<ffffffff810f81fc>] generic_file_aio_read+0x4dc/0x720
>>>>>>> [<ffffffffa05224b9>] nfs_file_read+0x89/0x100 [nfs]
>>>>>>> [<ffffffff81158a4f>] do_sync_read+0x7f/0xb0
>>>>>>> [<ffffffff8115a3a5>] vfs_read+0xc5/0x190
>>>>>>> [<ffffffff8115a57f>] SyS_read+0x5f/0xa0
>>>>>>> [<ffffffff814327bb>] tracesys+0xdd/0xe2
>>>>>>> Mem-Info:
>>>>>>> Node 0 DMA32 per-cpu:
>>>>>>> CPU 0: hi: 186, btch: 31 usd: 2
>>>>>>> CPU 1: hi: 186, btch: 31 usd: 159
>>>>>>> CPU 2: hi: 186, btch: 31 usd: 82
>>>>>>> CPU 3: hi: 186, btch: 31 usd: 182
>>>>>>> CPU 4: hi: 186, btch: 31 usd: 57
>>>>>>> CPU 5: hi: 186, btch: 31 usd: 0
>>>>>>> Node 0 Normal per-cpu:
>>>>>>> CPU 0: hi: 186, btch: 31 usd: 22
>>>>>>> CPU 1: hi: 186, btch: 31 usd: 117
>>>>>>> CPU 2: hi: 186, btch: 31 usd: 16
>>>>>>> CPU 3: hi: 186, btch: 31 usd: 79
>>>>>>> CPU 4: hi: 186, btch: 31 usd: 59
>>>>>>> CPU 5: hi: 186, btch: 31 usd: 0
>>>>>>> active_anon:3461555 inactive_anon:451550 isolated_anon:0
>>>>>>> active_file:236104 inactive_file:1899298 isolated_file:0
>>>>>>> unevictable:467 dirty:109 writeback:0 unstable:0
>>>>>>> free:33860 slab_reclaimable:19654 slab_unreclaimable:5110
>>>>>>> mapped:15284 shmem:2 pagetables:10231 bounce:0
>>>>>>> free_cma:0
>>>>>>> Node 0 DMA32 free:91764kB min:4824kB low:6028kB high:7236kB active_anon:2185528kB inactive_anon:566976kB active_file:108808kB inactive_file:670996kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3660960kB managed:3646160kB mlocked:0kB dirty:152kB writeback:0kB mapped:1344kB shmem:0kB slab_reclaimable:17300kB slab_unreclaimable:580kB kernel_stack:80kB pagetables:6448kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>>>> lowmem_reserve[]: 0 20607 20607
>>>>>>> Node 0 Normal free:39164kB min:27940kB low:34924kB high:41908kB active_anon:11663332kB inactive_anon:1239224kB active_file:835608kB inactive_file:6926636kB unevictable:1868kB isolated(anon):0kB isolated(file):0kB present:21495808kB managed:21101668kB mlocked:1868kB dirty:612kB writeback:0kB mapped:60120kB shmem:8kB slab_reclaimable:61536kB slab_unreclaimable:19860kB kernel_stack:2088kB pagetables:34804kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>>>> lowmem_reserve[]: 0 0 0
>>>>>>> Node 0 DMA32: 506*4kB (UEM) 545*8kB (UEM) 569*16kB (UEM) 897*32kB (UEM) 541*64kB (UEM) 87*128kB (UM) 4*256kB (UM) 2*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 92000kB
>>>>>>> Node 0 Normal: 703*4kB (UE) 285*8kB (UEM) 176*16kB (UEM) 836*32kB (UEM) 7*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 35108kB
>>>>>>> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>>>>>>> 2137331 total pagecache pages
>>>>>>> 1215 pages in swap cache
>>>>>>> Swap cache stats: add 7056879, delete 7055664, find 633567/703482
>>>>>>> Free swap = 8301172kB
>>>>>>> Total swap = 8388604kB
>>>>>>> 6291455 pages RAM
>>>>>>> 102580 pages reserved
>>>>>>> 4391935 pages shared
>>>>>>> 4746779 pages non-shared
>>>>>>>
>>>>>>> --
>>>>>>> Linux-cachefs mailing list
>>>>>>> Linux-cachefs at redhat.com
>>>>>>> https://www.redhat.com/mailman/listinfo/linux-cachefs
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Milosz Tanski
>>>>>> CTO
>>>>>> 16 East 34th Street, 15th floor
>>>>>> New York, NY 10016
>>>>>>
>>>>>> p: 646-253-9055
>>>>>> e: milosz at adfin.com