[Linux-cachefs] Page reference leak in fscache or cachefiles?

Shantanu Goel sgoel01 at yahoo.com
Fri Jul 25 15:22:05 UTC 2014


drop_caches does not help, so it's a permanent leak.  Yeah, we're looking into updating our tree to a newer one.
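For anyone wanting to reproduce the check: sample nr_file_pages from /proc/vmstat before and after writing 3 to /proc/sys/vm/drop_caches; on the affected hosts the count barely moves.  A rough sketch (not our exact tooling; just standard Linux procfs paths):

```python
def vmstat_value(name, text=None):
    """Return one counter from /proc/vmstat (or from a supplied snapshot)."""
    if text is None:
        with open("/proc/vmstat") as f:
            text = f.read()
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key == name:
            return int(value)
    raise KeyError(name)

# On an affected host (dropping caches needs root):
#   before = vmstat_value("nr_file_pages")
#   with open("/proc/sys/vm/drop_caches", "w") as f:
#       f.write("3")
#   after = vmstat_value("nr_file_pages")
# With the leak, `after` stays close to `before` instead of collapsing.
```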


Shantanu


> On Friday, July 25, 2014 11:17 AM, Milosz Tanski <milosz at adfin.com> wrote:
> 
> The question is: does it become unreclaimable, period (is it still full
> when you force the kernel to drop caches), or is it unreclaimable only by
> other processes allocating via GFP_NOFS?
> 
> Also, you should be able to port the fscache changes onto your tree
> by looking at David's commit history.
> 
> - Milosz
> 
> On Fri, Jul 25, 2014 at 11:15 AM, Shantanu Goel <sgoel01 at yahoo.com> wrote:
>>  Our codebase is a bit older than 3.15.5, so the only patch I was able to
>>  integrate immediately and deploy is the one which papers over the
>>  wait_on_page_write deadlock.  We are still evaluating the others.  The
>>  underlying problem of a page reference leak remains, which I'm still trying
>>  to chase down.  It literally gets to the point where the entire file page
>>  cache becomes unreclaimable.
>> 
>>  Thanks,
>>  Shantanu
>> 
>> 
>>>  On Friday, July 25, 2014 11:04 AM, Milosz Tanski <milosz at adfin.com> wrote:
>>> 
>>>  Did you have a chance to try any of my patches?
>>> 
>>>  On Tue, Jul 22, 2014 at 10:11 AM, Milosz Tanski <milosz at adfin.com> wrote:
>>>>   Shantanu,
>>>> 
>>>>   You are correct in that there is a correlation between those
>>>>   messages and fscache hangs. The correlation is that there are a number
>>>>   of bugs in the fscache code when things fail (like when allocation fails).
>>>>   I have 3 different patches that I'm going to submit later on for
>>>>   review for these issues against 3.15.5. I'll post them later today
>>>>   when I get a free minute.
>>>> 
>>>>   Also, sorry for not CCing the list in the first place.
>>>> 
>>>>   On Tue, Jul 22, 2014 at 8:56 AM, Shantanu Goel <sgoel01 at yahoo.com> wrote:
>>>>>   Hi Milosz,
>>>>> 
>>>>>   We thought they were harmless too initially; however, we have seen quite
>>>>>   a few hangs at this point on hosts where we enabled fscache + cachefiles
>>>>>   and think there is some correlation with these messages.  It could be some
>>>>>   generic VM issue which is exacerbated by the use of fscache.  We posted our
>>>>>   initial message to get some ideas about where the problem might be, since
>>>>>   we are not certain of where the problem lies.
>>>>> 
>>>>>   Thanks,
>>>>>   Shantanu
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>>   On Monday, July 21, 2014 9:31 PM, Milosz Tanski <milosz at adfin.com> wrote:
>>>>>> 
>>>>>>   Shantanu,
>>>>>> 
>>>>>>   This message is harmless. I see this a lot as well, as we have fscache
>>>>>>   on SSD disks that do 1.1GB/s transfer, a demanding application,
>>>>>>   and a lot of pages waiting for write out. The kernel cannot force a
>>>>>>   write out because fscache has to allocate pages with the GFP_NOFS
>>>>>>   flag (to prevent a recursive hang).
>>>>>> 
>>>>>>   On our production machines we changed the vm.min_free_kbytes kernel
>>>>>>   tunable to 256MB (the machine has a lot of RAM) and this happens a
>>>>>>   lot less often.
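For the record, vm.min_free_kbytes takes a value in kilobytes, so the 256MB above corresponds to 262144.  A sketch of the arithmetic, with the (commented) commands one might use to apply it:

```python
# vm.min_free_kbytes is specified in kilobytes, so 256MB is:
min_free_kbytes = 256 * 1024  # 262144

# Applying it needs root, e.g.:
#   sysctl -w vm.min_free_kbytes=262144
# or persistently in /etc/sysctl.conf:
#   vm.min_free_kbytes = 262144
print(min_free_kbytes)
```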
>>>>>> 
>>>>>>   - Milosz
>>>>>> 
>>>>>>   On Wed, Jul 9, 2014 at 12:35 PM, Shantanu Goel
>>>  <sgoel01 at yahoo.com> wrote:
>>>>>>>    Hi,
>>>>>>> 
>>>>>>>    We are running Linux 3.10 + fscache-20130702 + commit
>>>>>>>    bae6b235905ab9dc6659395f7802c1d36fb63f15 from dhowells' git tree and
>>>>>>>    suspect there might be a page reference leak somewhere, as evidenced by
>>>>>>>    the low reclaim ratio.  We also see the kernel complaining about memory
>>>>>>>    allocation failures in the readahead code, as seen below.
>>>>>>>    The machine has plenty of file cache and there isn't any heavy write
>>>>>>>    activity which could also pin the pages.
>>>>>>>    We wrote a script to monitor /proc/vmstat and see the following output
>>>>>>>    with a one-second interval.
>>>>>>> 
>>>>>>> 
>>>>>>>       Refill      Scan      Free   FrRatio      Slab   Runs  Stalls
>>>>>>>            0    154663      6855         4      1024      2       0
>>>>>>>            0      7152      7152       100         0      4       0
>>>>>>>            0      5960      5960       100         0      4       0
>>>>>>>            0      7152      7152       100      1024      3       0
>>>>>>>            0    152407     21698        14      1024      2       0
>>>>>>> 
>>>>>>> 
>>>>>>>    The fields are:
>>>>>>> 
>>>>>>>    Refill  - # active pages scanned, i.e. deactivated
>>>>>>>    Scan    - # inactive pages scanned
>>>>>>>    Free    - # pages freed
>>>>>>>    FrRatio - free / scan ratio as percentage
>>>>>>>    Runs    - # times kswapd ran
>>>>>>>    Stalls  - # times direct reclaim ran
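The script itself wasn't posted; a minimal sketch of one way to produce those columns from per-second /proc/vmstat deltas (the counter names are assumptions based on 3.10-era kernels, where per-zone counters such as pgscan_kswapd_normal have to be summed):

```python
# Hypothetical monitor: each column maps onto /proc/vmstat counters whose
# per-interval deltas approximate the table above.
COLUMNS = {
    "Refill": "pgrefill",      # active pages scanned, i.e. deactivated
    "Scan":   "pgscan",        # inactive pages scanned
    "Free":   "pgsteal",       # pages reclaimed
    "Slab":   "slabs_scanned", # slab objects scanned
    "Runs":   "pageoutrun",    # kswapd runs
    "Stalls": "allocstall",    # direct reclaim stalls
}

def read_vmstat(text=None):
    """Parse /proc/vmstat (or a snapshot string) into a dict of counters."""
    if text is None:
        with open("/proc/vmstat") as f:
            text = f.read()
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        stats[key] = int(value)
    return stats

def deltas(prev, curr):
    """Per-interval change for each column, summing per-zone counters."""
    out = {}
    for col, prefix in COLUMNS.items():
        p = sum(v for k, v in prev.items() if k.startswith(prefix))
        c = sum(v for k, v in curr.items() if k.startswith(prefix))
        out[col] = c - p
    out["FrRatio"] = 100 * out["Free"] // out["Scan"] if out["Scan"] else 0
    return out

# Sampling loop (run on the affected host):
#   import time
#   prev = read_vmstat()
#   while True:
#       time.sleep(1)
#       curr = read_vmstat()
#       print(deltas(prev, curr)); prev = curr
```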
>>>>>>> 
>>>>>>> 
>>>>>>>    The free memory looks like this (free -m):
>>>>>>> 
>>>>>>>                 total       used       free     shared    buffers     cached
>>>>>>>    Mem:         24175      23986        188          0         16      10453
>>>>>>>    -/+ buffers/cache:      13517      10657
>>>>>>>    Swap:         8191         85       8106
>>>>>>> 
>>>>>>> 
>>>>>>>    Are you aware of any such issue with fscache / cachefiles?  If not,
>>>>>>>    could you suggest what other information we could gather to debug it
>>>>>>>    further?
>>>>>>> 
>>>>>>> 
>>>>>>>    Thanks,
>>>>>>>    Shantanu
>>>>>>>    SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e)
>>>>>>>    memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k)
>>>>>>>    show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n)
>>>>>>>    poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
>>>>>>>    show-task-states(t) unmount(u) show-blocked-tasks(w) dump-ftrace-buffer(z)
>>>>>>>    python: page allocation failure: order:0, mode:0x11110
>>>>>>>    CPU: 5 PID: 18997 Comm: python Tainted: G        W  O 3.10.36-el5.ia32e.lime.0 #1
>>>>>>>    Hardware name: Supermicro X8STi/X8STi, BIOS 2.0    06/03/2010
>>>>>>>     0000000000000000 ffff880104d39578 ffffffff81426c43 ffff880104d39608
>>>>>>>     ffffffff810fdcfc ffff88061fffd468 0000004000000000 ffff880104d395a8
>>>>>>>     ffffffff810fd506 0000000000000000 ffffffff8109120d 0001111000000000
>>>>>>>    Call Trace:
>>>>>>>     [<ffffffff81426c43>] dump_stack+0x19/0x1e
>>>>>>>     [<ffffffff810fdcfc>] warn_alloc_failed+0xfc/0x140
>>>>>>>     [<ffffffff810fd506>] ? drain_local_pages+0x16/0x20
>>>>>>>     [<ffffffff8109120d>] ? on_each_cpu_mask+0x4d/0x70
>>>>>>>     [<ffffffff810fef1c>] __alloc_pages_nodemask+0x6cc/0x910
>>>>>>>     [<ffffffff81139a9a>] alloc_pages_current+0xba/0x160
>>>>>>>     [<ffffffff810f6617>] __page_cache_alloc+0xa7/0xc0
>>>>>>>     [<ffffffffa050a4ea>] cachefiles_read_backing_file+0x2ba/0x7b0 [cachefiles]
>>>>>>>     [<ffffffffa050ac87>] cachefiles_read_or_alloc_pages+0x2a7/0x3d0 [cachefiles]
>>>>>>>     [<ffffffff81062c4f>] ? wake_up_bit+0x2f/0x40
>>>>>>>     [<ffffffffa04da24a>] ? fscache_run_op+0x5a/0xa0 [fscache]
>>>>>>>     [<ffffffffa04daacb>] ? fscache_submit_op+0x1db/0x4c0 [fscache]
>>>>>>>     [<ffffffffa04dbd65>] __fscache_read_or_alloc_pages+0x1f5/0x2c0 [fscache]
>>>>>>>     [<ffffffffa0531e9e>] __nfs_readpages_from_fscache+0x7e/0x1b0 [nfs]
>>>>>>>     [<ffffffffa052bc6a>] nfs_readpages+0xca/0x1f0 [nfs]
>>>>>>>     [<ffffffff81139a9a>] ? alloc_pages_current+0xba/0x160
>>>>>>>     [<ffffffff81101ae2>] __do_page_cache_readahead+0x1b2/0x260
>>>>>>>     [<ffffffff81101bb1>] ra_submit+0x21/0x30
>>>>>>>     [<ffffffff81101e85>] ondemand_readahead+0x115/0x240
>>>>>>>     [<ffffffff81102038>] page_cache_async_readahead+0x88/0xb0
>>>>>>>     [<ffffffff810f5f5e>] ? find_get_page+0x1e/0xa0
>>>>>>>     [<ffffffff810f81fc>] generic_file_aio_read+0x4dc/0x720
>>>>>>>     [<ffffffffa05224b9>] nfs_file_read+0x89/0x100 [nfs]
>>>>>>>     [<ffffffff81158a4f>] do_sync_read+0x7f/0xb0
>>>>>>>     [<ffffffff8115a3a5>] vfs_read+0xc5/0x190
>>>>>>>     [<ffffffff8115a57f>] SyS_read+0x5f/0xa0
>>>>>>>     [<ffffffff814327bb>] tracesys+0xdd/0xe2
>>>>>>>    Mem-Info:
>>>>>>>    Node 0 DMA32 per-cpu:
>>>>>>>    CPU    0: hi:  186, btch:  31 usd:   2
>>>>>>>    CPU    1: hi:  186, btch:  31 usd: 159
>>>>>>>    CPU    2: hi:  186, btch:  31 usd:  82
>>>>>>>    CPU    3: hi:  186, btch:  31 usd: 182
>>>>>>>    CPU    4: hi:  186, btch:  31 usd:  57
>>>>>>>    CPU    5: hi:  186, btch:  31 usd:   0
>>>>>>>    Node 0 Normal per-cpu:
>>>>>>>    CPU    0: hi:  186, btch:  31 usd:  22
>>>>>>>    CPU    1: hi:  186, btch:  31 usd: 117
>>>>>>>    CPU    2: hi:  186, btch:  31 usd:  16
>>>>>>>    CPU    3: hi:  186, btch:  31 usd:  79
>>>>>>>    CPU    4: hi:  186, btch:  31 usd:  59
>>>>>>>    CPU    5: hi:  186, btch:  31 usd:   0
>>>>>>>    active_anon:3461555 inactive_anon:451550 isolated_anon:0
>>>>>>>     active_file:236104 inactive_file:1899298 isolated_file:0
>>>>>>>     unevictable:467 dirty:109 writeback:0 unstable:0
>>>>>>>     free:33860 slab_reclaimable:19654 slab_unreclaimable:5110
>>>>>>>     mapped:15284 shmem:2 pagetables:10231 bounce:0
>>>>>>>     free_cma:0
>>>>>>>    Node 0 DMA32 free:91764kB min:4824kB low:6028kB high:7236kB
>>>>>>>    active_anon:2185528kB inactive_anon:566976kB active_file:108808kB
>>>>>>>    inactive_file:670996kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
>>>>>>>    present:3660960kB managed:3646160kB mlocked:0kB dirty:152kB writeback:0kB
>>>>>>>    mapped:1344kB shmem:0kB slab_reclaimable:17300kB slab_unreclaimable:580kB
>>>>>>>    kernel_stack:80kB pagetables:6448kB unstable:0kB bounce:0kB free_cma:0kB
>>>>>>>    writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>>>>    lowmem_reserve[]: 0 20607 20607
>>>>>>>    Node 0 Normal free:39164kB min:27940kB low:34924kB high:41908kB
>>>>>>>    active_anon:11663332kB inactive_anon:1239224kB active_file:835608kB
>>>>>>>    inactive_file:6926636kB unevictable:1868kB isolated(anon):0kB isolated(file):0kB
>>>>>>>    present:21495808kB managed:21101668kB mlocked:1868kB dirty:612kB writeback:0kB
>>>>>>>    mapped:60120kB shmem:8kB slab_reclaimable:61536kB slab_unreclaimable:19860kB
>>>>>>>    kernel_stack:2088kB pagetables:34804kB unstable:0kB bounce:0kB free_cma:0kB
>>>>>>>    writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>>>>    lowmem_reserve[]: 0 0 0
>>>>>>>    Node 0 DMA32: 506*4kB (UEM) 545*8kB (UEM) 569*16kB (UEM) 897*32kB (UEM)
>>>>>>>    541*64kB (UEM) 87*128kB (UM) 4*256kB (UM) 2*512kB (M) 0*1024kB 0*2048kB
>>>>>>>    0*4096kB = 92000kB
>>>>>>>    Node 0 Normal: 703*4kB (UE) 285*8kB (UEM) 176*16kB (UEM) 836*32kB (UEM)
>>>>>>>    7*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 35108kB
>>>>>>>    Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>>>>>>>    2137331 total pagecache pages
>>>>>>>    1215 pages in swap cache
>>>>>>>    Swap cache stats: add 7056879, delete 7055664, find 633567/703482
>>>>>>>    Free swap  = 8301172kB
>>>>>>>    Total swap = 8388604kB
>>>>>>>    6291455 pages RAM
>>>>>>>    102580 pages reserved
>>>>>>>    4391935 pages shared
>>>>>>>    4746779 pages non-shared
>>>>>>> 
>>>>>>>    --
>>>>>>>    Linux-cachefs mailing list
>>>>>>>   Linux-cachefs at redhat.com
>>>>>>>   https://www.redhat.com/mailman/listinfo/linux-cachefs
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>   --
>>>>>>   Milosz Tanski
>>>>>>   CTO
>>>>>>   16 East 34th Street, 15th floor
>>>>>>   New York, NY 10016
>>>>>> 
>>>>>>   p: 646-253-9055
>>>>>>   e: milosz at adfin.com
>>>>>> 
>>>> 
>>>> 
>>>> 
>>>>   --
>>>>   Milosz Tanski
>>>>   CTO
>>>>   16 East 34th Street, 15th floor
>>>>   New York, NY 10016
>>>> 
>>>>   p: 646-253-9055
>>>>   e: milosz at adfin.com
>>> 
>>> 
>>> 
>>>  --
>>>  Milosz Tanski
>>>  CTO
>>>  16 East 34th Street, 15th floor
>>>  New York, NY 10016
>>> 
>>>  p: 646-253-9055
>>>  e: milosz at adfin.com
>>> 
> 
> 
> 
> -- 
> Milosz Tanski
> CTO
> 16 East 34th Street, 15th floor
> New York, NY 10016
> 
> p: 646-253-9055
> e: milosz at adfin.com
> 



