[Linux-cachefs] Page reference leak in fscache or cachefiles?

Milosz Tanski milosz at adfin.com
Tue Jul 22 14:11:56 UTC 2014


Shantanu,

You are correct that there is a correlation between those
messages and fscache hangs: there are a number of bugs in the
fscache code on failure paths (for example, when an allocation
fails). I have three different patches against 3.15.5 that I'm
going to submit for review for these issues. I'll post them
later today when I get a free minute.

Also, sorry for not CCing the list in the first place.

On Tue, Jul 22, 2014 at 8:56 AM, Shantanu Goel <sgoel01 at yahoo.com> wrote:
> Hi Milosz,
>
> We thought they were harmless too initially; however, we have seen quite a few hangs at this point on hosts where we enabled fscache + cachefiles, and we think there is some correlation with these messages.  It could be some generic VM issue which is exacerbated by the use of fscache.  We posted our initial message to get some ideas about where the problem might lie, since we are not certain.
>
> Thanks,
> Shantanu
>
>
>
>
>> On Monday, July 21, 2014 9:31 PM, Milosz Tanski <milosz at adfin.com> wrote:
>> > Shantanu,
>>
>> This message is harmless. I see this a lot as well, since we have
>> fscache on SSD disks that do 1.1GB/s transfer, a demanding
>> application, and a lot of pages waiting for write out. The kernel
>> cannot force a write out because fscache has to allocate pages with
>> the GFP_NOFS flag (to prevent a recursive hang).
>>
>> On our production machines we changed the vm.min_free_kbytes kernel
>> tunable to 256MB (the machine has a lot of RAM) and this happens a
>> lot less often.
>>
>> - Milosz
>>
>> On Wed, Jul 9, 2014 at 12:35 PM, Shantanu Goel <sgoel01 at yahoo.com> wrote:
>>>  Hi,
>>>
>>>  We are running Linux 3.10 + fscache-20130702 + commit
>> bae6b235905ab9dc6659395f7802c1d36fb63f15 from dhowells' git tree and suspect
>> there might be a page reference leak somewhere as evidenced by the low reclaim
>> ratio.  We also see the kernel complaining about memory allocation failures in
>> the readahead code as seen below.
>>>  The machine has plenty of file cache and there isn't any heavy write
>> activity that could also pin the pages.
>>>  We wrote a script to monitor /proc/vmstat and see the following output with
>> a one second interval.
>>>
>>>
>>>     Refill      Scan      Free   FrRatio      Slab      Runs    Stalls
>>>           0    154663      6855         4      1024         2         0
>>>           0      7152      7152       100         0         4         0
>>>           0      5960      5960       100         0         4         0
>>>           0      7152      7152       100      1024         3         0
>>>           0    152407     21698        14      1024         2         0
>>>
>>>
>>>  The fields are:
>>>
>>>  Refill  - # active pages scanned, i.e. deactivated
>>>  Scan    - # inactive pages scanned
>>>  Free    - # pages freed
>>>  FrRatio - free / scan ratio as percentage
>>>  Runs    - # times kswapd ran
>>>  Stalls  - # times direct reclaim ran
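A minimal sketch of a monitor like the one described above, covering the Scan, Free, and FrRatio columns. This is a hypothetical reconstruction, not the poster's actual script; the counter names are the ones exposed by 3.10-era kernels in /proc/vmstat and may differ on other versions.

```python
# Parse /proc/vmstat snapshots and compute per-interval reclaim stats.
def read_vmstat(text):
    """Parse /proc/vmstat contents ("name value" per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        stats[name] = int(value)
    return stats

def reclaim_ratio(prev, cur):
    """Percentage of scanned pages actually freed between two samples.

    Counter names assume a 3.10-era kernel (kswapd reclaim from the
    Normal zone); a low ratio suggests pages are pinned somewhere.
    """
    scanned = cur["pgscan_kswapd_normal"] - prev["pgscan_kswapd_normal"]
    freed = cur["pgsteal_kswapd_normal"] - prev["pgsteal_kswapd_normal"]
    return 100 * freed // scanned if scanned else 0
```

Sampling `open("/proc/vmstat").read()` once per second and diffing successive snapshots with these helpers reproduces the per-interval columns shown in the table.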
>>>
>>>
>>>  The free memory looks like this (free -m):
>>>
>>>               total       used       free     shared    buffers     cached
>>>  Mem:         24175      23986        188          0         16      10453
>>>  -/+ buffers/cache:      13517      10657
>>>  Swap:         8191         85       8106
>>>
>>>
>>>  Are you aware of any such issue with fscache / cachefiles?  If not, could
>> you suggest what other information we could gather to debug it further?
>>>
>>>
>>>  Thanks,
>>>  Shantanu
>>>  SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e)
>> memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k)
>> show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n)
>> poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
>> show-task-states(t) unmount(u) show-blocked-tasks(w) dump-ftrace-buffer(z)
>>>  python: page allocation failure: order:0, mode:0x11110
>>>  CPU: 5 PID: 18997 Comm: python Tainted: G        W  O
>> 3.10.36-el5.ia32e.lime.0 #1
>>>  Hardware name: Supermicro X8STi/X8STi, BIOS 2.0     06/03/2010
>>>   0000000000000000 ffff880104d39578 ffffffff81426c43 ffff880104d39608
>>>   ffffffff810fdcfc ffff88061fffd468 0000004000000000 ffff880104d395a8
>>>   ffffffff810fd506 0000000000000000 ffffffff8109120d 0001111000000000
>>>  Call Trace:
>>>   [<ffffffff81426c43>] dump_stack+0x19/0x1e
>>>   [<ffffffff810fdcfc>] warn_alloc_failed+0xfc/0x140
>>>   [<ffffffff810fd506>] ? drain_local_pages+0x16/0x20
>>>   [<ffffffff8109120d>] ? on_each_cpu_mask+0x4d/0x70
>>>   [<ffffffff810fef1c>] __alloc_pages_nodemask+0x6cc/0x910
>>>   [<ffffffff81139a9a>] alloc_pages_current+0xba/0x160
>>>   [<ffffffff810f6617>] __page_cache_alloc+0xa7/0xc0
>>>   [<ffffffffa050a4ea>] cachefiles_read_backing_file+0x2ba/0x7b0
>> [cachefiles]
>>>   [<ffffffffa050ac87>] cachefiles_read_or_alloc_pages+0x2a7/0x3d0
>> [cachefiles]
>>>   [<ffffffff81062c4f>] ? wake_up_bit+0x2f/0x40
>>>   [<ffffffffa04da24a>] ? fscache_run_op+0x5a/0xa0 [fscache]
>>>   [<ffffffffa04daacb>] ? fscache_submit_op+0x1db/0x4c0 [fscache]
>>>   [<ffffffffa04dbd65>] __fscache_read_or_alloc_pages+0x1f5/0x2c0
>> [fscache]
>>>   [<ffffffffa0531e9e>] __nfs_readpages_from_fscache+0x7e/0x1b0 [nfs]
>>>   [<ffffffffa052bc6a>] nfs_readpages+0xca/0x1f0 [nfs]
>>>   [<ffffffff81139a9a>] ? alloc_pages_current+0xba/0x160
>>>   [<ffffffff81101ae2>] __do_page_cache_readahead+0x1b2/0x260
>>>   [<ffffffff81101bb1>] ra_submit+0x21/0x30
>>>   [<ffffffff81101e85>] ondemand_readahead+0x115/0x240
>>>   [<ffffffff81102038>] page_cache_async_readahead+0x88/0xb0
>>>   [<ffffffff810f5f5e>] ? find_get_page+0x1e/0xa0
>>>   [<ffffffff810f81fc>] generic_file_aio_read+0x4dc/0x720
>>>   [<ffffffffa05224b9>] nfs_file_read+0x89/0x100 [nfs]
>>>   [<ffffffff81158a4f>] do_sync_read+0x7f/0xb0
>>>   [<ffffffff8115a3a5>] vfs_read+0xc5/0x190
>>>   [<ffffffff8115a57f>] SyS_read+0x5f/0xa0
>>>   [<ffffffff814327bb>] tracesys+0xdd/0xe2
>>>  Mem-Info:
>>>  Node 0 DMA32 per-cpu:
>>>  CPU    0: hi:  186, btch:  31 usd:   2
>>>  CPU    1: hi:  186, btch:  31 usd: 159
>>>  CPU    2: hi:  186, btch:  31 usd:  82
>>>  CPU    3: hi:  186, btch:  31 usd: 182
>>>  CPU    4: hi:  186, btch:  31 usd:  57
>>>  CPU    5: hi:  186, btch:  31 usd:   0
>>>  Node 0 Normal per-cpu:
>>>  CPU    0: hi:  186, btch:  31 usd:  22
>>>  CPU    1: hi:  186, btch:  31 usd: 117
>>>  CPU    2: hi:  186, btch:  31 usd:  16
>>>  CPU    3: hi:  186, btch:  31 usd:  79
>>>  CPU    4: hi:  186, btch:  31 usd:  59
>>>  CPU    5: hi:  186, btch:  31 usd:   0
>>>  active_anon:3461555 inactive_anon:451550 isolated_anon:0
>>>   active_file:236104 inactive_file:1899298 isolated_file:0
>>>   unevictable:467 dirty:109 writeback:0 unstable:0
>>>   free:33860 slab_reclaimable:19654 slab_unreclaimable:5110
>>>   mapped:15284 shmem:2 pagetables:10231 bounce:0
>>>   free_cma:0
>>>  Node 0 DMA32 free:91764kB min:4824kB low:6028kB high:7236kB
>> active_anon:2185528kB inactive_anon:566976kB active_file:108808kB
>> inactive_file:670996kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
>> present:3660960kB managed:3646160kB mlocked:0kB dirty:152kB writeback:0kB
>> mapped:1344kB shmem:0kB slab_reclaimable:17300kB slab_unreclaimable:580kB
>> kernel_stack:80kB pagetables:6448kB unstable:0kB bounce:0kB free_cma:0kB
>> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>  lowmem_reserve[]: 0 20607 20607
>>>  Node 0 Normal free:39164kB min:27940kB low:34924kB high:41908kB
>> active_anon:11663332kB inactive_anon:1239224kB active_file:835608kB
>> inactive_file:6926636kB unevictable:1868kB isolated(anon):0kB isolated(file):0kB
>> present:21495808kB managed:21101668kB mlocked:1868kB dirty:612kB writeback:0kB
>> mapped:60120kB shmem:8kB slab_reclaimable:61536kB slab_unreclaimable:19860kB
>> kernel_stack:2088kB pagetables:34804kB unstable:0kB bounce:0kB free_cma:0kB
>> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>  lowmem_reserve[]: 0 0 0
>>>  Node 0 DMA32: 506*4kB (UEM) 545*8kB (UEM) 569*16kB (UEM) 897*32kB (UEM)
>> 541*64kB (UEM) 87*128kB (UM) 4*256kB (UM) 2*512kB (M) 0*1024kB 0*2048kB 0*4096kB
>> = 92000kB
>>>  Node 0 Normal: 703*4kB (UE) 285*8kB (UEM) 176*16kB (UEM) 836*32kB (UEM)
>> 7*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 35108kB
>>>  Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
>> hugepages_size=2048kB
>>>  2137331 total pagecache pages
>>>  1215 pages in swap cache
>>>  Swap cache stats: add 7056879, delete 7055664, find 633567/703482
>>>  Free swap  = 8301172kB
>>>  Total swap = 8388604kB
>>>  6291455 pages RAM
>>>  102580 pages reserved
>>>  4391935 pages shared
>>>  4746779 pages non-shared
>>>
>>>  --
>>>  Linux-cachefs mailing list
>>>  Linux-cachefs at redhat.com
>>>  https://www.redhat.com/mailman/listinfo/linux-cachefs



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz at adfin.com



