[Linux-cachefs] Page reference leak in fscache or cachefiles?
Shantanu Goel
sgoel01 at yahoo.com
Fri Jul 25 15:15:01 UTC 2014
Our codebase is a bit older than 3.15.5 so the only patch I was able to integrate immediately and deploy is the one which papers over the wait_on_page_write deadlock. We are still evaluating the others. The underlying problem of a page reference leak remains which I'm still trying to chase down. It literally gets to the point where the entire file page cache becomes unreclaimable.
Thanks,
Shantanu
> On Friday, July 25, 2014 11:04 AM, Milosz Tanski <milosz at adfin.com> wrote:
> > Did you have a chance to try any of my patches?
>
> On Tue, Jul 22, 2014 at 10:11 AM, Milosz Tanski <milosz at adfin.com> wrote:
>> Shantanu,
>>
>> You are correct in the way that there is a correlation between those
>> messages and fscache hangs. The correlation is that there are a number
>> of bugs in fscache code when things fail (like when allocation fails).
>> I have 3 different patches that I'm going to submit later on for
>> review for these issues against 3.15.5. I'll post them late on today
>> when I get a free minute.
>>
>> Also, sorry for not CCing the list in the first place.
>>
>> On Tue, Jul 22, 2014 at 8:56 AM, Shantanu Goel <sgoel01 at yahoo.com>
> wrote:
>>> Hi Milosz,
>>>
>>> We thought they were harmless too initially, however we have seen quite
> a few hangs at this point on hosts where we enabled fscache + cachefiles and
> think there is some correlation with these messages. It could be some generic
> VM issue which is exacerbated by the use of fscache. We posted our initial
> message to get some ideas about where the problem might be since we are not
> certain of where the problem lies.
>>>
>>> Thanks,
>>> Shantanu
>>>
>>>
>>>
>>>
>>>> On Monday, July 21, 2014 9:31 PM, Milosz Tanski
> <milosz at adfin.com> wrote:
>>>> > Shantanu,
>>>>
>>>> This message is harmless. I see this a lot a well as we have
> fscache
>>>> on a SSD disks that do 1.1GB/s transfer and a demanding application
>>>> and a lot of pages waiting for write out. The kernel cannot force a
>>>> write out because fscache has t o allocate pages with the GPF_NOFS
>>>> flag (to prevent a recursive hang).
>>>>
>>>> On our production machines we changed the vm.min_free_kbytes kernel
>>>> tunable to 256Mb and (the machine has a lot of RAM) and this
> happens a
>>>> lot less often.
>>>>
>>>> - Milosz
>>>>
>>>> On Wed, Jul 9, 2014 at 12:35 PM, Shantanu Goel
> <sgoel01 at yahoo.com> wrote:
>>>>> Hi,
>>>>>
>>>>> We are running Linux 3.10 + fscache-20130702 + commit
>>>> bae6b235905ab9dc6659395f7802c1d36fb63f15 from dhowells' git
> tree and suspect
>>>> there might be a page reference leak somewhere as evidenced by the
> low reclaim
>>>> ratio. We also see the kernel complaining about memory allocation
> failures in
>>>> the readahead code as seen below.
>>>>> The machine has plenty of filecache and there isn't any
> heavy write
>>>> activity which could also pin the pages.
>>>>> We wrote a script to monitor /proc/vmstat and see the
> following output with
>>>> a one second interval.
>>>>>
>>>>>
>>>>> Refill Scan Free FrRatio Slab Runs
> Stalls
>>>>> 0 154663 6855 4 1024 2
> 0
>>>>> 0 7152 7152 100 0 4
> 0
>>>>> 0 5960 5960 100 0 4
> 0
>>>>> 0 7152 7152 100 1024 3
> 0
>>>>> 0 152407 21698 14 1024 2
> 0
>>>>>
>>>>>
>>>>> The fields are:
>>>>>
>>>>> Refill - # active pages scanned, i.e. deactivated
>>>>> Scan - # inactive pages scanned
>>>>> Free - # pages freed
>>>>> FrRatio - free / scan ratio as percentage
>>>>> Runs - # times kswapd ran
>>>>> Stalls - # times direct reclaim ran
>>>>>
>>>>>
>>>>> The free memory looks like this (free -m):
>>>>>
>>>>> total used free shared
> buffers cached
>>>>> Mem: 24175 23986 188 0
> 16 10453
>>>>> -/+ buffers/cache: 13517 10657
>>>>> Swap: 8191 85 8106
>>>>>
>>>>>
>>>>> Are you aware of any such issue with fscache / cachefiles? If
> not, could
>>>> you suggest what other information we could gather to debug it
> further?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Shantanu
>>>>> SysRq : HELP : loglevel(0-9) reboot(b) crash(c)
> terminate-all-tasks(e)
>>>> memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j)
> sak(k)
>>>> show-backtrace-all-active-cpus(l) show-memory-usage(m)
> nice-all-RT-tasks(n)
>>>> poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
>>>> show-task-states(t) unmount(u) show-blocked-tasks(w)
> dump-ftrace-buffer(z)
>>>>> python: page allocation failure: order:0, mode:0x11110
>>>>> CPU: 5 PID: 18997 Comm: python Tainted: G W O
>>>> 3.10.36-el5.ia32e.lime.0 #1
>>>>> Hardware name: Supermicro X8STi/X8STi, BIOS 2.0 06/03/2010
>>>>> 0000000000000000 ffff880104d39578 ffffffff81426c43
> ffff880104d39608
>>>>> ffffffff810fdcfc ffff88061fffd468 0000004000000000
> ffff880104d395a8
>>>>> ffffffff810fd506 0000000000000000 ffffffff8109120d
> 0001111000000000
>>>>> Call Trace:
>>>>> [<ffffffff81426c43>] dump_stack+0x19/0x1e
>>>>> [<ffffffff810fdcfc>] warn_alloc_failed+0xfc/0x140
>>>>> [<ffffffff810fd506>] ? drain_local_pages+0x16/0x20
>>>>> [<ffffffff8109120d>] ? on_each_cpu_mask+0x4d/0x70
>>>>> [<ffffffff810fef1c>] __alloc_pages_nodemask+0x6cc/0x910
>>>>> [<ffffffff81139a9a>] alloc_pages_current+0xba/0x160
>>>>> [<ffffffff810f6617>] __page_cache_alloc+0xa7/0xc0
>>>>> [<ffffffffa050a4ea>]
> cachefiles_read_backing_file+0x2ba/0x7b0
>>>> [cachefiles]
>>>>> [<ffffffffa050ac87>]
> cachefiles_read_or_alloc_pages+0x2a7/0x3d0
>>>> [cachefiles]
>>>>> [<ffffffff81062c4f>] ? wake_up_bit+0x2f/0x40
>>>>> [<ffffffffa04da24a>] ? fscache_run_op+0x5a/0xa0
> [fscache]
>>>>> [<ffffffffa04daacb>] ? fscache_submit_op+0x1db/0x4c0
> [fscache]
>>>>> [<ffffffffa04dbd65>]
> __fscache_read_or_alloc_pages+0x1f5/0x2c0
>>>> [fscache]
>>>>> [<ffffffffa0531e9e>]
> __nfs_readpages_from_fscache+0x7e/0x1b0 [nfs]
>>>>> [<ffffffffa052bc6a>] nfs_readpages+0xca/0x1f0 [nfs]
>>>>> [<ffffffff81139a9a>] ? alloc_pages_current+0xba/0x160
>>>>> [<ffffffff81101ae2>]
> __do_page_cache_readahead+0x1b2/0x260
>>>>> [<ffffffff81101bb1>] ra_submit+0x21/0x30
>>>>> [<ffffffff81101e85>] ondemand_readahead+0x115/0x240
>>>>> [<ffffffff81102038>]
> page_cache_async_readahead+0x88/0xb0
>>>>> [<ffffffff810f5f5e>] ? find_get_page+0x1e/0xa0
>>>>> [<ffffffff810f81fc>] generic_file_aio_read+0x4dc/0x720
>>>>> [<ffffffffa05224b9>] nfs_file_read+0x89/0x100 [nfs]
>>>>> [<ffffffff81158a4f>] do_sync_read+0x7f/0xb0
>>>>> [<ffffffff8115a3a5>] vfs_read+0xc5/0x190
>>>>> [<ffffffff8115a57f>] SyS_read+0x5f/0xa0
>>>>> [<ffffffff814327bb>] tracesys+0xdd/0xe2
>>>>> Mem-Info:
>>>>> Node 0 DMA32 per-cpu:
>>>>> CPU 0: hi: 186, btch: 31 usd: 2
>>>>> CPU 1: hi: 186, btch: 31 usd: 159
>>>>> CPU 2: hi: 186, btch: 31 usd: 82
>>>>> CPU 3: hi: 186, btch: 31 usd: 182
>>>>> CPU 4: hi: 186, btch: 31 usd: 57
>>>>> CPU 5: hi: 186, btch: 31 usd: 0
>>>>> Node 0 Normal per-cpu:
>>>>> CPU 0: hi: 186, btch: 31 usd: 22
>>>>> CPU 1: hi: 186, btch: 31 usd: 117
>>>>> CPU 2: hi: 186, btch: 31 usd: 16
>>>>> CPU 3: hi: 186, btch: 31 usd: 79
>>>>> CPU 4: hi: 186, btch: 31 usd: 59
>>>>> CPU 5: hi: 186, btch: 31 usd: 0
>>>>> active_anon:3461555 inactive_anon:451550 isolated_anon:0
>>>>> active_file:236104 inactive_file:1899298 isolated_file:0
>>>>> unevictable:467 dirty:109 writeback:0 unstable:0
>>>>> free:33860 slab_reclaimable:19654 slab_unreclaimable:5110
>>>>> mapped:15284 shmem:2 pagetables:10231 bounce:0
>>>>> free_cma:0
>>>>> Node 0 DMA32 free:91764kB min:4824kB low:6028kB high:7236kB
>>>> active_anon:2185528kB inactive_anon:566976kB active_file:108808kB
>>>> inactive_file:670996kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB
>>>> present:3660960kB managed:3646160kB mlocked:0kB dirty:152kB
> writeback:0kB
>>>> mapped:1344kB shmem:0kB slab_reclaimable:17300kB
> slab_unreclaimable:580kB
>>>> kernel_stack:80kB pagetables:6448kB unstable:0kB bounce:0kB
> free_cma:0kB
>>>> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>> lowmem_reserve[]: 0 20607 20607
>>>>> Node 0 Normal free:39164kB min:27940kB low:34924kB
> high:41908kB
>>>> active_anon:11663332kB inactive_anon:1239224kB active_file:835608kB
>>>> inactive_file:6926636kB unevictable:1868kB isolated(anon):0kB
> isolated(file):0kB
>>>> present:21495808kB managed:21101668kB mlocked:1868kB dirty:612kB
> writeback:0kB
>>>> mapped:60120kB shmem:8kB slab_reclaimable:61536kB
> slab_unreclaimable:19860kB
>>>> kernel_stack:2088kB pagetables:34804kB unstable:0kB bounce:0kB
> free_cma:0kB
>>>> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>> lowmem_reserve[]: 0 0 0
>>>>> Node 0 DMA32: 506*4kB (UEM) 545*8kB (UEM) 569*16kB (UEM)
> 897*32kB (UEM)
>>>> 541*64kB (UEM) 87*128kB (UM) 4*256kB (UM) 2*512kB (M) 0*1024kB
> 0*2048kB 0*4096kB
>>>> = 92000kB
>>>>> Node 0 Normal: 703*4kB (UE) 285*8kB (UEM) 176*16kB (UEM)
> 836*32kB (UEM)
>>>> 7*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =
> 35108kB
>>>>> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
>>>> hugepages_size=2048kB
>>>>> 2137331 total pagecache pages
>>>>> 1215 pages in swap cache
>>>>> Swap cache stats: add 7056879, delete 7055664, find
> 633567/703482
>>>>> Free swap = 8301172kB
>>>>> Total swap = 8388604kB
>>>>> 6291455 pages RAM
>>>>> 102580 pages reserved
>>>>> 4391935 pages shared
>>>>> 4746779 pages non-shared
>>>>>
>>>>> --
>>>>> Linux-cachefs mailing list
>>>>> Linux-cachefs at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/linux-cachefs
>>>>
>>>>
>>>>
>>>> --
>>>> Milosz Tanski
>>>> CTO
>>>> 16 East 34th Street, 15th floor
>>>> New York, NY 10016
>>>>
>>>> p: 646-253-9055
>
>>>>
>>>> e: milosz at adfin.com
>>>>
>>
>>
>>
>> --
>> Milosz Tanski
>> CTO
>> 16 East 34th Street, 15th floor
>> New York, NY 10016
>>
>> p: 646-253-9055
>> e: milosz at adfin.com
>
>
>
> --
> Milosz Tanski
> CTO
> 16 East 34th Street, 15th floor
> New York, NY 10016
>
> p: 646-253-9055
> e: milosz at adfin.com
>
More information about the Linux-cachefs
mailing list