[Linux-cachefs] Page leaking in cachefiles_read_backing_file while vmscan is active
KiranKumar Modukuri
kiran.modukuri at gmail.com
Wed Aug 29 16:57:16 UTC 2018
Hi,
I have a system with fscache enabled on an NFS mount, and it is leaking
pages. The symptoms are as follows.
The system is running an Ubuntu Xenial kernel:
# uname -a
Linux XX-XXXX-XXX 4.4.0-124-generic #148-Ubuntu SMP Wed May 2 13:00:18
UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# uptime
16:28:46 up 42 days, 20:20, 13 users, load average: 3.85, 4.58, 4.73
Page allocation in cachefiles_read_backing_file is failing with a
huge number of OOMs, and finally the following failure:
~$ dmesg
[2434369.019106] [<ffffffff810c69a0>] ? __wake_up_bit+0x50/0x70
[2434369.019109] [<ffffffff810c6a85>] ? wake_up_bit+0x25/0x30
[2434369.019118] [<ffffffffc054d27c>] ?
fscache_run_op.isra.7+0x4c/0x80 [fscache]
[2434369.019123] [<ffffffffc054f4cf>]
__fscache_read_or_alloc_pages+0x1af/0x2e0 [fscache]
[2434369.019142] [<ffffffffc0826710>]
__nfs_readpages_from_fscache+0x130/0x1a0 [nfs]
[2434369.019152] [<ffffffffc081d111>] nfs_readpages+0xc1/0x200 [nfs]
[2434369.019155] [<ffffffff811e699c>] ? alloc_pages_current+0x8c/0x110
[2434369.019159] [<ffffffff811a1a39>] __do_page_cache_readahead+0x199/0x240
[2434369.019162] [<ffffffff810c6678>] ? __wake_up_common+0x58/0x90
[2434369.019165] [<ffffffff811a1c1d>] ondemand_readahead+0x13d/0x250
[2434369.019168] [<ffffffff811a1d9b>] page_cache_async_readahead+0x6b/0x70
[2434369.019171] [<ffffffff81194524>] generic_file_read_iter+0x464/0x690
[2434369.019180] [<ffffffffc08144a8>] ?
__nfs_revalidate_mapping+0xc8/0x290 [nfs]
[2434369.019188] [<ffffffffc080fdf2>] nfs_file_read+0x52/0xa0 [nfs]
[2434369.019192] [<ffffffff8121353e>] new_sync_read+0x9e/0xe0
[2434369.019195] [<ffffffff812135a9>] __vfs_read+0x29/0x40
[2434369.019197] [<ffffffff81213b76>] vfs_read+0x86/0x130
[2434369.019200] [<ffffffff81214a85>] SyS_pread64+0x95/0xb0
[2434369.019206] [<ffffffff8184f788>] entry_SYSCALL_64_fastpath+0x1c/0xbb
[2434369.019223] Mem-Info:
[2434369.019235] active_anon:9626413 inactive_anon:761627 isolated_anon:0
active_file:905105 inactive_file:116090946
isolated_file:424 >>> around 116 million
inactive file pages are pinned
unevictable:0 dirty:0 writeback:0 unstable:2
slab_reclaimable:2701894 slab_unreclaimable:185099
mapped:592946 shmem:1416519 pagetables:88826 bounce:0
free:283509 free_pcp:0 free_cma:0
==== Results after leaving the system for a few more days; the leak has
increased to 222 GB ====
Looking at the stats for retrievals and OOMs confirms that the system
ran into memory pressure:
# echo 3 > /proc/sys/vm/drop_caches ; echo 3 >
/proc/sys/vm/drop_caches ; ./test//vm/page-types -r -b
lru,~unevictable,~active,~locked,~referenced
flags page-count MB symbolic-flags
long-symbolic-flags
0x0000000800000020 652 2
_____l_______________________P____________ lru,private
0x0000000400000028 15341560 59927
___U_l______________________d_____________ uptodate,lru,mappedtodisk
0x0000000000000028 42646984 166589
___U_l____________________________________ uptodate,lru
>>> these are the leaked pages
0x0001000400000028 291 1
___U_l______________________d______I______
uptodate,lru,mappedtodisk,readahead
0x0001000000000028 2 0
___U_l_____________________________I______ uptodate,lru,readahead
0x0000001000000028 80 0
___U_l________________________p___________ uptodate,lru,private_2
0x0001001000000028 1 0
___U_l________________________p____I______
uptodate,lru,private_2,readahead
0x0000000c00000028 152 0
___U_l______________________dP____________
uptodate,lru,mappedtodisk,private
0x0000000800000028 1 0
___U_l_______________________P____________ uptodate,lru,private
0x0001000000004030 183390 716
____Dl________b____________________I______
dirty,lru,swapbacked,readahead
0x0001000000004038 927079 3621
___UDl________b____________________I______
uptodate,dirty,lru,swapbacked,readahead
0x0000000000004038 2 0
___UDl________b___________________________
uptodate,dirty,lru,swapbacked
0x0000000c00000038 6 0
___UDl______________________dP____________
uptodate,dirty,lru,mappedtodisk,private
0x0000000400000828 2981 11
___U_l_____M________________d_____________
uptodate,lru,mmap,mappedtodisk
0x0000000000004828 1 0
___U_l_____M__b___________________________
uptodate,lru,mmap,swapbacked
0x0000000000004838 8140 31
___UDl_____M__b___________________________
uptodate,dirty,lru,mmap,swapbacked
total 59111322 230903
# cat /proc/fs/fscache/stats
FS-Cache statistics
Cookies: idx=286 dat=94575250 spc=0
Objects: alc=93658872 nal=37 avl=93658871 ded=93573062
ChkAux : non=0 ok=80656494 upd=0 obs=145
Pages : mrk=238684375647 unc=238677703085
Acquire: n=94575536 nul=0 noc=0 ok=94575536 nbf=0 oom=0
Lookups: n=93658872 neg=13002215 pos=80656657 crt=13002215 tmo=0
Invals : n=23210 run=23210
Updates: n=0 nul=0 run=23210
Relinqs: n=94489599 nul=0 wcr=0 rtr=0
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=1055039450 ok=1017317776 wt=1966427 nod=15094761
nbf=2742833 int=0 oom=19884080 >>>> huge number of OOMs
Retrvls: ops=1052296617 owt=1250626 abt=0
Stores : n=1808983598 ok=1808983598 agn=0 nbf=0 oom=0
Stores : ops=23287547 run=1832257814 pgs=1808970267 rxd=1808983598 olm=0 ipp=0
VmScan : nos=2045450030 gon=9 bsy=22 can=13331 wt=2013
Ops : pend=1250891 run=1075607374 enq=729855483 can=0 rej=0
Ops : ini=2861303425 dfr=418493 rel=2861303422 gc=418493
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: inv=0 upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
CacheEv: nsp=145 stl=0 rtr=0 cul=0
Looking at the page flags using page-types, the bug appears to be in
ext4 pages being used by fscache.
I have root-caused the bug to a leak of the ext4 (backing) cache page
that occurs when trying to read a page from fscache while the netfs
page is already in the NFS page LRU list.
I have reproduced the situation by deliberately not freeing the backing
pages, and the resulting pattern matches the page flags above.
Searching the old archives, I found that there is a known fix, but it
was never committed upstream:
https://www.redhat.com/archives/linux-cachefs/2014-August/msg00017.html
This patch never made it into mainline kernels, but it looks like an
obvious fix for the problem:
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index ad74a6a..ead4981 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -523,7 +523,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 					    netpage->index, cachefiles_gfp);
 		if (ret < 0) {
 			if (ret == -EEXIST) {
+				page_cache_release(backpage);
+				backpage = NULL;
 				page_cache_release(netpage);
+				netpage = NULL;
 				fscache_retrieval_complete(op, 1);
 				continue;
 			}
@@ -596,7 +599,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 					    netpage->index, cachefiles_gfp);
 		if (ret < 0) {
 			if (ret == -EEXIST) {
+				page_cache_release(backpage);
+				backpage = NULL;
 				page_cache_release(netpage);
+				netpage = NULL;
 				fscache_retrieval_complete(op, 1);
 				continue;
 			}
David,
If applicable, could you please help get this change into an upstream release?
Thanks
Kiran