[Linux-cachefs] Page leaking in cachefiles_read_backing_file while vmscan is active
KiranKumar Modukuri
kiran.modukuri at gmail.com
Wed Aug 29 16:57:16 UTC 2018
Hi,
I have a system with fscache enabled on an NFS mount, and it is leaking
pages. The symptoms are as follows.
The system is running an Ubuntu Xenial kernel:
# uname -a
Linux XX-XXXX-XXX 4.4.0-124-generic #148-Ubuntu SMP Wed May 2 13:00:18
UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# uptime
16:28:46 up 42 days, 20:20, 13 users, load average: 3.85, 4.58, 4.73
Page allocation in cachefiles_read_backing_file is failing with a
huge number of OOMs, and finally the following failure:
~$ dmesg
[2434369.019106] [<ffffffff810c69a0>] ? __wake_up_bit+0x50/0x70
[2434369.019109] [<ffffffff810c6a85>] ? wake_up_bit+0x25/0x30
[2434369.019118] [<ffffffffc054d27c>] ?
fscache_run_op.isra.7+0x4c/0x80 [fscache]
[2434369.019123] [<ffffffffc054f4cf>]
__fscache_read_or_alloc_pages+0x1af/0x2e0 [fscache]
[2434369.019142] [<ffffffffc0826710>]
__nfs_readpages_from_fscache+0x130/0x1a0 [nfs]
[2434369.019152] [<ffffffffc081d111>] nfs_readpages+0xc1/0x200 [nfs]
[2434369.019155] [<ffffffff811e699c>] ? alloc_pages_current+0x8c/0x110
[2434369.019159] [<ffffffff811a1a39>] __do_page_cache_readahead+0x199/0x240
[2434369.019162] [<ffffffff810c6678>] ? __wake_up_common+0x58/0x90
[2434369.019165] [<ffffffff811a1c1d>] ondemand_readahead+0x13d/0x250
[2434369.019168] [<ffffffff811a1d9b>] page_cache_async_readahead+0x6b/0x70
[2434369.019171] [<ffffffff81194524>] generic_file_read_iter+0x464/0x690
[2434369.019180] [<ffffffffc08144a8>] ?
__nfs_revalidate_mapping+0xc8/0x290 [nfs]
[2434369.019188] [<ffffffffc080fdf2>] nfs_file_read+0x52/0xa0 [nfs]
[2434369.019192] [<ffffffff8121353e>] new_sync_read+0x9e/0xe0
[2434369.019195] [<ffffffff812135a9>] __vfs_read+0x29/0x40
[2434369.019197] [<ffffffff81213b76>] vfs_read+0x86/0x130
[2434369.019200] [<ffffffff81214a85>] SyS_pread64+0x95/0xb0
[2434369.019206] [<ffffffff8184f788>] entry_SYSCALL_64_fastpath+0x1c/0xbb
[2434369.019223] Mem-Info:
[2434369.019235] active_anon:9626413 inactive_anon:761627 isolated_anon:0
active_file:905105 inactive_file:116090946
isolated_file:424 >>> around 116 million
inactive file pages are pinned
unevictable:0 dirty:0 writeback:0 unstable:2
slab_reclaimable:2701894 slab_unreclaimable:185099
mapped:592946 shmem:1416519 pagetables:88826 bounce:0
free:283509 free_pcp:0 free_cma:0
==== Results after leaving the system for a few more days; the leak has
increased to 222 GB ====
Looking at the stats for retrievals and OOMs confirms that the system
ran into memory pressure:
# echo 3 > /proc/sys/vm/drop_caches ; echo 3 >
/proc/sys/vm/drop_caches ; ./test//vm/page-types -r -b
lru,~unevictable,~active,~locked,~referenced
flags page-count MB symbolic-flags
long-symbolic-flags
0x0000000800000020 652 2
_____l_______________________P____________ lru,private
0x0000000400000028 15341560 59927
___U_l______________________d_____________ uptodate,lru,mappedtodisk
0x0000000000000028 42646984 166589
___U_l____________________________________ uptodate,lru
>>> these are the leaked pages
0x0001000400000028 291 1
___U_l______________________d______I______
uptodate,lru,mappedtodisk,readahead
0x0001000000000028 2 0
___U_l_____________________________I______ uptodate,lru,readahead
0x0000001000000028 80 0
___U_l________________________p___________ uptodate,lru,private_2
0x0001001000000028 1 0
___U_l________________________p____I______
uptodate,lru,private_2,readahead
0x0000000c00000028 152 0
___U_l______________________dP____________
uptodate,lru,mappedtodisk,private
0x0000000800000028 1 0
___U_l_______________________P____________ uptodate,lru,private
0x0001000000004030 183390 716
____Dl________b____________________I______
dirty,lru,swapbacked,readahead
0x0001000000004038 927079 3621
___UDl________b____________________I______
uptodate,dirty,lru,swapbacked,readahead
0x0000000000004038 2 0
___UDl________b___________________________
uptodate,dirty,lru,swapbacked
0x0000000c00000038 6 0
___UDl______________________dP____________
uptodate,dirty,lru,mappedtodisk,private
0x0000000400000828 2981 11
___U_l_____M________________d_____________
uptodate,lru,mmap,mappedtodisk
0x0000000000004828 1 0
___U_l_____M__b___________________________
uptodate,lru,mmap,swapbacked
0x0000000000004838 8140 31
___UDl_____M__b___________________________
uptodate,dirty,lru,mmap,swapbacked
total 59111322 230903
# cat /proc/fs/fscache/stats
FS-Cache statistics
Cookies: idx=286 dat=94575250 spc=0
Objects: alc=93658872 nal=37 avl=93658871 ded=93573062
ChkAux : non=0 ok=80656494 upd=0 obs=145
Pages : mrk=238684375647 unc=238677703085
Acquire: n=94575536 nul=0 noc=0 ok=94575536 nbf=0 oom=0
Lookups: n=93658872 neg=13002215 pos=80656657 crt=13002215 tmo=0
Invals : n=23210 run=23210
Updates: n=0 nul=0 run=23210
Relinqs: n=94489599 nul=0 wcr=0 rtr=0
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=1055039450 ok=1017317776 wt=1966427 nod=15094761
nbf=2742833 int=0 oom=19884080 >>>> huge number of OOMs
Retrvls: ops=1052296617 owt=1250626 abt=0
Stores : n=1808983598 ok=1808983598 agn=0 nbf=0 oom=0
Stores : ops=23287547 run=1832257814 pgs=1808970267 rxd=1808983598 olm=0 ipp=0
VmScan : nos=2045450030 gon=9 bsy=22 can=13331 wt=2013
Ops : pend=1250891 run=1075607374 enq=729855483 can=0 rej=0
Ops : ini=2861303425 dfr=418493 rel=2861303422 gc=418493
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: inv=0 upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
CacheEv: nsp=145 stl=0 rtr=0 cul=0
Looking at the page flags using page-types, the bug appears to be in
ext4 pages being used by fscache.
I have root-caused the bug to a leak of the ext4 (backing) cache page
that occurs when trying to read a page from fscache while the netfs
page is already in the NFS page LRU list.
I have reproduced the situation by deliberately not freeing the backing
pages, and the resulting pattern matches the page flags above.
Searching the old archives, I found that there is a known fix, but it
was never committed upstream:
https://www.redhat.com/archives/linux-cachefs/2014-August/msg00017.html
This patch never made it into mainline kernels, but it looks like an
obvious fix for the problem:
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index ad74a6a..ead4981 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -523,7 +523,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 					    netpage->index, cachefiles_gfp);
 		if (ret < 0) {
 			if (ret == -EEXIST) {
+				page_cache_release(backpage);
+				backpage = NULL;
 				page_cache_release(netpage);
+				netpage = NULL;
 				fscache_retrieval_complete(op, 1);
 				continue;
 			}
@@ -596,7 +599,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 					    netpage->index, cachefiles_gfp);
 		if (ret < 0) {
 			if (ret == -EEXIST) {
+				page_cache_release(backpage);
+				backpage = NULL;
 				page_cache_release(netpage);
+				netpage = NULL;
 				fscache_retrieval_complete(op, 1);
 				continue;
 			}
David,
If applicable, could you please help get this change into an upstream release?
Thanks
Kiran