[Virtio-fs] Large memory consumption by virtiofsd, suspecting fd's aren't being closed?

Vivek Goyal vgoyal at redhat.com
Tue Mar 23 13:47:33 UTC 2021


On Tue, Mar 23, 2021 at 12:55:26PM +0100, Sergio Lopez wrote:
> On Mon, Mar 22, 2021 at 12:47:04PM -0400, Vivek Goyal wrote:
> > On Mon, Mar 22, 2021 at 05:09:32PM +0100, Miklos Szeredi wrote:
> > > On Mon, Mar 22, 2021 at 6:52 AM Eric Ernst <eric_ernst at apple.com> wrote:
> > > >
> > > > Hey y’all,
> > > >
> > > > One challenge I’ve been looking at is how to set up an appropriate memory cgroup limit for workloads that are leveraging virtiofs (i.e., running pods with Kata Containers). I noticed that the memory usage of the daemon itself can grow considerably depending on the workload, much more than I’d expect.
> > > >
> > > > I’m running a workload that simply builds the kernel sources with -j3. The Linux kernel sources are shared via virtiofs (no DAX), so as the build goes on, a lot of files are opened, closed, and created. The RSS of virtiofsd grows to several hundred MBs.
> > > >
> > > > When taking a look, I’m suspecting that virtiofsd is carrying out the opens but never actually closing the fds. In the guest, I’m seeing fd counts on the order of 10-40 for all the container processes as the workload runs, whereas the number of fds held by virtiofsd keeps increasing, reaching over 80,000. I’m guessing this isn’t expected?
> > > 
> > > The reason could be that the guest is keeping a ref on the inodes
> > > (dcache->dentry->inode) and the current implementation of the server
> > > keeps an O_PATH fd open for each inode referenced by the client.
> > > 
> > > One way to avoid this is to use the "cache=none" option, which forces
> > > the client to drop dentries from the cache immediately when not in
> > > use. This is not desirable if the cache is actually being put to use.
> > > 
> > > The memory use of the server should still be limited by the memory
> > > use of the guest: if there's memory pressure in the guest kernel, it
> > > will clean out its caches, which results in the memory use of the
> > > server decreasing as well. If the server's memory use looks
> > > unbounded, that might be indicative of too much memory being used
> > > for the dcache in the guest (cat /proc/slabinfo | grep ^dentry).
> > > Can you verify?
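
(For anyone following along: the pattern Miklos describes looks roughly
like the sketch below. The names are illustrative, not the actual
passthrough_ll code.)

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

struct lo_inode {
    int fd;             /* O_PATH fd pinning the inode on the host */
    uint64_t refcount;  /* guest lookups minus forgets */
};

/* A lookup opens an O_PATH fd for the inode; the fd cannot be closed
 * until the guest drops its dentry and sends enough FUSE_FORGETs to
 * bring the refcount to zero. */
static int lo_inode_get(struct lo_inode *inode, int parent_fd,
                        const char *name)
{
    inode->fd = openat(parent_fd, name, O_PATH | O_NOFOLLOW);
    if (inode->fd == -1)
        return -1;
    inode->refcount = 1;
    return 0;
}

static void lo_inode_forget(struct lo_inode *inode, uint64_t nlookup)
{
    inode->refcount -= nlookup;
    if (inode->refcount == 0)
        close(inode->fd);   /* only now does the host fd go away */
}

With "cache=none" the guest drops unused dentries right away, so the
forgets arrive much sooner and the fd count stays low.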
> > 
> > Hi Miklos,
> > 
> > Apart from the above, we identified one more issue on IRC. I asked
> > Eric to drop caches manually in the guest
> > (echo 3 > /proc/sys/vm/drop_caches), and while it reduced the number
> > of open fds, it did not seem to free up a significant amount of memory.
> > 
> > So the question remains: where is that memory? One possibility is the
> > memory allocated for the mapping arrays (inode and fd). These arrays
> > only grow and never shrink, so they can lock down some memory.
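
To illustrate what I mean by the mapping arrays, the grow-only pattern
is roughly the following (a sketch, not the exact virtiofsd structures):

#include <stdlib.h>
#include <sys/types.h>

struct lo_map_elem {
    int in_use;
    void *data;          /* e.g. the lo_inode this slot refers to */
};

struct lo_map {
    struct lo_map_elem *elems;
    size_t nelems;
};

/* Freed slots are reused, but the array itself is only ever grown;
 * its memory is never handed back to the allocator, so a peak of
 * 80,000 fds leaves the arrays sized for 80,000 entries. */
static ssize_t lo_map_alloc(struct lo_map *map, void *data)
{
    for (size_t i = 0; i < map->nelems; i++) {
        if (!map->elems[i].in_use) {
            map->elems[i].in_use = 1;
            map->elems[i].data = data;
            return i;
        }
    }

    size_t newsize = map->nelems ? map->nelems * 2 : 256;
    struct lo_map_elem *e = realloc(map->elems, newsize * sizeof(*e));
    if (!e)
        return -1;
    for (size_t i = map->nelems; i < newsize; i++) {
        e[i].in_use = 0;
        e[i].data = NULL;
    }
    map->elems = e;

    size_t idx = map->nelems;
    map->nelems = newsize;
    map->elems[idx].in_use = 1;
    map->elems[idx].data = data;
    return idx;
}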
> > 
> > But still, a lot of lo_inode memory should have been freed when
> > echo 3 > /proc/sys/vm/drop_caches was done. Why all of that did not
> > show up in virtiofsd's RSS usage is a little confusing.
> 
> Are you including "RssShmem" in "RSS usage"? If so, that could be
> misleading. When virtiofsd[-rs] touches pages that reside in the
> memory mapping that's shared with QEMU, those pages are accounted
> in the virtiofsd[-rs] process's RssShmem too.
> 
> In other words, the RSS value of the virtiofsd[-rs] process may be
> inflated because it includes pages that are actually shared with the
> QEMU process (there is no second copy of them).
> 
> This can be observed using a tool like "smem". Here's an example:
> 
>  - This virtiofsd-rs process appears to have an RSS of ~633 MiB:
>  
> root       13879 46.1  7.9 8467492 649132 pts/1  Sl+  11:33   0:52 ./target/debug/virtiofsd-rs
> root       13947 69.3 13.4 5638580 1093876 pts/0 Sl+  11:33   1:14 qemu-system-x86_64
> 
>  - In /proc/13879/status we can observe that most of that memory is
>    actually RssShmem:
> 
> RssAnon:	    9624 kB
> RssFile:	    5136 kB
> RssShmem:	  634372 kB

Hi Sergio,

Thanks for this observation about RssShmem. I also ran virtiofsd and
observed its memory usage just now, and indeed it looks like only the
RssShmem usage is very high.

RssAnon:            4884 kB
RssFile:            1900 kB
RssShmem:        1050244 kB
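
(These are just the Rss* lines from /proc/<pid>/status. For reference,
a trivial helper along the lines of the hypothetical one below prints
the same breakdown, which makes it easy to separate the daemon's own
memory, RssAnon + RssFile, from the guest RAM shared with QEMU that
shows up under RssShmem.)

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char path[64], line[256];

    /* Default to the calling process if no pid is given. */
    snprintf(path, sizeof(path), "/proc/%s/status",
             argc > 1 ? argv[1] : "self");

    FILE *f = fopen(path, "r");
    if (!f) {
        perror(path);
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "RssAnon:", 8) == 0 ||
            strncmp(line, "RssFile:", 8) == 0 ||
            strncmp(line, "RssShmem:", 9) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}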

And as you point out, this memory is being shared with QEMU. So it
looks like, from a cgroup point of view, we should put virtiofsd and
qemu in the same cgroup and give them a combined memory limit, so that
the accounting of this shared memory looks proper.
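
Something along these lines with cgroup v2, for example (an
illustrative sketch only: the cgroup path and the 2G limit are made up,
the PIDs are the ones from your example, and presumably Kata would
drive this through its own cgroup handling):

#include <stdio.h>
#include <sys/stat.h>

/* Write a string to a cgroup control file. */
static int write_str(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int rc = fputs(val, f) >= 0 ? 0 : -1;
    fclose(f);
    return rc;
}

int main(void)
{
    /* One cgroup for the whole sandbox: qemu and virtiofsd share a
     * single memory limit, so the shared mapping is not charged
     * against two separate limits. */
    mkdir("/sys/fs/cgroup/vm-sandbox", 0755);
    write_str("/sys/fs/cgroup/vm-sandbox/memory.max", "2G");
    write_str("/sys/fs/cgroup/vm-sandbox/cgroup.procs", "13947"); /* qemu */
    write_str("/sys/fs/cgroup/vm-sandbox/cgroup.procs", "13879"); /* virtiofsd */
    return 0;
}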

Eric, does this sound reasonable?

Thanks
Vivek

> 
>  - In "smem", we can see a similar amount of RSS, but the PSS is
>    roughly half the size because "smem" is splitting it up between
>    virtiofsd-rs and QEMU:
> 
> [root@localhost ~]# smem -P virtiofsd-rs -P qemu
>   PID User     Command                         Swap      USS      PSS      RSS 
> 13879 root     ./target/debug/virtiofsd-rs        0    13412   337019   662392 
> 13947 root     qemu-system-x86_64 -enable-        0   434224   760096  1094392 
> 
>  - If we terminate the virtiofsd-rs process, the output of "smem" now
>    shows that QEMU's PSS has grown to account for the PSS that was
>    previously assigned to virtiofsd-rs, so we can confirm that it was
>    memory shared between the two processes.
> 
>   PID User     Command                         Swap      USS      PSS      RSS 
> 13947 root     qemu-system-x86_64 -enable-        0  1082656  1084966  1095692 
> 
> Just to be 100% sure, I've also run "heaptrack" on a virtiofsd-rs
> instance, and can confirm that the actual heap usage of the process
> was around 5-6 MiB.
> 
> Sergio.




