[Virtio-fs] Large memory consumption by virtiofsd, suspecting fd's aren't being closed?

Sergio Lopez slp at redhat.com
Tue Mar 23 15:19:40 UTC 2021


On Tue, Mar 23, 2021 at 07:52:22AM -0700, Eric Ernst wrote:
> On Tue, Mar 23, 2021 at 6:47 AM Vivek Goyal <vgoyal at redhat.com> wrote:
> 
> > On Tue, Mar 23, 2021 at 12:55:26PM +0100, Sergio Lopez wrote:
> > > On Mon, Mar 22, 2021 at 12:47:04PM -0400, Vivek Goyal wrote:
> > > > On Mon, Mar 22, 2021 at 05:09:32PM +0100, Miklos Szeredi wrote:
> > > > > On Mon, Mar 22, 2021 at 6:52 AM Eric Ernst <eric_ernst at apple.com> wrote:
> > > > > >
> > > > > > Hey y’all,
> > > > > >
> > > > > > One challenge I’ve been looking at is how to set up an appropriate
> > > > > > memory cgroup limit for workloads that are leveraging virtiofs
> > > > > > (i.e., running pods with Kata Containers). I noticed that the
> > > > > > memory usage of the daemon itself can grow considerably depending
> > > > > > on the workload, much more than I’d expect.
> > > > > >
> > > > > > I’m running a workload that simply builds the kernel sources with
> > > > > > -j3. The Linux kernel sources are shared via virtiofs (no DAX), so
> > > > > > as the build goes on, a lot of files are opened, closed, and
> > > > > > created. The RSS of virtiofsd grows to several hundred MBs.
> > > > > >
> > > > > > When taking a look, I’m suspecting that virtiofsd is carrying out
> > > > > > the opens, but never actually closing the fds. In the guest, I’m
> > > > > > seeing fds on the order of 10-40 across all the container processes
> > > > > > as the build runs, whereas the number of fds held by virtiofsd
> > > > > > keeps increasing, reaching over 80,000. I’m guessing this isn’t
> > > > > > expected?
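> > > > > > For anyone reproducing this, one way to count the host-side fds is:
> > > > > >
> > > > > >     ls /proc/$(pidof virtiofsd)/fd | wc -l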
> > > > >
> > > > > The reason could be that the guest is keeping a ref on the inodes
> > > > > (dcache->dentry->inode) and the current implementation of the server
> > > > > keeps an O_PATH fd open for each inode referenced by the client.
> > > > >
> > > > > One way to avoid this is to use the "cache=none" option, which
> > > > > forces the client to drop dentries from the cache immediately when
> > > > > not in use. This is not desirable if the cache is actually being
> > > > > used.
> > > > >
> > > > > The memory use of the server should still be limited by the memory
> > > > > use of the guest: if there's memory pressure in the guest kernel,
> > > > > then it will clean out caches, which results in the memory use
> > > > > decreasing in the server as well. If the server's memory use looks
> > > > > unbounded, that might be indicative of too much memory being used
> > > > > for the dcache in the guest (cat /proc/slabinfo | grep ^dentry).
> > > > > Can you verify?
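> > > > > (As a rough estimate, dcache memory is num_objs * objsize from that
> > > > > slabinfo line; with hypothetical values:
> > > > >
> > > > >     # name   <active_objs> <num_objs> <objsize> ...
> > > > >     dentry   480000        500000     192
> > > > >
> > > > > that would be ~500,000 * 192 bytes, i.e. ~96 MB held by dentries.)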
> > > >
> > > > Hi Miklos,
> > > >
> > > > Apart from the above, we identified one more issue on IRC. I asked
> > > > Eric to drop caches manually in the guest
> > > > (echo 3 > /proc/sys/vm/drop_caches), and while it reduced the number
> > > > of open fds, it did not seem to free up a significant amount of
> > > > memory.
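> > > > (That is, something like the following in the guest:
> > > >
> > > >     sync
> > > >     echo 3 > /proc/sys/vm/drop_caches
> > > >
> > > > and then re-checking virtiofsd's fd count and RSS on the host.)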
> > > >
> > > > So the question remains: where is that memory? One possibility is
> > > > that we have memory allocated for the mapping arrays (inode and fd).
> > > > These arrays only grow and never shrink, so they can lock down some
> > > > memory.
> > > >
> > > > But still, a lot of lo_inode memory should have been freed when
> > > > echo 3 > /proc/sys/vm/drop_caches was done. Why all of that did not
> > > > show up in virtiofsd's RSS usage is a little confusing.
> > >
> > > Are you including "RssShmem" in "RSS usage"? If so, that could be
> > > misleading. When virtiofsd[-rs] touches pages that reside in the
> > > memory mapping that's shared with QEMU, those pages are accounted
> > > in the virtiofsd[-rs] process's RssShmem too.
> > >
> > > In other words, the RSS value of the virtiofsd[-rs] process may be
> > > overinflated because it includes pages that are actually shared with
> > > the QEMU process (there is no second copy of them).
> > >
> > > This can be observed using a tool like "smem". Here's an example:
> > >
> > >  - This virtiofsd-rs process appears to have an RSS of ~633 MiB:
> > >
> > > root       13879 46.1  7.9 8467492 649132 pts/1  Sl+  11:33   0:52 ./target/debug/virtiofsd-rs
> > > root       13947 69.3 13.4 5638580 1093876 pts/0 Sl+  11:33   1:14 qemu-system-x86_64
> > >
> > >  - In /proc/13879/status we can observe most of that memory is
> > >    actually RssShmem:
> > >
> > > RssAnon:          9624 kB
> > > RssFile:          5136 kB
> > > RssShmem:       634372 kB
> >
> > Hi Sergio,
> >
> > Thanks for this observation about RssShmem. I also ran virtiofsd and
> > observed its memory usage just now, and indeed it looks like only the
> > RssShmem usage is very high.
> >
> > RssAnon:            4884 kB
> > RssFile:            1900 kB
> > RssShmem:        1050244 kB
> >
> > And as you point out, this memory is being shared with QEMU. So it
> > looks like, from a cgroup point of view, we should put virtiofsd and
> > QEMU in the same cgroup and give them a combined memory limit, so that
> > the accounting of this shared memory looks proper.
> >
> > Eric, does this sound reasonable?
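> > For illustration (cgroup v2, with hypothetical paths and limit), this
> > would be something like:
> >
> >     mkdir /sys/fs/cgroup/vm-sandbox
> >     echo $QEMU_PID      > /sys/fs/cgroup/vm-sandbox/cgroup.procs
> >     echo $VIRTIOFSD_PID > /sys/fs/cgroup/vm-sandbox/cgroup.procs
> >     echo 4G             > /sys/fs/cgroup/vm-sandbox/memory.max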
> >
> 
> Sergio, Vivek --
> 
> Today QEMU/virtiofsd do already live within the same memory cgroup, and
> are bound by the same overhead I need to introduce. Good to know about
> the sharing (this restores some sanity to my observations, thank you!),
> but the real crux of the problem is two items:
>  1) the FDs are held long after the application in the guest is done with
> them, because of the dentry cache in the guest (when cache=auto for
> virtiofsd).

Yeah, we're looking into mechanisms that would allow us to avoid
needing to hold an FD for each reference in the dentry cache, but
it'll take a while. :-/

If this becomes a problem, I'm afraid the only short-term solution
would be fixing mmap for "cache=none", and then accepting the
performance hit that may entail.
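For reference, switching the cache mode is just a launch option on the C
virtiofsd (the source path here is only an example):

    ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/shared/dir -o cache=none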

>  2) virtiofsd/QEMU is holding on to the memory after the fds are released

The shared memory mapping between virtiofsd[-rs] and QEMU is actually
the same one that's backing the guest's RAM. It's expected that the
guest OS will eventually touch most of the pages available to it,
making QEMU's RSS grow to a number close to the amount of RAM
configured for the VM. This should happen with or without virtio-fs.

Are you using virtio-balloon's free page reporting or some other
feature to return free pages to the Host?
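If not, that may be worth trying; with a recent enough QEMU (5.1+),
free page reporting can be enabled with something like:

    -device virtio-balloon,free-page-reporting=on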

Sergio.

> --Eric
> 
> 
> 
> >
> > Thanks
> > Vivek
> >
> > >
> > >  - In "smem", we can see a similar amount of RSS, but the PSS is
> > >    roughly half the size because "smem" is splitting it up between
> > >    virtiofsd-rs and QEMU:
> > >
> > > [root at localhost ~]# smem -P virtiofsd-rs -P qemu
> > >   PID User     Command                         Swap      USS      PSS      RSS
> > > 13879 root     ./target/debug/virtiofsd-rs        0    13412   337019   662392
> > > 13947 root     qemu-system-x86_64 -enable-        0   434224   760096  1094392
> > >
> > >  - If we terminate the virtiofsd-rs process, the output of "smem" now
> > >    shows that QEMU's PSS has grown to account for the PSS that was
> > >    previously assigned to virtiofsd-rs too, so we can confirm that was
> > >    memory shared between both processes.
> > >
> > >   PID User     Command                         Swap      USS      PSS      RSS
> > > 13947 root     qemu-system-x86_64 -enable-        0  1082656  1084966  1095692
> > >
> > > Just to be 100% sure, I've also run "heaptrack" on a virtiofsd-rs
> > > instance, and can confirm that the actual heap usage of the process
> > > was around 5-6 MiB.
> > >
> > > Sergio.