[Virtio-fs] Large memory consumption by virtiofsd, suspecting fd's aren't being closed?

Eric Ernst eric.g.ernst at gmail.com
Tue Mar 23 14:52:22 UTC 2021


On Tue, Mar 23, 2021 at 6:47 AM Vivek Goyal <vgoyal at redhat.com> wrote:

> On Tue, Mar 23, 2021 at 12:55:26PM +0100, Sergio Lopez wrote:
> > On Mon, Mar 22, 2021 at 12:47:04PM -0400, Vivek Goyal wrote:
> > > On Mon, Mar 22, 2021 at 05:09:32PM +0100, Miklos Szeredi wrote:
> > > > On Mon, Mar 22, 2021 at 6:52 AM Eric Ernst <eric_ernst at apple.com> wrote:
> > > > >
> > > > > Hey ya’ll,
> > > > >
> > > > > One challenge I’ve been looking at is how to set up an appropriate
> > > > > memory cgroup limit for workloads that are leveraging virtiofs (i.e.,
> > > > > running pods with Kata Containers). I noticed that the memory usage of
> > > > > the daemon itself can grow considerably depending on the workload, and
> > > > > by much more than I’d expect.
> > > > >
> > > > > I’m running a workload that simply runs a build of the kernel sources
> > > > > with -j3. The Linux kernel source is shared via virtiofs (no DAX), so
> > > > > as the build goes on, a lot of files are opened, closed, and created.
> > > > > The RSS of virtiofsd grows to several hundred MBs.
> > > > >
> > > > > Taking a look, I suspect that virtiofsd is carrying out the opens but
> > > > > never actually closing the fds. In the guest, I see fds on the order of
> > > > > 10-40 across all the container processes as the build runs, whereas the
> > > > > number of fds held by virtiofsd keeps increasing, reaching over 80,000.
> > > > > I’m guessing this isn’t expected?
> > > >
> > > > The reason could be that the guest is keeping a ref on the inodes
> > > > (dcache->dentry->inode) and the current implementation of the server
> > > > keeps an O_PATH fd open for each inode referenced by the client.
> > > >
> > > > One way to avoid this is to use the "cache=none" option, which forces
> > > > the client to drop dentries immediately from the cache if not in use.
> > > > This is not desirable if the cache is actually in use.
> > > >
> > > > The memory use of the server should still be limited by the memory use
> > > > of the guest: if there's memory pressure in the guest kernel, then it
> > > > will clean out caches, which results in the memory use decreasing in
> > > > the server as well. If the server memory use looks unbounded, that
> > > > might be indicative of too much memory used for dcache in the guest
> > > > (cat /proc/slabinfo | grep ^dentry). Can you verify?
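
(For reference, the per-inode O_PATH fd Miklos describes looks roughly like
the sketch below. This is a simplified, untested illustration of the pattern,
not the actual passthrough_ll.c code; the lo_inode_get helper name is made up
here.)

    #define _GNU_SOURCE         /* for O_PATH */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/stat.h>

    struct lo_inode {
        int fd;             /* O_PATH fd pinning the host inode */
        ino_t ino;
        dev_t dev;
        uint64_t refcount;  /* driven by guest LOOKUP/FORGET, i.e. its dcache */
    };

    /* Hypothetical helper: called on every FUSE LOOKUP.  The fd is only
     * closed once the guest sends FORGET for the last reference, which
     * for cached dentries may be much later (e.g. after drop_caches). */
    static struct lo_inode *lo_inode_get(int parent_fd, const char *name)
    {
        struct lo_inode *inode = calloc(1, sizeof(*inode));
        if (!inode)
            return NULL;
        inode->fd = openat(parent_fd, name, O_PATH | O_NOFOLLOW);
        if (inode->fd < 0) {
            free(inode);
            return NULL;
        }
        inode->refcount = 1;
        return inode;
    }
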
> > >
> > > Hi Miklos,
> > >
> > > Apart from the above, we identified one more issue on IRC. I asked Eric
> > > to drop caches manually in the guest (echo 3 > /proc/sys/vm/drop_caches),
> > > and while it reduced the number of open fds, it did not seem to free up a
> > > significant amount of memory.
> > >
> > > So the question remains: where is that memory? One possibility is that we
> > > have memory allocated for the mapping arrays (inode and fd). These arrays
> > > only grow and never shrink, so they can lock down some memory.
> > >
> > > But still, a lot of lo_inode memory should have been freed when
> > > echo 3 > /proc/sys/vm/drop_caches was done. Why all of that did not
> > > show up in virtiofsd's RSS usage is a little confusing.
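
(The grow-only mapping arrays Vivek mentions behave roughly like the sketch
below: freed slots are recycled, but the backing allocation is never returned
to the allocator. This is a simplified, untested illustration, not the actual
lo_map code from passthrough_ll.c.)

    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>

    /* Hypothetical grow-only slot table: the array doubles on demand and
     * never shrinks, so peak usage stays allocated even after the guest
     * drops its references and the entries are freed. */
    struct slot_map {
        void   **slots;
        size_t   nslots;
    };

    static ssize_t slot_map_add(struct slot_map *m, void *elem)
    {
        for (size_t i = 0; i < m->nslots; i++) {
            if (!m->slots[i]) {          /* reuse a previously freed slot */
                m->slots[i] = elem;
                return i;
            }
        }
        size_t newsize = m->nslots ? m->nslots * 2 : 256;
        void **tmp = realloc(m->slots, newsize * sizeof(*tmp));
        if (!tmp)
            return -1;
        memset(tmp + m->nslots, 0, (newsize - m->nslots) * sizeof(*tmp));
        size_t idx = m->nslots;
        tmp[idx] = elem;
        m->slots = tmp;
        m->nslots = newsize;             /* note: there is no shrink path */
        return idx;
    }
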
> >
> > Are you including "RssShmem" in "RSS usage"? If so, that could be
> > misleading. When virtiofsd[-rs] touches pages that reside in the
> > memory mapping that's shared with QEMU, those pages are accounted
> > in the virtiofsd[-rs] process's RssShmem too.
> >
> > In other words, the RSS value of the virtiofsd[-rs] process may be
> > overinflated because it includes pages that are actually shared
> > with the QEMU process (there is no second copy of them).
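
(For context, the shared pages Sergio describes come from the guest memory
regions QEMU hands to the daemon over vhost-user, which the daemon maps
MAP_SHARED; any page it then touches is charged to its RssShmem even though
QEMU owns the same pages. Below is a simplified, untested sketch of that
mapping, not the actual virtiofsd code.)

    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/types.h>

    /* Hypothetical helper: map one guest memory region received from QEMU.
     * In the real daemon the fd, size and offset come with the
     * VHOST_USER_SET_MEM_TABLE message; here they are just parameters. */
    static void *map_guest_region(int memfd, size_t size, off_t offset)
    {
        void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, memfd, offset);
        return addr == MAP_FAILED ? NULL : addr;
    }
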
> >
> > This can be observed using a tool like "smem". Here's an example:
> >
> >  - This virtiofsd-rs process appears to have an RSS of ~633 MiB:
> >
> > root       13879 46.1  7.9 8467492 649132 pts/1  Sl+  11:33   0:52 ./target/debug/virtiofsd-rs
> > root       13947 69.3 13.4 5638580 1093876 pts/0 Sl+  11:33   1:14 qemu-system-x86_64
> >
> >  - In /proc/13879/status we can observe most of that memory is
> >    actually RssShmem:
> >
> > RssAnon:          9624 kB
> > RssFile:          5136 kB
> > RssShmem:       634372 kB
>
> Hi Sergio,
>
> Thanks for this observation about RssShmem. I also ran virtiofsd and
> observed its memory usage just now, and it indeed looks like only the
> RssShmem usage is very high.
>
> RssAnon:            4884 kB
> RssFile:            1900 kB
> RssShmem:        1050244 kB
>
> And as you point out, this memory is being shared with QEMU. So it
> looks like, from a cgroup point of view, we should put virtiofsd and
> qemu in the same cgroup and give them a combined memory limit, so that
> the accounting for this shared memory looks proper.
>
> Eric, does this sound reasonable?
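
(When sizing such a combined limit it helps to look at the
RssAnon/RssFile/RssShmem split rather than plain RSS. Below is a hypothetical
helper that reads those fields from /proc/<pid>/status; the field names are
the real kernel ones, everything else is illustrative and untested.)

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>

    /* Read a single "<field>:   <value> kB" line from /proc/<pid>/status,
     * e.g. field = "RssShmem".  Returns 0 on success, -1 otherwise. */
    static int read_status_kb(pid_t pid, const char *field, long *out_kb)
    {
        char path[64], line[256];
        int ret = -1;

        snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
        FILE *f = fopen(path, "r");
        if (!f)
            return -1;
        while (fgets(line, sizeof(line), f)) {
            size_t len = strlen(field);
            if (strncmp(line, field, len) == 0 && line[len] == ':') {
                if (sscanf(line + len + 1, "%ld", out_kb) == 1)
                    ret = 0;
                break;
            }
        }
        fclose(f);
        return ret;
    }
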
>

Sergio, Vivek --

Today QEMU/virtiofsd already live within the same memory cgroup, and are
bound by that same overhead I need to introduce. Good to know regarding the
sharing (this restores some sanity to my observations, thank you!), but the
real crux of the problem comes down to two items:
 1) the fds are held long after the application in the guest is done with
them, because of the dentry cache in the guest (when virtiofsd runs with
cache=auto);
 2) virtiofsd/QEMU is holding on to the memory even after the fds are released.

--Eric



>
> Thanks
> Vivek
>
> >
> >  - In "smem", we can see a similar amount of RSS, but the PSS is
> >    roughly half the size because "smem" is splitting it up between
> >    virtiofsd-rs and QEMU:
> >
> > [root at localhost ~]# smem -P virtiofsd-rs -P qemu
> >   PID User     Command                         Swap      USS      PSS      RSS
> > 13879 root     ./target/debug/virtiofsd-rs        0    13412   337019   662392
> > 13947 root     qemu-system-x86_64 -enable-        0   434224   760096  1094392
> >
> >  - If we terminate the virtiofsd-rs process, the output of "smem" now
> >    shows that QEMU's PSS has grown to account for the PSS that was
> >    previously assigned to virtiofsd-rs too, so we can confirm that was
> >    memory shared between both processes.
> >
> >   PID User     Command                         Swap      USS      PSS      RSS
> > 13947 root     qemu-system-x86_64 -enable-        0  1082656  1084966  1095692
> >
> > Just to be 100% sure, I've also run "heaptrack" on a virtiofsd-rs
> > instance, and can confirm that the actual heap usage of the process
> > was around 5-6 MiB.
> >
> > Sergio.