virsh vol-download uses a lot of memory

Peter Crowther peter.crowther at melandra.com
Wed Jan 22 12:44:50 UTC 2020


Architecturally, separating the data and control channels feels like the
right approach - whether nbd or something else. Would need signposting for
those of us who routinely implement firewalling on hosts, but that's a
detail.

I presume there's no flow control on streams at the moment?
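
To make that question concrete, here is a minimal Python sketch of what
receive-side flow control could look like. It is not libvirt code:
BoundedStreamQueue, simulate() and the 16 KiB limit are made-up names and
numbers, chosen only for illustration. The idea is that the event loop
stops reading from the socket while the queue holds at least a fixed number
of bytes, so a slow consumer bounds memory use instead of growing a list
without limit.

```python
from collections import deque


class BoundedStreamQueue:
    """Hypothetical receive queue with a byte limit (not a libvirt type)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.packets = deque()
        self.buffered = 0

    def can_accept(self):
        # The event loop should read from the socket only while this is true.
        return self.buffered < self.max_bytes

    def push(self, packet):
        self.packets.append(packet)
        self.buffered += len(packet)

    def pop(self):
        packet = self.packets.popleft()
        self.buffered -= len(packet)
        return packet


def simulate(total_packets, packet_size, max_bytes, reads_per_tick, pops_per_tick):
    """Fast sender, slow consumer; returns (peak buffered bytes, bytes delivered)."""
    queue = BoundedStreamQueue(max_bytes)
    remaining = total_packets
    peak = 0
    delivered = 0
    while remaining or queue.packets:
        # Network side: read only while the queue is below its limit.
        for _ in range(reads_per_tick):
            if not remaining or not queue.can_accept():
                break
            queue.push(b"x" * packet_size)
            remaining -= 1
        peak = max(peak, queue.buffered)
        # Application side: drain a few packets per tick.
        for _ in range(pops_per_tick):
            if not queue.packets:
                break
            delivered += len(queue.pop())
    return peak, delivered


peak, delivered = simulate(total_packets=1000, packet_size=1024,
                           max_bytes=16 * 1024, reads_per_tick=8,
                           pops_per_tick=1)
print(peak, delivered)
```

The cost is the one raised in the quoted thread: while the queue is full,
nothing is read from the connection, so replies to other RPC calls on it
have to wait as well.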

Cheers,

Peter

On Wed, 22 Jan 2020, 12:18 Daniel P. Berrangé, <berrange at redhat.com> wrote:

> On Wed, Jan 22, 2020 at 01:01:42PM +0100, Michal Privoznik wrote:
> > On 1/22/20 11:11 AM, Michal Privoznik wrote:
> > > On 1/22/20 10:03 AM, R. Diez wrote:
> > > > Hi all:
> > > >
> > > > I am using the libvirt version that comes with Ubuntu 18.04.3 LTS.
> > >
> > > I'm sorry, I don't have Ubuntu installed anywhere to look the version
> > > up. Can you run 'virsh version' to find it out for me please?
> >
> > Never mind, I've managed to reproduce with the latest libvirt anyway.
> >
> > >
> > > >
> > > > I have written a script that backs up my virtual machines every
> > > > night. I want to limit the amount of memory that this backup
> > > > operation consumes, mainly to prevent page cache thrashing. I have
> > > > described the Linux page cache thrashing issue in detail here:
> > > >
> > > >
> > > > http://rdiez.shoutwiki.com/wiki/Today%27s_Operating_Systems_are_still_incredibly_brittle#The_Linux_Filesystem_Cache_is_Braindead
> > > >
> > > >
> > > > The VM virtual disk weighs 140 GB at the moment. I thought 500 MiB
> > > > of RAM should be more than enough to back it up, so I added the
> > > > following options to the systemd service file associated with the
> > > > systemd timer I am using:
> > > >
> > > >    MemoryLimit=500M
> > > >
> > > > However, the OOM is killing "virsh vol-download":
> > > >
> > > > Jan 21 23:40:00 GS-CEL-L kernel: [55535.913525] [  pid  ]   uid tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> > > > Jan 21 23:40:00 GS-CEL-L kernel: [55535.913527] [  13232]  1000 13232     5030      786    77824      103             0 BackupWindows10
> > > > Jan 21 23:40:00 GS-CEL-L kernel: [55535.913528] [  13267]  1000 13267     5063      567    73728      132             0 BackupWindows10
> > > > Jan 21 23:40:00 GS-CEL-L kernel: [55535.913529] [  13421]  1000 13421     5063      458    73728      132             0 BackupWindows10
> > > > Jan 21 23:40:00 GS-CEL-L kernel: [55535.913530] [  13428]  1000 13428 712847   124686  5586944   523997             0 virsh
> > > > Jan 21 23:40:00 GS-CEL-L kernel: [55535.913532] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/system.slice/VmBackup.service,task_memcg=/system.slice/VmBackup.service,task=virsh,pid=13428,uid=1000
> > > >
> > > > Jan 21 23:40:00 GS-CEL-L kernel: [55535.913538] Memory cgroup out of memory: Killed process 13428 (virsh) total-vm:2851388kB, anon-rss:486180kB, file-rss:12564kB, shmem-rss:0kB
> > > >
> > > > I wonder why "virsh vol-download" needs so much RAM. It does not get
> > > > killed straight away, it takes a few minutes to get killed. It
> > > > starts using a VMSIZE of around 295 MiB, which is not really frugal
> > > > for a file download operation, but then it grows and grows.
> > >
> > > This is very likely a memory leak somewhere.
> >
> > Actually, it is not. It's caused by the design of our client event loop.
> > Whenever there is incoming data, the client reads as much as possible
> > and appends it to a linked list of incoming stream data (a stream is
> > the mechanism libvirt uses to transfer binary data). The problem is that
> > instead of our malloc() calls returning NULL once the limit is reached,
> > the kernel decides to kill us.
> >
> > For anybody with libvirt insight: virNetClientIOHandleInput() ->
> > virNetClientCallDispatch() -> virNetClientCallDispatchStream() ->
> > virNetClientStreamQueuePacket().
> >
> >
> > The obvious fix would be to stop processing incoming packets if the
> > stream has "too much" data cached (define "too much"). But this may lead
> > to an unresponsive client event loop - if the client doesn't pull data
> > from the incoming stream fast enough, they won't be able to make any
> > other RPC calls.
>
> IMHO if they're not pulling stream data and still expecting to make
> other RPC calls in a timely manner, then their code is broken.
>
> Having said that, in retrospect I rather regret ever implementing our
> stream APIs as we did. We really should have just exposed an API which
> lets you spawn an NBD server associated with a storage volume, or
> tunnelled NBD over libvirtd. The former is probably our best strategy
> these days, now that NBD has native TLS support.
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>
>