virsh vol-download uses a lot of memory

Michal Privoznik mprivozn at redhat.com
Wed Jan 22 12:01:42 UTC 2020


On 1/22/20 11:11 AM, Michal Privoznik wrote:
> On 1/22/20 10:03 AM, R. Diez wrote:
>> Hi all:
>>
>> I am using the libvirt version that comes with Ubuntu 18.04.3 LTS.
> 
> I'm sorry, I don't have Ubuntu installed anywhere to look the version 
> up. Can you run 'virsh version' to find it out for me please?

Never mind, I've managed to reproduce it with the latest libvirt anyway.

> 
>>
>> I have written a script that backs up my virtual machines every night. 
>> I want to limit the amount of memory that this backup operation 
>> consumes, mainly to prevent page cache thrashing. I have described the 
>> Linux page cache thrashing issue in detail here:
>>
>> http://rdiez.shoutwiki.com/wiki/Today%27s_Operating_Systems_are_still_incredibly_brittle#The_Linux_Filesystem_Cache_is_Braindead 
>>
>>
>> The VM virtual disk weighs 140 GB at the moment. I thought 500 MiB of 
>> RAM should be more than enough to back it up, so I added the following 
>> options to the systemd service file associated to the systemd timer I 
>> am using:
>>
>>    MemoryLimit=500M
>>
>> However, the OOM is killing "virsh vol-download":
>>
>> Jan 21 23:40:00 GS-CEL-L kernel: [55535.913525] [  pid  ]   uid  tgid 
>> total_vm      rss pgtables_bytes swapents oom_score_adj name
>> Jan 21 23:40:00 GS-CEL-L kernel: [55535.913527] [  13232]  1000 
>> 13232     5030      786    77824      103             0 BackupWindows10
>> Jan 21 23:40:00 GS-CEL-L kernel: [55535.913528] [  13267]  1000 
>> 13267     5063      567    73728      132             0 BackupWindows10
>> Jan 21 23:40:00 GS-CEL-L kernel: [55535.913529] [  13421]  1000 
>> 13421     5063      458    73728      132             0 BackupWindows10
>> Jan 21 23:40:00 GS-CEL-L kernel: [55535.913530] [  13428]  1000 13428 
>> 712847   124686  5586944   523997             0 virsh
>> Jan 21 23:40:00 GS-CEL-L kernel: [55535.913532] 
>> oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/system.slice/VmBackup.service,task_memcg=/system.slice/VmBackup.service,task=virsh,pid=13428,uid=1000 
>>
>> Jan 21 23:40:00 GS-CEL-L kernel: [55535.913538] Memory cgroup out of 
>> memory: Killed process 13428 (virsh) total-vm:2851388kB, 
>> anon-rss:486180kB, file-rss:12564kB, shmem-rss:0kB
>>
>> I wonder why "virsh vol-download" needs so much RAM. It does not get 
>> killed straight away, it takes a few minutes to get killed. It starts 
>> using a VMSIZE of around 295 MiB, which is not really frugal for a 
>> file download operation, but then it grows and grows.
> 
> This is very likely a memory leak somewhere. 

Actually, it is not. It's caused by the design of our client event loop.
Whenever there is incoming data, the loop reads as much as it can and
appends it to the end of a linked list of incoming stream data (a stream
is the mechanism libvirt uses to transfer binary data). Nothing stops
that list from growing. The problem is that once the memory limit is
reached, the kernel does not make our malloc()-s return NULL; it decides
to kill us instead.
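
To make the growth concrete, here is a toy Python model of that
behaviour. It is only a sketch and not libvirt code: the function name,
the packet sizes and the "consume one packet every N arrivals" rule are
all assumptions made up for illustration.

```python
from collections import deque


def simulate_unbounded(packets_arriving, packet_size, consume_every):
    """Toy model (hypothetical, not libvirt code).

    The event loop queues every incoming packet unconditionally, while
    the application drains only one packet per `consume_every` arrivals.
    Returns the peak number of bytes held in the queue.
    """
    queue = deque()
    queued_bytes = 0
    peak = 0
    for i in range(packets_arriving):
        # Event loop: always read and append, there is no upper limit.
        queue.append(bytes(packet_size))
        queued_bytes += packet_size
        # Application: slower than the network, drains only occasionally.
        if (i + 1) % consume_every == 0:
            queued_bytes -= len(queue.popleft())
        peak = max(peak, queued_bytes)
    return peak


if __name__ == "__main__":
    # The backlog grows in proportion to the amount of data transferred.
    print(simulate_unbounded(1000, 10, 4))
    print(simulate_unbounded(2000, 10, 4))
```

In this model, doubling the transfer size doubles the peak backlog,
which matches the "grows and grows" observation for a large volume.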

For anybody with libvirt insight: virNetClientIOHandleInput() -> 
virNetClientCallDispatch() -> virNetClientCallDispatchStream() -> 
virNetClientStreamQueuePacket().


The obvious fix would be to stop processing incoming packets once the
stream has "too much" data cached (for some definition of "too much").
But this may lead to an unresponsive client event loop: if the client
doesn't pull data from the incoming stream fast enough, it won't be able
to make any other RPC calls.
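
The trade-off can be sketched with another toy Python model. Again this
is not libvirt code: `run_loop`, the `high_water` cap (counted in
packets) and the tick-based scheduling are hypothetical, chosen only to
show how a cap bounds memory while delaying an RPC reply that sits
behind stream data on the same connection.

```python
from collections import deque


def run_loop(wire, drain_at, high_water, max_ticks=100):
    """Toy model (hypothetical, not libvirt code).

    wire:       messages in arrival order, 'S' = stream packet,
                'R' = reply to some other RPC on the same connection.
    drain_at:   ticks at which the application consumes one stream packet.
    high_water: stop reading from the socket once this many stream
                packets are queued.
    Returns (peak queued packets, ticks at which replies were dispatched).
    """
    pending = deque(wire)
    queued = 0
    peak = 0
    reply_ticks = []
    for tick in range(max_ticks):
        if not pending:
            break
        if tick in drain_at and queued > 0:
            queued -= 1  # application pulled one packet from the stream
        if queued < high_water:
            msg = pending.popleft()  # read the next message off the wire
            if msg == "S":
                queued += 1
                peak = max(peak, queued)
            else:
                reply_ticks.append(tick)
        # else: reading is paused, so everything behind the stream data waits
    return peak, reply_ticks


if __name__ == "__main__":
    wire = ["S"] * 6 + ["R"]
    print(run_loop(wire, drain_at={10, 11, 12}, high_water=100))  # no effective cap
    print(run_loop(wire, drain_at={10, 11, 12}, high_water=4))    # capped
    print(run_loop(wire, drain_at={10}, high_water=4, max_ticks=20))
```

With the cap, the queue never exceeds four packets, but the reply is
only dispatched after the application has drained the stream; if the
application stops draining, the reply is never seen within the run.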

Anybody got any ideas?

Michal



