[PATCH v2 0/4] qemu: Add support for free-page-reporting

David Hildenbrand david at redhat.com
Wed Oct 14 14:07:35 UTC 2020


On 14.10.20 15:46, Michal Privoznik wrote:
> On 10/14/20 2:06 PM, David Hildenbrand wrote:
>> On 14.10.20 13:53, Michal Privoznik wrote:
>>> On 10/14/20 10:26 AM, David Hildenbrand wrote:
>>>> On 14.10.20 08:30, Michal Privoznik wrote:
>>>
>> <snip/>
>>
>> No, not at all. Thanks for reporting!
>>
>> And the "bad" thing is that QEMU doesn't do anything too fancy. All it
>> does is "fallocate(FALLOC_FL_PUNCH_HOLE)" on hugetlbfs when trying to
>> zap reported pages. The same mechanism is also used for postcopy live
>> migration and virtio-mem with hugetlbfs.
>>
>> Which kernel are you running?
>>
>> 1. If it is an upstream kernel, the lkml and -mm lists are the right place
>> (please cc me, or I can try to reproduce and report it).
>>
>> 2. If it is a distro kernel, file a bug with the distro.
>>
>> I was just recently testing virtio-mem with hugetlbfs and it worked on
>> decent upstream Fedora. But maybe I was not able to trigger it.
> 
> Okay, I've upgraded to 5.9.0-gentoo, but the problem persists. Gentoo
> puts only a few patches on top of the vanilla kernel, none of which
> touch that area of the code:
> 
> https://dev.gentoo.org/~mpagano/genpatches/trunk/5.9/
> 
> So I think this is reproducible on vanilla too.
> 
> BTW: Have you tried placing QEMU inside v1 cgroups? Libvirt does
> that, so maybe that's the problem. Anyway, here's the command line:
> 
> /home/zippy/work/qemu/qemu.git/build/qemu-system-x86_64 \
> -name guest=fedora,debug-threads=on \
> -S \
> -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-fedora/master-key.aes \
> -machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
> -cpu host,migratable=on \
> -m 4096 \
> -object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,prealloc=yes,size=4294967296,host-nodes=0,policy=bind \
> -overcommit mem-lock=off \
> -smp 4,sockets=1,dies=1,cores=2,threads=2 \
> -object iothread,id=iothread1 \
> -object iothread,id=iothread2 \
> -object iothread,id=iothread3 \
> -object iothread,id=iothread4 \
> -uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \
> -no-user-config \
> -nodefaults \
> -device sga \
> -chardev socket,id=charmonitor,fd=33,server,nowait \
> -mon chardev=charmonitor,id=monitor,mode=control \
> -rtc base=utc \
> -no-shutdown \
> -global PIIX4_PM.disable_s3=0 \
> -global PIIX4_PM.disable_s4=0 \
> -boot menu=on,strict=on \
> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
> -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/fedora.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
> -blockdev '{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1 \
> -netdev tap,fd=35,id=hostnet0 \
> -device virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:a4:6f:91,bus=pci.0,addr=0x3 \
> -chardev pty,id=charserial0 \
> -device isa-serial,chardev=charserial0,id=serial0 \
> -chardev socket,id=charchannel0,fd=36,server,nowait \
> -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
> -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on \
> -device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pci.0,addr=0x2 \
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7,free-page-reporting=on \
> -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
> -msg timestamp=on
> 

Thanks!

Reproduced easily on Fedora 32 (5.7.16-200.fc32.x86_64).

[   70.641802] CPU: 3 PID: 2178 Comm: qemu-system-x86 Not tainted 5.7.16-200.fc32.x86_64 #1
[   70.641802] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
[   70.641803] RIP: 0010:page_counter_uncharge+0x4b/0x50
[   70.641804] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe ff ff 48 85 db 78 10 48 8b 6d 20 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb ec 90 0f 1f 44 00 00 48 8b 17 48 39 d6 72 41 41 54 49 89
[   70.641804] RSP: 0018:ffffb4044139bb18 EFLAGS: 00010286
[   70.641805] RAX: fffffffffff94600 RBX: fffffffffff94600 RCX: ffff8da63d007000
[   70.641805] RDX: 000000000000046e RSI: fffffffffff94600 RDI: ffff8da678412e28
[   70.641806] RBP: ffff8da678412e28 R08: ffff8da678412e28 R09: ffff8da63d0078c0
[   70.641806] R10: ffff8da634173000 R11: 0000000000000007 R12: 000000000008dc00
[   70.641806] R13: fffffffffff72400 R14: ffff8da63d007000 R15: 0000000000000391
[   70.641807] FS:  00007fe7ab5fe700(0000) GS:ffff8da67eac0000(0000) knlGS:0000000000000000
[   70.641808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.641808] CR2: 000055568796a468 CR3: 0000000fe9860000 CR4: 0000000000340ee0
[   70.641808] Call Trace:
[   70.641813]  hugetlb_cgroup_uncharge_file_region+0x4b/0x80
[   70.641815]  region_del+0x1d3/0x300
[   70.641816]  hugetlb_unreserve_pages+0x39/0xb0
[   70.641818]  remove_inode_hugepages+0x1a8/0x3d0
[   70.641831]  ? kvm_mmu_notifier_invalidate_range+0x38/0x60 [kvm]
[   70.641832]  ? tlb_finish_mmu+0x7a/0x1d0
[   70.641833]  hugetlbfs_fallocate+0x3ac/0x5e0
[   70.641835]  ? avc_has_perm+0x3b/0x160
[   70.641836]  ? file_has_perm+0xa2/0xb0
[   70.641837]  ? selinux_inode_follow_link+0x4c/0xb0
[   70.641838]  ? selinux_file_permission+0x4e/0x120
[   70.641839]  ? security_file_permission+0x2e/0x160
[   70.641840]  vfs_fallocate+0x146/0x280
[   70.641841]  __x64_sys_fallocate+0x3e/0x70
[   70.641843]  do_syscall_64+0x5b/0xf0
[   70.641846]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
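
The trace enters the kernel through fallocate() and ends up in
hugetlbfs_fallocate(), i.e. the hole-punch path mentioned earlier in the
thread. For anyone who wants to poke at that path without a full guest,
here is a minimal userspace sketch. It is not QEMU's code: the helper
name punch_and_check() is made up for illustration, it assumes Linux
with glibc 2.27 or newer (for memfd_create()), and it is not confirmed
to trigger the warning by itself, since the cgroup setup and the
reservation state may matter.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* fallocate(), FALLOC_FL_* */
#include <string.h>     /* memset() */
#include <sys/mman.h>   /* memfd_create(), mmap(), MFD_HUGETLB */
#include <sys/types.h>
#include <unistd.h>     /* ftruncate(), close(), sysconf() */

#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U   /* value from the kernel uapi headers */
#endif

/*
 * Hypothetical helper: create a two-page memfd, touch both pages through
 * a shared mapping, punch a hole over the first page and verify that it
 * reads back as zero while the second page keeps its contents.
 *
 * memfd_flags: 0 for a regular (shmem) memfd, MFD_HUGETLB for hugetlbfs.
 * page_size:   backing page size in bytes (e.g. 2097152 for 2 MiB pages).
 *
 * Returns 0 on success, -1 on any failure. With MFD_HUGETLB the memset()
 * can raise SIGBUS if the host has no free huge pages.
 */
int punch_and_check(unsigned int memfd_flags, size_t page_size)
{
    size_t size = 2 * page_size;
    unsigned char *mem;
    int ret = -1;
    int fd;

    fd = memfd_create("guest-ram", memfd_flags);
    if (fd < 0)
        return -1;

    if (ftruncate(fd, (off_t)size) < 0)
        goto out_close;

    mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED)
        goto out_close;

    /* Fault in both pages so there is something to zap. */
    memset(mem, 0xaa, size);

    /* Discard the first page but keep the file size unchanged. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, (off_t)page_size) < 0)
        goto out_unmap;

    /* The punched page is re-faulted as zeroes; the other is untouched. */
    if (mem[0] == 0 && mem[page_size] == 0xaa)
        ret = 0;

out_unmap:
    munmap(mem, size);
out_close:
    close(fd);
    return ret;
}
```

Calling punch_and_check(0, sysconf(_SC_PAGESIZE)) exercises the plain
shmem variant. punch_and_check(MFD_HUGETLB, 2097152) goes through
hugetlbfs and needs 2 MiB huge pages to be available on the host.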


Note: prealloc=yes is a bad choice in this environment. It
contradicts memory overcommit, which is what we want to optimize with
free page reporting: you allocate all VM memory up front only to throw
it away once the guest is up. Is there an option to turn this off with
hugetlbfs? I hope so.

I'll try to reproduce this on an upstream kernel and send a bug report there, cc'ing you. Thanks!

> 
> Michal
> 


-- 
Thanks,

David / dhildenb



