[vfio-users] qemu stuck when hot-adding memory to a virtual machine with a passthrough device
Wuzongyong (Euler Dept)
cordius.wu at huawei.com
Sat Apr 21 09:02:14 UTC 2018
> > > > Hi,
> > > >
> > > > The qemu process gets stuck when hot-adding a large amount of memory
> > > > to a virtual machine with a passthrough device.
> > > > We found that pinning and mapping the pages in vfio_dma_do_map is too slow.
> > > > Is there any way to improve this?
> > >
> > > At what size do you start to see problems? The time to map a
> > > section of memory should be directly proportional to the size. As
> > > the size is increased, it will take longer, but I don't know why
> > > you'd reach a point of not making forward progress. Is it actually
> > > stuck or is it just taking longer than you want? Using hugepages
> > > can certainly help; we still need to pin each PAGE_SIZE page within
> > > the hugepage, but we'll have larger contiguous regions and therefore
> > > call iommu_map() less frequently. Please share more data. Thanks,
> > >
> > > Alex
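For the hugepage suggestion, a minimal monitor sketch would look like the
following (assuming hugetlbfs is mounted at /dev/hugepages and the guest was
started with spare slots/maxmem for hotplug; the ids mem1/dimm1 are just
placeholders):

  (qemu) object_add memory-backend-file,id=mem1,size=16G,mem-path=/dev/hugepages
  (qemu) device_add pc-dimm,id=dimm1,memdev=mem1

Each PAGE_SIZE page inside the hugepages would still be pinned, but the larger
contiguous runs should mean fewer iommu_map() calls, as you describe.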
> > It just takes a longer time; it is not actually stuck.
> > We found that the problem shows up when we hot-add 16GB of memory, and it
> > takes tens of minutes when we hot-add 1TB of memory.
>
> Is the stall adding 1TB roughly 64 times the stall adding 16GB, or is
> there some inflection in the size vs. time curve? There is a cost to
> pinning and mapping through the IOMMU; perhaps we can improve that, but I
> don't see how we can eliminate it, or how it wouldn't be at least linear
> in the size of memory added, without moving to a page request
> model, which hardly any hardware currently supports. A workaround might
> be to add memory incrementally in smaller chunks, which generates a less
> noticeable stall. Thanks,
>
> Alex
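To make that workaround concrete, the same amount of memory could be plugged
as several smaller backend/DIMM pairs, one at a time (again only a sketch; the
sizes and ids are placeholders, and each device_add still stalls for its own
chunk):

  (qemu) object_add memory-backend-ram,id=mem1,size=4G
  (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
  (qemu) object_add memory-backend-ram,id=mem2,size=4G
  (qemu) device_add pc-dimm,id=dimm2,memdev=mem2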
I collected part of a perf report, shown below, recorded while I hot-added 24GB of memory:
+ 63.41% 0.00% qemu-kvm qemu-kvm-2.8.1-25.127 [.] 0xffffffffffc7534a
+ 63.41% 0.00% qemu-kvm [kernel.vmlinux] [k] do_vfs_ioctl
+ 63.41% 0.00% qemu-kvm [kernel.vmlinux] [k] sys_ioctl
+ 63.41% 0.00% qemu-kvm libc-2.17.so [.] __GI___ioctl
+ 63.41% 0.00% qemu-kvm qemu-kvm-2.8.1-25.127 [.] 0xffffffffffc71c59
+ 63.10% 0.00% qemu-kvm [vfio] [k] vfio_fops_unl_ioctl
+ 63.10% 0.00% qemu-kvm qemu-kvm-2.8.1-25.127 [.] 0xffffffffffcbbb6a
+ 63.10% 0.02% qemu-kvm [vfio_iommu_type1] [k] vfio_iommu_type1_ioctl
+ 60.67% 0.31% qemu-kvm [vfio_iommu_type1] [k] vfio_pin_pages_remote
+ 60.06% 0.46% qemu-kvm [vfio_iommu_type1] [k] vaddr_get_pfn
+ 59.61% 0.95% qemu-kvm [kernel.vmlinux] [k] get_user_pages_fast
+ 54.28% 0.02% qemu-kvm [kernel.vmlinux] [k] get_user_pages_unlocked
+ 54.24% 0.04% qemu-kvm [kernel.vmlinux] [k] __get_user_pages
+ 54.13% 0.01% qemu-kvm [kernel.vmlinux] [k] handle_mm_fault
+ 54.08% 0.03% qemu-kvm [kernel.vmlinux] [k] do_huge_pmd_anonymous_page
+ 52.09% 52.09% qemu-kvm [kernel.vmlinux] [k] clear_page
+ 9.42% 0.12% swapper [kernel.vmlinux] [k] cpu_startup_entry
+ 9.20% 0.00% swapper [kernel.vmlinux] [k] start_secondary
+ 8.85% 0.02% swapper [kernel.vmlinux] [k] arch_cpu_idle
+ 8.79% 0.07% swapper [kernel.vmlinux] [k] cpuidle_idle_call
+ 6.16% 0.29% swapper [kernel.vmlinux] [k] apic_timer_interrupt
+ 5.73% 0.07% swapper [kernel.vmlinux] [k] smp_apic_timer_interrupt
+ 4.34% 0.99% qemu-kvm [kernel.vmlinux] [k] gup_pud_range
+ 3.56% 0.16% swapper [kernel.vmlinux] [k] local_apic_timer_interrupt
+ 3.32% 0.41% swapper [kernel.vmlinux] [k] hrtimer_interrupt
+ 3.25% 3.21% qemu-kvm [kernel.vmlinux] [k] gup_huge_pmd
+ 2.31% 0.01% qemu-kvm [kernel.vmlinux] [k] iommu_map
+ 2.30% 0.00% qemu-kvm [kernel.vmlinux] [k] intel_iommu_map
It seems that the bottleneck is pinning the pages through get_user_pages rather than the IOMMU mapping itself.
Thanks,
Wu Zongyong